ThisisBillhe opened 3 weeks ago
cc @HDCharles @jerryzh168
I haven't seen the require-gradients error before; can you give us a repro?
For Hugging Face quantization, you can take a look at https://huggingface.co/docs/transformers/main/en/quantization/torchao
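For reference, a minimal sketch of the flow that doc describes; the checkpoint name and quantization settings below are illustrative placeholders, not a recommendation:

```python
# Quantize a Hugging Face model at load time via transformers' TorchAoConfig.
# Checkpoint name and quant settings are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

quant_config = TorchAoConfig("int4_weight_only", group_size=128)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",          # placeholder checkpoint
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,   # weights are quantized as they load
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```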
Hi, I used the same benchmark script from your repo. You can see that if torch.__version__ < 2.5, the unwrap_tensor_subclass function is called, and that call leads to the error.
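A self-contained sketch of that code path (the toy model is mine; the version gate mirrors what generate.py does):

```python
# Reproduces the version-gated path described above: on torch < 2.5,
# torchao's tensor subclasses are unwrapped before torch.compile.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only
from torchao.utils import TORCH_VERSION_AT_LEAST_2_5, unwrap_tensor_subclass

model = nn.Sequential(nn.Linear(1024, 1024)).to(torch.bfloat16).eval()
quantize_(model, int8_weight_only())

if not TORCH_VERSION_AT_LEAST_2_5:
    # This is the call that raises the require-gradients error on my setup.
    unwrap_tensor_subclass(model)

model = torch.compile(model, mode="max-autotune")
```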
As for the stuck process, it may be related to the static cache: the program hangs with the static cache on my A100 machine, but the same program works on my 3090.
Hi! I tried to reproduce the benchmark results using torchao/_models/llama/generate.py, but I cannot benchmark the quantized model successfully. Specifically, with a torch version < 2.5.0, I get the following error:
After upgrading torch to 2.5.0, the process instead gets stuck and stops responding for a very long time:
I do not see any CPU usage with the top command, and I have to kill the process by its PID.
Also, is there any way to accelerate a Hugging Face model by quantizing it with torchao, without converting the model format?
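Something like the sketch below is what I have in mind: load the transformers model as usual and quantize it in place with torchao's quantize_ (the checkpoint name is a placeholder):

```python
# Sketch: in-place quantization of a Hugging Face model with torchao,
# with no change to the model format on disk. Checkpoint name is a placeholder.
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, int8_weight_only

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",  # placeholder
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
quantize_(model, int8_weight_only())  # swaps nn.Linear weights for quantized tensors in place
```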