Status: Open. ynimmaga opened this issue 2 months ago.
We don't recommend using the torchao-nightly package anymore and will move to `--pre torchao`, though there are still a few issues with that. In the meantime, would you mind building from source using `USE_CPP=0 pip install .` and letting me know if that works?
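After switching from the nightly wheel to a source build, you can check which distributions are actually installed by reading their package metadata from Python instead of scanning `pip list`. This is a minimal sketch that uses only the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical, and the distribution names checked are the ones mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper: it only reads package metadata and never imports the package.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names taken from this thread; a leftover torchao-nightly
    # next to a source build of torchao is worth noticing.
    for name in ("torch", "torchao", "torchao-nightly"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```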
@msaroufim, thank you for the suggestion. I tried building from source with the above command and confirmed with `pip list` that the new version is installed. However, I still see the same error.
I also commented out the try/except block around the error message in autoquant.py and got a more complete traceback of the error:
```
activation_shapes: torch.Size([32, 32]), times_seen: 1
weight_shape: torch.Size([64, 32]), dtype: torch.float32, bias_shape: torch.Size([64])
Traceback (most recent call last):
  File "/home/ao/scripts/simple_example.py", line 13, in <module>
    model(input)
  File "/home/new_ao_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/new_ao_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1767, in _call_impl
    args_kwargs_result = hook(self, args, kwargs) # type: ignore[misc]
  File "/home/new_ao_env/lib/python3.10/site-packages/torchao/quantization/autoquant.py", line 608, in autoquant_prehook
    module.finalize_autoquant()
  File "/home/new_ao_env/lib/python3.10/site-packages/torchao/quantization/autoquant.py", line 620, in finalize_autoquant
    _change_autoquantizable_to_quantized(
  File "/home/new_ao_env/lib/python3.10/site-packages/torchao/quantization/autoquant.py", line 494, in _change_autoquantizable_to_quantized
    _replace_with_custom_fn_if_matches_filter(
  File "/home/new_ao_env/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 187, in _replace_with_custom_fn_if_matches_filter
    new_child = _replace_with_custom_fn_if_matches_filter(
  File "/home/new_ao_env/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 183, in _replace_with_custom_fn_if_matches_filter
    model = replacement_fn(model)
  File "/home/new_ao_env/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 232, in insert_subclass
    getattr(cls, from_float)(lin.weight, **kwargs), requires_grad=False
  File "/home/new_ao_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/new_ao_env/lib/python3.10/site-packages/torchao/quantization/autoquant.py", line 146, in to_quantized
    self.tune_autoquant(q_cls, shapes_and_dtype, time_for_best_shape)
  File "/home/new_ao_env/lib/python3.10/site-packages/torchao/quantization/autoquant.py", line 97, in tune_autoquant
    res = q_cls._autoquant_test(act_mat, self.weight, bias, best_time, self.mode)
  File "/home/new_ao_env/lib/python3.10/site-packages/torchao/quantization/autoquant.py", line 267, in _autoquant_test
    res = do_autoquant_bench(q_c_op, act_mat, w_qtensor, bias, warmup=25, rep=100)
  File "/home/new_ao_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/new_ao_env/lib/python3.10/site-packages/torchao/quantization/autoquant.py", line 214, in do_autoquant_bench
    torch.cuda.synchronize()
  File "/home/new_ao_env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 948, in synchronize
    _lazy_init()
  File "/home/new_ao_env/lib/python3.10/site-packages/torch/cuda/__init__.py", line 306, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```
It seems there are several CUDA calls in autoquant.py. For more context, I want to try autoquant with the OpenVINO backend for torch.compile, which is a non-CUDA backend. Please let me know if you have any other suggestions.
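The traceback ends in `torch.cuda.synchronize()` inside `do_autoquant_bench`. On a CPU-only wheel, that call goes through `_lazy_init()` and raises the assertion. The sketch below shows a device-agnostic timing loop, assuming the only change needed is to skip synchronization when no CUDA device is available. `pick_synchronize` and `bench_ms` are hypothetical names, not torchao functions. The `warmup=25, rep=100` defaults mirror the call in the traceback, and torchao's real benchmark may measure time differently.

```python
import statistics
import time
from typing import Callable, Optional


def pick_synchronize() -> Callable[[], None]:
    """Return a synchronize function that is safe on CPU-only builds.

    Assumption: with a visible CUDA device we want torch.cuda.synchronize();
    otherwise (no torch, or a CPU-only wheel) a no-op is used, because
    torch.cuda.synchronize() raises "Torch not compiled with CUDA enabled" there.
    """
    try:
        import torch
    except ImportError:
        return lambda: None
    if torch.cuda.is_available():
        return torch.cuda.synchronize
    return lambda: None


def bench_ms(fn: Callable[[], object], warmup: int = 25, rep: int = 100,
             sync: Optional[Callable[[], None]] = None) -> float:
    """Hypothetical helper: median wall-clock time of fn() in milliseconds."""
    sync = sync or pick_synchronize()
    for _ in range(warmup):
        fn()
    sync()  # drain warmup work before timing
    times = []
    for _ in range(rep):
        start = time.perf_counter()
        fn()
        sync()  # wait for queued device work before stopping the clock
        times.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times)


if __name__ == "__main__":
    print(f"{bench_ms(lambda: sum(range(10_000))):.3f} ms")
```

With this pattern, something like `bench_ms(lambda: model(input))` would run on either device. Whether autoquant's quantized kernel candidates can themselves run on CPU is a separate question.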
Ah, interesting. Indeed, the script is CUDA-only right now, but perhaps we should fix that. Is this something you're interested in contributing? Otherwise, we were planning on taking a look at expanding our hardware support matrix after the PyTorch Conference. cc @supriyar
Hi @ynimmaga, nice to see you here! I believe we met last year at the PTC poster sessions and afterwards to discuss how to use PyTorch quantization with OpenVINO. A lot has happened since, and it's great to see more native integration of the OpenVINO backend with torch.compile.
As @msaroufim mentioned, the autoquant tool is currently built for GPU backends, and much of its code assumes CUDA is available. While we are planning to expand hardware support by adding some CPU kernels to torchao, we have no plans for now to extend autoquant specifically to CPU. Are you interested specifically in supporting the OpenVINO backend via autoquant, or in general support for CPU backends?
Thank you @supriyar. Nice to see you too :). We took your suggestion from last year and have already added native support for the PT2 quantization flow in OpenVINO (link). This flow works well, but we saw that the autoquant API provides an even better user experience (as recommended by @agunapal). We are interested in supporting the OpenVINO backend specifically.
@ynimmaga, what kind of use cases do you have in mind? We won't have the bandwidth to support OpenVINO specifically, but if that's something your team would like to contribute to or work on, let me know! cc @HDCharles
Thanks @supriyar. If there is general CPU support, we can easily support other torch.compile backends like OpenVINO. I will check internally and let you know if we can find bandwidth to work on this.
Autoquant fails when CPU packages are used. I tried the latest nightly packages, installing torchao and torch with the command below:
```
pip install --pre torchao-nightly torch --index-url https://download.pytorch.org/whl/nightly/cpu
```
I modified the simple example from the documentation for CPU as below:
The script above gives the following output:
I hope I am not missing any steps here. Does autoquant support CPU? I'd appreciate any advice. Thank you.
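The install command above pulls torch from the `nightly/cpu` index, so the resulting wheel is built without CUDA. Any unguarded `torch.cuda.synchronize()` call in autoquant would therefore presumably hit the same `AssertionError` shown in the traceback earlier in this thread. The sketch below is one way to confirm what kind of build is installed. `describe_torch_build` is a hypothetical helper. It assumes that `torch.version.cuda` is `None` on builds without CUDA and that `torch.version.hip` is set on ROCm builds.

```python
def describe_torch_build() -> str:
    """Summarize whether the installed torch wheel was built with CUDA (or ROCm)."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    hip = getattr(torch.version, "hip", None)
    if torch.version.cuda is None and hip is None:
        # e.g. wheels installed from the .../whl/nightly/cpu index
        return f"torch {torch.__version__}: CPU-only build (no CUDA)"
    backend = f"CUDA {torch.version.cuda}" if torch.version.cuda else f"ROCm/HIP {hip}"
    # is_available() additionally requires a visible device and a working driver
    return (f"torch {torch.__version__}: {backend} build, "
            f"device available: {torch.cuda.is_available()}")


if __name__ == "__main__":
    print(describe_torch_build())
```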