Open kechan opened 1 month ago
I have a similar warning: /opt/homebrew/lib/python3.12/site-packages/quanto/library/ops.py:66: UserWarning: An exception was raised while calling the optimized kernel for quanto::dqmm: Unsupported TypeMeta in ATen: Falling back to default implementation.
This started after I upgraded PyTorch this morning. My script's runtime went from 8s to 2min.
@HackHerz I was told elsewhere that the results may still be correct thanks to the fallback, but that the fallback may cause an unacceptable latency hit.
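If you want to find out exactly where the fallback kicks in, one option is to turn that specific warning into an error so it raises with a traceback. This is a minimal sketch using only the standard `warnings` module. `run_strict` and `fake_quantized_op` are hypothetical names, and the stand-in function just re-emits the warning text quoted above rather than calling quanto.

```python
import warnings

# Regex matched against the start of the warning message (case-insensitive).
# The wording is taken from the warning quoted in this thread.
FALLBACK_PATTERN = r".*optimized kernel.*"


def run_strict(fn, *args, **kwargs):
    """Call fn, raising UserWarning instead of printing the fallback warning."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "error", message=FALLBACK_PATTERN, category=UserWarning
        )
        return fn(*args, **kwargs)


def fake_quantized_op():
    # Hypothetical stand-in for a quantized call; it only emits the warning.
    warnings.warn(
        "An exception was raised while calling the optimized kernel for "
        "quanto::dqmm: Unsupported TypeMeta in ATen: "
        "Falling back to default implementation.",
        UserWarning,
    )
    return "fallback result"


if __name__ == "__main__":
    try:
        run_strict(fake_quantized_op)
    except UserWarning as exc:
        print("fallback detected:", exc)
```

Wrapping a single forward pass of your own model in `run_strict` should show whether the slow path is being taken at all.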
Should be fixed in 0.2.2
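A quick way to check whether your environment already has that release is sketched below. It assumes the installed distribution is named `quanto` (as the site-packages path above suggests), which may not hold for your install. `parse_release`, `has_fix` and `installed_version` are hypothetical helper names.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(v):
    """Turn '0.2.2' or '2.3.0+cu121' into a tuple of ints, e.g. (0, 2, 2)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(p) for p in m.group().split(".")) if m else ()


def has_fix(installed, fixed_in="0.2.2"):
    """True if the installed version is at least the release named above."""
    return parse_release(installed) >= parse_release(fixed_in)


def installed_version(dist="quanto"):
    """Return the installed version string, or None if dist is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    if v is None:
        print("quanto distribution not found under that name")
    else:
        print(v, "has fix" if has_fix(v) else "is older than 0.2.2")
```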
UserWarning: An exception was raised while calling the optimized kernel for quanto::dqmm: /home/jupyter/.cache/torch_extensions/py38_cu121/quanto_cpp/quanto_cpp.so: cannot open shared object file: No such file or directory Falling back to default implementation.
I got this on a Google Cloud (GCP) VM with:
torch 2.3, quanto 0.2.0, accelerate 0.30.1
I am not sure whether this is why my qbit8 quantized model runs slower than the original model.
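To confirm the slowdown with numbers, you could time both models the same way. The sketch below is a generic helper rather than anything from quanto. `median_runtime` is a hypothetical name and the two callables are placeholders for your own forward passes. On CUDA, the callable would also need to synchronize (for example with `torch.cuda.synchronize()`) before returning, otherwise the timings are not meaningful.

```python
import statistics
import time


def median_runtime(fn, repeats=5, warmup=1):
    """Median wall-clock time of fn() in seconds, after a few warmup calls."""
    for _ in range(warmup):
        fn()  # warmup runs are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Placeholders: swap in calls to the original and the quantized model.
    def original():
        time.sleep(0.001)

    def quantized():
        time.sleep(0.02)

    print("original :", median_runtime(original))
    print("quantized:", median_runtime(quantized))
```

If the quantized model is only slower while the warning is being printed, the fallback is the likely cause.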