Closed: jerryzh168 closed this issue 4 months ago.
For reference, the error is:
2024-05-31T23:12:43.7830534Z =========================== short test summary info ============================
2024-05-31T23:12:43.7832053Z FAILED test/integration/test_integration.py::TestSubclass::test_int8_dynamic_quant_subclass_api_1_cpu - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
2024-05-31T23:12:43.7833246Z CppCompileError: C++ compile error
2024-05-31T23:12:43.7833517Z
2024-05-31T23:12:43.7833621Z Command:
2024-05-31T23:12:43.7840753Z g++ /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp -shared -fPIC -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -D_GLIBCXX_USE_CXX11_ABI=0 -I/opt/conda/envs/venv/lib/python3.9/site-packages/torch/include -I/opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/TH -I/opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/THC -I/opt/conda/envs/venv/include/python3.9 -L/opt/conda/envs/venv/lib/python3.9/site-packages/torch/lib -L/opt/conda/envs/venv/lib -L/opt/conda/envs/venv/lib/python3.9/site-packages/torch/lib -ltorch -ltorch_cpu -lgomp -ltorch_python -lc10 -mavx512f -mavx512dq -mavx512vl -mavx512bw -mfma -DCPU_CAPABILITY_AVX512 -O3 -DNDEBUG -ffast-math -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -march=native -fopenmp -D C10_USING_CUSTOM_GENERATED_MACROS -o /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.so
2024-05-31T23:12:43.7847453Z
2024-05-31T23:12:43.7847561Z Output:
2024-05-31T23:12:43.7848972Z /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp: In function ‘void kernel(half*, const half*, const int8_t*, const int64_t*, const half*, half*, half*, half*, long int)’:
2024-05-31T23:12:43.7851691Z /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:77:36: error: no match for ‘operator*’ (operand types are ‘at::vec::CPU_CAPABILITY::Vectorized<int>’ and ‘at::vec::CPU_CAPABILITY::Vectorized<float>’)
2024-05-31T23:12:43.7853239Z 77 | auto tmp40 = tmp37 * tmp39;
2024-05-31T23:12:43.7853703Z | ~~~~~ ^ ~~~~~
2024-05-31T23:12:43.7854122Z | | |
2024-05-31T23:12:43.7854580Z | | Vectorized<float>
2024-05-31T23:12:43.7855061Z | Vectorized<int>
2024-05-31T23:12:43.7856049Z In file included from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:8,
2024-05-31T23:12:43.7857334Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec.h:4,
2024-05-31T23:12:43.7858557Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional_base.h:6,
2024-05-31T23:12:43.7859815Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional.h:3,
2024-05-31T23:12:43.7861053Z from /tmp/torchinductor_root/sk/cskh5dx62fglpphcrl6723dnmowdabouerrzy3dmqcngbxwfa7bv.h:35,
2024-05-31T23:12:43.7862212Z from /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:2:
2024-05-31T23:12:43.7865215Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_base.h:629:41: note: candidate: ‘template<class T> at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::operator*(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&)’
2024-05-31T23:12:43.7867719Z 629 | template <class T> Vectorized<T> inline operator*(const Vectorized<T> &a, const Vectorized<T> &b) {
2024-05-31T23:12:43.7868487Z | ^~~~~~~~
2024-05-31T23:12:43.7869739Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_base.h:629:41: note: template argument deduction/substitution failed:
2024-05-31T23:12:43.7871736Z /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:77:38: note: deduced conflicting types for parameter ‘T’ (‘int’ and ‘float’)
2024-05-31T23:12:43.7872918Z 77 | auto tmp40 = tmp37 * tmp39;
2024-05-31T23:12:43.7873378Z | ^~~~~
2024-05-31T23:12:43.7874349Z In file included from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_base.h:1132,
2024-05-31T23:12:43.7875777Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:8,
2024-05-31T23:12:43.7876969Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec.h:4,
2024-05-31T23:12:43.7878188Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional_base.h:6,
2024-05-31T23:12:43.7879447Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional.h:3,
2024-05-31T23:12:43.7880709Z from /tmp/torchinductor_root/sk/cskh5dx62fglpphcrl6723dnmowdabouerrzy3dmqcngbxwfa7bv.h:35,
2024-05-31T23:12:43.7881848Z from /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:2:
2024-05-31T23:12:43.7884406Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_n.h:316:37: note: candidate: ‘template<class T, int N> at::vec::CPU_CAPABILITY::VectorizedN<T, N> at::vec::CPU_CAPABILITY::operator*(const at::vec::CPU_CAPABILITY::VectorizedN<T, N>&, const at::vec::CPU_CAPABILITY::VectorizedN<T, N>&)’
2024-05-31T23:12:43.7886385Z 316 | VECTORIZEDN_DEFINE_BINARY_OP_GLOBAL(operator*)
2024-05-31T23:12:43.7886877Z | ^~~~~~~~
2024-05-31T23:12:43.7888170Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_n.h:297:28: note: in definition of macro ‘VECTORIZEDN_DEFINE_BINARY_OP_GLOBAL’
2024-05-31T23:12:43.7889512Z 297 | inline VectorizedN<T, N> op( \
2024-05-31T23:12:43.7890106Z | ^~
2024-05-31T23:12:43.7891273Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_n.h:316:37: note: template argument deduction/substitution failed:
2024-05-31T23:12:43.7892428Z 316 | VECTORIZEDN_DEFINE_BINARY_OP_GLOBAL(operator*)
2024-05-31T23:12:43.7892924Z | ^~~~~~~~
2024-05-31T23:12:43.7894198Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_n.h:297:28: note: in definition of macro ‘VECTORIZEDN_DEFINE_BINARY_OP_GLOBAL’
2024-05-31T23:12:43.7895523Z 297 | inline VectorizedN<T, N> op( \
2024-05-31T23:12:43.7896112Z | ^~
2024-05-31T23:12:43.7897781Z /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:77:38: note: ‘at::vec::CPU_CAPABILITY::Vectorized<int>’ is not derived from ‘const at::vec::CPU_CAPABILITY::VectorizedN<T, N>’
2024-05-31T23:12:43.7899201Z 77 | auto tmp40 = tmp37 * tmp39;
2024-05-31T23:12:43.7899663Z | ^~~~~
2024-05-31T23:12:43.7900660Z In file included from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:10,
2024-05-31T23:12:43.7901934Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec.h:4,
2024-05-31T23:12:43.7903154Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional_base.h:6,
2024-05-31T23:12:43.7904587Z from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional.h:3,
2024-05-31T23:12:43.7921001Z from /tmp/torchinductor_root/sk/cskh5dx62fglpphcrl6723dnmowdabouerrzy3dmqcngbxwfa7bv.h:35,
2024-05-31T23:12:43.7922202Z from /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:2:
2024-05-31T23:12:43.7925013Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:790:29: note: candidate: ‘at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16> at::vec::CPU_CAPABILITY::operator*(const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&, const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&)’
2024-05-31T23:12:43.7927404Z 790 | Vectorized<BFloat16> inline operator*(const Vectorized<BFloat16>& a, const Vectorized<BFloat16>& b) {
2024-05-31T23:12:43.7928166Z | ^~~~~~~~
2024-05-31T23:12:43.7930114Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:790:67: note: no known conversion for argument 1 from ‘at::vec::CPU_CAPABILITY::Vectorized<int>’ to ‘const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&’
2024-05-31T23:12:43.7932167Z 790 | Vectorized<BFloat16> inline operator*(const Vectorized<BFloat16>& a, const Vectorized<BFloat16>& b) {
2024-05-31T23:12:43.7933196Z | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
2024-05-31T23:12:43.7935472Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:1389:25: note: candidate: ‘at::vec::CPU_CAPABILITY::Vectorized<c10::Half> at::vec::CPU_CAPABILITY::operator*(const at::vec::CPU_CAPABILITY::Vectorized<c10::Half>&, const at::vec::CPU_CAPABILITY::Vectorized<c10::Half>&)’
2024-05-31T23:12:43.7937710Z 1389 | Vectorized<Half> inline operator*(const Vectorized<Half>& a, const Vectorized<Half>& b) {
2024-05-31T23:12:43.7938422Z | ^~~~~~~~
2024-05-31T23:12:43.7940336Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:1389:59: note: no known conversion for argument 1 from ‘at::vec::CPU_CAPABILITY::Vectorized<int>’ to ‘const at::vec::CPU_CAPABILITY::Vectorized<c10::Half>&’
2024-05-31T23:12:43.7942298Z 1389 | Vectorized<Half> inline operator*(const Vectorized<Half>& a, const Vectorized<Half>& b) {
2024-05-31T23:12:43.7943051Z | ~~~~~~~~~~~~~~~~~~~~~~~~^
2024-05-31T23:12:43.7943515Z
2024-05-31T23:12:43.7943536Z
2024-05-31T23:12:43.7943823Z Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
2024-05-31T23:12:43.7944276Z
2024-05-31T23:12:43.7944280Z
2024-05-31T23:12:43.7944583Z You can suppress this exception and fall back to eager by setting:
2024-05-31T23:12:43.7945129Z import torch._dynamo
2024-05-31T23:12:43.7945526Z torch._dynamo.config.suppress_errors = True
2024-05-31T23:12:43.7946971Z FAILED test/integration/test_integration.py::TestSubclass::test_int8_dynamic_quant_subclass_api_2_cpu - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
2024-05-31T23:12:43.7948162Z CppCompileError: C++ compile error
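For context, the failing generated expression multiplies a `Vectorized<int>` by a `Vectorized<float>`. Below is a minimal, hedged Python sketch of the kind of mixed-dtype expression that can lead to such a pattern in the lowered kernel (an integer intermediate multiplied by a float32 scale, then downcast to half). It is an illustration only, not a confirmed reproducer of this exact CI failure.

```python
import torch

# Illustrative only: an int32 intermediate times a float32 scale, cast to fp16.
# This mirrors the shape of the failing expression in the log above, but it is a
# sketch, not the kernel Inductor actually emitted for the failing test.
def mixed_dtype_mul(x_int8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    acc = x_int8.to(torch.int32)             # integer accumulator, as in int8 dynamic quant
    return (acc * scale).to(torch.float16)   # int * float32, then downcast to half

compiled = torch.compile(mixed_dtype_mul, backend="inductor")
out = compiled(
    torch.randint(-128, 127, (64,), dtype=torch.int8),
    torch.rand(64, dtype=torch.float32),
)
```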
BF16 passes but FP16 fails. From the lowering graph, we saw that the graph is already different between BF16 and FP16. Is it a CPU-specific issue @jerryzh168?
Here is the readable BF16 FX graph. We convert the tensor to BF16 after clamp_min.
Here is the readable FP16 FX graph. We convert the tensor to FP32 after clamp_min.
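For readers without the attached graphs, here is a small sketch of the difference described above, assumed from the description rather than copied from the actual FX output; the clamp value is a placeholder.

```python
import torch

# Assumed illustration of the two lowered paths described above; 1e-5 is illustrative.
def bf16_path(scale: torch.Tensor) -> torch.Tensor:
    return torch.clamp_min(scale, 1e-5).to(torch.bfloat16)  # stays in low precision

def fp16_path(scale: torch.Tensor) -> torch.Tensor:
    return torch.clamp_min(scale, 1e-5).to(torch.float32)   # upcast to float32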
This line of code in TorchAO is suspicious: https://github.com/pytorch/ao/blob/950a89388e88e10f26bbbbe2ec0b1710ba3d33d1/torchao/quantization/quant_api.py#L413. It hardcodes the data type as None for BF16, but as FP32 for FP16.
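Roughly, the reported behavior looks like the following hedged sketch; the real logic lives at the linked line in quant_api.py, and the helper name here is made up for illustration.

```python
import torch
from typing import Optional

# Hypothetical helper mirroring the reported behavior: dtype stays None for BF16,
# but is hardcoded to FP32 for FP16, which changes the lowered graph.
def scale_dtype_for(input_dtype: torch.dtype) -> Optional[torch.dtype]:
    if input_dtype == torch.bfloat16:
        return None
    if input_dtype == torch.float16:
        return torch.float32
    return None
```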
Update: unifying the data type to None for both FP16 and BF16 makes this test case pass on my local system.
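One way to check such a change locally (assuming a torchao checkout) is to re-run the failing test from the log above, for example via pytest's Python entry point:

```python
import pytest

# Re-run the test that failed in CI; the path and test id are taken from the log above.
pytest.main([
    "test/integration/test_integration.py::TestSubclass"
    "::test_int8_dynamic_quant_subclass_api_1_cpu",
])
```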
This fails for both CPU and CUDA, I think.
The linked detail is important so we don't regress performance for some internal model, I think. Why can't Inductor support this path?
Thanks for the reminder. After further investigation, we did find a CPP backend issue; https://github.com/pytorch/pytorch/pull/128498 fixes it. With this PR, I think this test case works with the CPP backend now.
Hi @jerryzh168, I am going to close this issue since the fix has landed. Please let me know if there are any further issues.
🐛 Describe the bug
See: https://github.com/pytorch/ao/pull/300
Versions
PyTorch nightly
cc @ezyang @bdhirsh @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @msaroufim