pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

torchinductor error in torchao tests #128263

Closed jerryzh168 closed 5 months ago

jerryzh168 commented 5 months ago

🐛 Describe the bug

see: https://github.com/pytorch/ao/pull/300

Versions

pytorch nightly

cc @ezyang @bdhirsh @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @msaroufim

ezyang commented 5 months ago

For reference, the error is:

2024-05-31T23:12:43.7830534Z =========================== short test summary info ============================
2024-05-31T23:12:43.7832053Z FAILED test/integration/test_integration.py::TestSubclass::test_int8_dynamic_quant_subclass_api_1_cpu - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
2024-05-31T23:12:43.7833246Z CppCompileError: C++ compile error
2024-05-31T23:12:43.7833517Z 
2024-05-31T23:12:43.7833621Z Command:
2024-05-31T23:12:43.7840753Z g++ /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp -shared -fPIC -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -D_GLIBCXX_USE_CXX11_ABI=0 -I/opt/conda/envs/venv/lib/python3.9/site-packages/torch/include -I/opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/TH -I/opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/THC -I/opt/conda/envs/venv/include/python3.9 -L/opt/conda/envs/venv/lib/python3.9/site-packages/torch/lib -L/opt/conda/envs/venv/lib -L/opt/conda/envs/venv/lib/python3.9/site-packages/torch/lib -ltorch -ltorch_cpu -lgomp -ltorch_python -lc10 -mavx512f -mavx512dq -mavx512vl -mavx512bw -mfma -DCPU_CAPABILITY_AVX512 -O3 -DNDEBUG -ffast-math -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -march=native -fopenmp -D C10_USING_CUSTOM_GENERATED_MACROS -o /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.so
2024-05-31T23:12:43.7847453Z 
2024-05-31T23:12:43.7847561Z Output:
2024-05-31T23:12:43.7848972Z /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp: In function ‘void kernel(half*, const half*, const int8_t*, const int64_t*, const half*, half*, half*, half*, long int)’:
2024-05-31T23:12:43.7851691Z /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:77:36: error: no match for ‘operator*’ (operand types are ‘at::vec::CPU_CAPABILITY::Vectorized<int>’ and ‘at::vec::CPU_CAPABILITY::Vectorized<float>’)
2024-05-31T23:12:43.7853239Z    77 |                 auto tmp40 = tmp37 * tmp39;
2024-05-31T23:12:43.7853703Z       |                              ~~~~~ ^ ~~~~~
2024-05-31T23:12:43.7854122Z       |                              |       |
2024-05-31T23:12:43.7854580Z       |                              |       Vectorized<float>
2024-05-31T23:12:43.7855061Z       |                              Vectorized<int>
2024-05-31T23:12:43.7856049Z In file included from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:8,
2024-05-31T23:12:43.7857334Z                  from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec.h:4,
2024-05-31T23:12:43.7858557Z                  from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional_base.h:6,
2024-05-31T23:12:43.7859815Z                  from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional.h:3,
2024-05-31T23:12:43.7861053Z                  from /tmp/torchinductor_root/sk/cskh5dx62fglpphcrl6723dnmowdabouerrzy3dmqcngbxwfa7bv.h:35,
2024-05-31T23:12:43.7862212Z                  from /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:2:
2024-05-31T23:12:43.7865215Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_base.h:629:41: note: candidate: ‘template<class T> at::vec::CPU_CAPABILITY::Vectorized<T> at::vec::CPU_CAPABILITY::operator*(const at::vec::CPU_CAPABILITY::Vectorized<T>&, const at::vec::CPU_CAPABILITY::Vectorized<T>&)’
2024-05-31T23:12:43.7867719Z   629 | template <class T> Vectorized<T> inline operator*(const Vectorized<T> &a, const Vectorized<T> &b) {
2024-05-31T23:12:43.7868487Z       |                                         ^~~~~~~~
2024-05-31T23:12:43.7869739Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_base.h:629:41: note:   template argument deduction/substitution failed:
2024-05-31T23:12:43.7871736Z /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:77:38: note:   deduced conflicting types for parameter ‘T’ (‘int’ and ‘float’)
2024-05-31T23:12:43.7872918Z    77 |                 auto tmp40 = tmp37 * tmp39;
2024-05-31T23:12:43.7873378Z       |                                      ^~~~~
2024-05-31T23:12:43.7874349Z In file included from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_base.h:1132,
2024-05-31T23:12:43.7875777Z                  from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:8,
2024-05-31T23:12:43.7876969Z                  from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec.h:4,
2024-05-31T23:12:43.7878188Z                  from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional_base.h:6,
2024-05-31T23:12:43.7879447Z                  from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional.h:3,
2024-05-31T23:12:43.7880709Z                  from /tmp/torchinductor_root/sk/cskh5dx62fglpphcrl6723dnmowdabouerrzy3dmqcngbxwfa7bv.h:35,
2024-05-31T23:12:43.7881848Z                  from /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:2:
2024-05-31T23:12:43.7884406Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_n.h:316:37: note: candidate: ‘template<class T, int N> at::vec::CPU_CAPABILITY::VectorizedN<T, N> at::vec::CPU_CAPABILITY::operator*(const at::vec::CPU_CAPABILITY::VectorizedN<T, N>&, const at::vec::CPU_CAPABILITY::VectorizedN<T, N>&)’
2024-05-31T23:12:43.7886385Z   316 | VECTORIZEDN_DEFINE_BINARY_OP_GLOBAL(operator*)
2024-05-31T23:12:43.7886877Z       |                                     ^~~~~~~~
2024-05-31T23:12:43.7888170Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_n.h:297:28: note: in definition of macro ‘VECTORIZEDN_DEFINE_BINARY_OP_GLOBAL’
2024-05-31T23:12:43.7889512Z   297 |   inline VectorizedN<T, N> op(                                                 \
2024-05-31T23:12:43.7890106Z       |                            ^~
2024-05-31T23:12:43.7891273Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_n.h:316:37: note:   template argument deduction/substitution failed:
2024-05-31T23:12:43.7892428Z   316 | VECTORIZEDN_DEFINE_BINARY_OP_GLOBAL(operator*)
2024-05-31T23:12:43.7892924Z       |                                     ^~~~~~~~
2024-05-31T23:12:43.7894198Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec_n.h:297:28: note: in definition of macro ‘VECTORIZEDN_DEFINE_BINARY_OP_GLOBAL’
2024-05-31T23:12:43.7895523Z   297 |   inline VectorizedN<T, N> op(                                                 \
2024-05-31T23:12:43.7896112Z       |                            ^~
2024-05-31T23:12:43.7897781Z /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:77:38: note:   ‘at::vec::CPU_CAPABILITY::Vectorized<int>’ is not derived from ‘const at::vec::CPU_CAPABILITY::VectorizedN<T, N>’
2024-05-31T23:12:43.7899201Z    77 |                 auto tmp40 = tmp37 * tmp39;
2024-05-31T23:12:43.7899663Z       |                                      ^~~~~
2024-05-31T23:12:43.7900660Z In file included from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512.h:10,
2024-05-31T23:12:43.7901934Z                  from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec.h:4,
2024-05-31T23:12:43.7903154Z                  from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional_base.h:6,
2024-05-31T23:12:43.7904587Z                  from /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/functional.h:3,
2024-05-31T23:12:43.7921001Z                  from /tmp/torchinductor_root/sk/cskh5dx62fglpphcrl6723dnmowdabouerrzy3dmqcngbxwfa7bv.h:35,
2024-05-31T23:12:43.7922202Z                  from /tmp/torchinductor_root/ag/cag5cdm6uh26pig7xgkfwgbqqh377nc7ldeqen544wvh5totpuza.cpp:2:
2024-05-31T23:12:43.7925013Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:790:29: note: candidate: ‘at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16> at::vec::CPU_CAPABILITY::operator*(const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&, const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&)’
2024-05-31T23:12:43.7927404Z   790 | Vectorized<BFloat16> inline operator*(const Vectorized<BFloat16>& a, const Vectorized<BFloat16>& b) {
2024-05-31T23:12:43.7928166Z       |                             ^~~~~~~~
2024-05-31T23:12:43.7930114Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:790:67: note:   no known conversion for argument 1 from ‘at::vec::CPU_CAPABILITY::Vectorized<int>’ to ‘const at::vec::CPU_CAPABILITY::Vectorized<c10::BFloat16>&’
2024-05-31T23:12:43.7932167Z   790 | Vectorized<BFloat16> inline operator*(const Vectorized<BFloat16>& a, const Vectorized<BFloat16>& b) {
2024-05-31T23:12:43.7933196Z       |                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
2024-05-31T23:12:43.7935472Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:1389:25: note: candidate: ‘at::vec::CPU_CAPABILITY::Vectorized<c10::Half> at::vec::CPU_CAPABILITY::operator*(const at::vec::CPU_CAPABILITY::Vectorized<c10::Half>&, const at::vec::CPU_CAPABILITY::Vectorized<c10::Half>&)’
2024-05-31T23:12:43.7937710Z  1389 | Vectorized<Half> inline operator*(const Vectorized<Half>& a, const Vectorized<Half>& b) {
2024-05-31T23:12:43.7938422Z       |                         ^~~~~~~~
2024-05-31T23:12:43.7940336Z /opt/conda/envs/venv/lib/python3.9/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_bfloat16.h:1389:59: note:   no known conversion for argument 1 from ‘at::vec::CPU_CAPABILITY::Vectorized<int>’ to ‘const at::vec::CPU_CAPABILITY::Vectorized<c10::Half>&’
2024-05-31T23:12:43.7942298Z  1389 | Vectorized<Half> inline operator*(const Vectorized<Half>& a, const Vectorized<Half>& b) {
2024-05-31T23:12:43.7943051Z       |                                   ~~~~~~~~~~~~~~~~~~~~~~~~^
2024-05-31T23:12:43.7943515Z 
2024-05-31T23:12:43.7943536Z 
2024-05-31T23:12:43.7943823Z Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
2024-05-31T23:12:43.7944276Z 
2024-05-31T23:12:43.7944280Z 
2024-05-31T23:12:43.7944583Z You can suppress this exception and fall back to eager by setting:
2024-05-31T23:12:43.7945129Z     import torch._dynamo
2024-05-31T23:12:43.7945526Z     torch._dynamo.config.suppress_errors = True
2024-05-31T23:12:43.7946971Z FAILED test/integration/test_integration.py::TestSubclass::test_int8_dynamic_quant_subclass_api_2_cpu - torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
2024-05-31T23:12:43.7948162Z CppCompileError: C++ compile error
leslie-fang-intel commented 5 months ago

BF16 passes but FP16 fails. Looking at the lowered graphs, they already differ between BF16 and FP16. Is this a CPU-specific issue, @jerryzh168?

leslie-fang-intel commented 5 months ago

This line in TorchAO looks suspicious: https://github.com/pytorch/ao/blob/950a89388e88e10f26bbbbe2ec0b1710ba3d33d1/torchao/quantization/quant_api.py#L413. It hardcodes the data type as None for BF16 but as FP32 for FP16.

============== Update

After unifying the data type to None for both FP16 and BF16, this test case passes on my local system.
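
To make the difference concrete, here is a minimal sketch of the two behaviours described in this comment. It is not the TorchAO code: the function names are hypothetical, and dtypes are plain strings so the snippet runs without PyTorch installed.

```python
from typing import Optional


def hardcoded_dtype_original(input_dtype: str) -> Optional[str]:
    # Behaviour as described in this thread (an illustration, not TorchAO's code):
    # FP16 inputs get an explicit FP32 data type, BF16 inputs get None.
    return "float32" if input_dtype == "float16" else None


def hardcoded_dtype_unified(input_dtype: str) -> Optional[str]:
    # Local experiment described above: None for FP16 and BF16 alike.
    return None


# Collect the choice each variant makes for the two half-precision inputs.
original = {d: hardcoded_dtype_original(d) for d in ("float16", "bfloat16")}
unified = {d: hardcoded_dtype_unified(d) for d in ("float16", "bfloat16")}
```

In the original variant only the FP16 path carries an FP32 value, which is consistent with the FP16 and BF16 graphs diverging. Unifying the choice was only a local experiment for checking the test.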

jerryzh168 commented 5 months ago

I think this fails on both CPU and CUDA.

The linked detail matters: I think it is needed to avoid regressing performance on an internal model. Why can't Inductor support this path?

leslie-fang-intel commented 5 months ago

Thanks for the reminder. After further investigation, we did find a CPP backend issue, and https://github.com/pytorch/pytorch/pull/128498 fixes it. With this PR, I think this test case now works with the CPP backend.
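
For readers skimming the log earlier in the thread: the compile failure comes down to `Vectorized<int> * Vectorized<float>`, and the only generic `operator*` candidate requires both operands to share one element type `T`. Below is a toy Python model of that constraint. The `Vec` class and `convert` helper are hypothetical stand-ins, not ATen APIs, and the sketch does not claim to show what the PR changes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Vec:
    """Toy stand-in for a typed SIMD vector (hypothetical, not an ATen class)."""
    dtype: str            # element type tag, e.g. "int32" or "float32"
    lanes: List[float]    # lane values

    def __mul__(self, other: "Vec") -> "Vec":
        if not isinstance(other, Vec):
            return NotImplemented
        # Mirror the C++ template: both operands must have the same element type.
        if self.dtype != other.dtype:
            raise TypeError(
                f"no match for operator*: Vec<{self.dtype}> and Vec<{other.dtype}>"
            )
        return Vec(self.dtype, [a * b for a, b in zip(self.lanes, other.lanes)])


def convert(v: Vec, dtype: str) -> Vec:
    """Explicit lane-wise cast; generated code has to emit this step itself."""
    cast = float if dtype.startswith("float") else int
    return Vec(dtype, [cast(x) for x in v.lanes])


ints = Vec("int32", [1, 2, 3, 4])
scales = Vec("float32", [0.5, 0.5, 0.25, 0.25])

# Mixed element types are rejected, like the failing line 77 in the log.
try:
    ints * scales
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Casting first makes both operands float32, so the multiply is well-typed.
product = convert(ints, "float32") * scales
```

The point of the sketch is that there is no implicit promotion between vector types, so any int-to-float conversion has to appear explicitly in the generated kernel.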

leslie-fang-intel commented 5 months ago

Hi @jerryzh168, I am going to close this issue now that the fix has landed. Please let me know if you run into any further issues.