mlcommons / inference_results_v1.1

This repository contains the results and code for the MLPerf™ Inference v1.1 benchmark.
https://mlcommons.org/en/inference-datacenter-11/
Apache License 2.0

Xavier Failed to compile Pytorch 1.4 #5

Open yzh89 opened 2 years ago

yzh89 commented 2 years ago

https://github.com/mlcommons/inference_results_v1.1/blob/afc47bcbb459494574b5f888475694fc0d30bfa7/closed/NVIDIA/scripts/install_xavier_dependencies.sh#L108

The Xavier dependency install script fails with the following errors. The Xavier I used started from a fresh installation, with only pip installed to support the python3.8 dependencies.

c++: internal compiler error: Segmentation fault (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
caffe2/CMakeFiles/torch.dir/build.make:2818: recipe for target 'caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/CPUType.cpp.o' failed
make[2]: *** [caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/CPUType.cpp.o] Error 4
make[2]: *** Waiting for unfinished jobs....
c++: internal compiler error: Segmentation fault (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
caffe2/CMakeFiles/torch.dir/build.make:2883: recipe for target 'caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/TypeDefault.cpp.o' failed
make[2]: *** [caffe2/CMakeFiles/torch.dir/__/aten/src/ATen/TypeDefault.cpp.o] Error 4

nvitramble commented 2 years ago

Could you try building with gcc-9.3? This comment indicates that using gcc-9.3 fixed a similar error.
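A minimal sketch of how a specific gcc could be selected for the PyTorch build, assuming the compilers are installed under the Ubuntu-style names `gcc-9` and `g++-9` and that the build is started from a clean build directory (CMake caches the compiler it found on the first configure). The helper names `build_env_with_gcc` and `compilers_available` are hypothetical and are not part of the install script.

```python
import os
import shutil


def build_env_with_gcc(version="9", base_env=None):
    """Return a copy of the environment with CC/CXX set to gcc-<version>.

    Assumption: the compilers are installed as gcc-<version> and
    g++-<version>, as Ubuntu packages normally name them.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CC"] = f"gcc-{version}"
    env["CXX"] = f"g++-{version}"
    return env


def compilers_available(env):
    """Return True if both compilers named in env are found on PATH."""
    return all(
        shutil.which(env[key], path=env.get("PATH")) is not None
        for key in ("CC", "CXX")
    )
```

The resulting dictionary can be passed as `env=` to `subprocess.check_call` when running `setup.py` from the PyTorch source tree (`/tmp/pytorch` in the logs in this thread). Checking `compilers_available(env)` first avoids starting a long build with a compiler that is not installed.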

If that doesn't work, here is a pre-built PyTorch tarball for Xavier. Could you try copying the contents of <untar_dir>/usr/local/lib/python3.8/dist-packages to /usr/local/lib/python3.8/dist-packages?
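The copy step described above could be sketched as follows, assuming the tarball has already been extracted and that the script runs with permission to write to the system `dist-packages` directory. The function name `install_prebuilt` and the `root` parameter are hypothetical; `root` exists only so the copy can be tried against a scratch directory first. `shutil.copytree(..., dirs_exist_ok=True)` requires Python 3.8 or newer.

```python
import shutil
from pathlib import Path

# Relative location of the packages inside the extracted tarball,
# taken from the path quoted in the comment above.
SUBDIR = Path("usr/local/lib/python3.8/dist-packages")


def install_prebuilt(untar_dir, root="/"):
    """Merge <untar_dir>/usr/local/.../dist-packages into <root>/usr/local/.../dist-packages."""
    src = Path(untar_dir) / SUBDIR
    dst = Path(root) / SUBDIR
    if not src.is_dir():
        raise FileNotFoundError(f"expected directory not found: {src}")
    # dirs_exist_ok=True merges into an existing dist-packages directory
    # and overwrites files that have the same name.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

Because files with the same name are overwritten, packages already present in `dist-packages` (numpy, for example) may be replaced or left at a version that differs from what the pre-built PyTorch expects.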

yzh89 commented 2 years ago

Building with gcc-9.4 returns the following errors. I noticed that this error occurs much earlier in the build. I will try the tarball next. I understand there is a prebuilt PyTorch wheel for Xavier; I hope using it doesn't affect the benchmark results.

[ 47%] Building CXX object third_party/onnx/CMakeFiles/onnx_proto.dir/onnx/onnx-operators_onnx_torch-ml.pb.cc.o
c++: internal compiler error: Segmentation fault signal terminated program cc1plus
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-9/README.Bugs> for instructions.
c10/test/CMakeFiles/c10_intrusive_ptr_test.dir/build.make:81: recipe for target 'c10/test/CMakeFiles/c10_intrusive_ptr_test.dir/util/intrusive_ptr_test.cpp.o' failed
make[2]: *** [c10/test/CMakeFiles/c10_intrusive_ptr_test.dir/util/intrusive_ptr_test.cpp.o] Error 4
CMakeFiles/Makefile2:3362: recipe for target 'c10/test/CMakeFiles/c10_intrusive_ptr_test.dir/all' failed
make[1]: *** [c10/test/CMakeFiles/c10_intrusive_ptr_test.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 47%] Built target Caffe2_PROTO
[ 47%] Linking CXX static library ../../lib/libonnx_proto.a
[ 47%] Built target onnx_proto
Makefile:159: recipe for target 'all' failed
make: *** [all] Error 2
Traceback (most recent call last):
  File "setup.py", line 755, in <module>
    build_deps()
  File "setup.py", line 311, in build_deps
    build_caffe2(version=version,
  File "/tmp/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
    cmake.build(my_env)
  File "/tmp/pytorch/tools/setup_helpers/cmake.py", line 335, in build
    self.run(build_args, my_env)
  File "/tmp/pytorch/tools/setup_helpers/cmake.py", line 141, in run
    check_call(command, cwd=self.build_dir, env=env)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '8']' returned non-zero exit status 2.

yzh89 commented 2 years ago

As for the pre-built PyTorch, it seems to have been built against a different numpy version. Here is the error when importing torch:

import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
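The hexadecimal values in this error are NumPy C-API versions: the extension was compiled against version 0xe, while the installed numpy provides only 0xd. The sketch below checks whether an installed numpy release is new enough for a given C-API version. The mapping table is an assumption based on NumPy's published C-API version history (0xd for 1.16 to 1.19, 0xe for 1.20 and 1.21, 0xf for 1.22) and should be verified against the NumPy sources. The function names are hypothetical.

```python
# Assumed mapping: first NumPy release that provides each C-API version.
# Verify against NumPy's own version history before relying on it.
MIN_NUMPY_FOR_API = {
    0xD: (1, 16),
    0xE: (1, 20),
    0xF: (1, 22),
}


def parse_version(text):
    """Return (major, minor) from a version string such as '1.19.5'."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def numpy_satisfies(installed, required_api):
    """Return True if the installed numpy version provides required_api."""
    return parse_version(installed) >= MIN_NUMPY_FOR_API[required_api]
```

For example, `numpy_satisfies(numpy.__version__, 0xE)` after `import numpy` reports whether the installed numpy is recent enough for the pre-built `torch`. Under the assumed table, the error above corresponds to a numpy older than 1.20, and upgrading numpy to 1.20 or later would be the likely remedy.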