pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Error installing from source #62168

Open angusfong opened 3 years ago

angusfong commented 3 years ago

I am building PyTorch from source on CUDA following https://github.com/pytorch/pytorch#from-source.

uname -a: Linux ares 5.8.0-59-generic #66~20.04.1-Ubuntu SMP Thu Jun 17 11:14:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

cuda version: 11.1

When I run python setup.py install, I get the following errors:

/home/angus/pytorch/torch/csrc/jit/ir/ir.cpp: In member function ‘bool torch::jit::Node::hasSideEffects() const’:
/home/angus/pytorch/torch/csrc/jit/ir/ir.cpp:1143:16: error: ‘set_stream’ is not a member of ‘torch::jit::cuda’; did you mean ‘c10::cuda::set_stream’?
 1143 |     case cuda::set_stream:
      |                ^~~~~~~~~~
In file included from /home/angus/pytorch/aten/src/ATen/core/Dimname.h:3,
                 from /home/angus/pytorch/aten/src/ATen/core/NamedTensor.h:3,
                 from /home/angus/pytorch/build/aten/src/ATen/core/TensorBody.h:24,
                 from /home/angus/pytorch/aten/src/ATen/Tensor.h:3,
                 from /home/angus/pytorch/aten/src/ATen/Context.h:4,
                 from /home/angus/pytorch/aten/src/ATen/ATen.h:9,
                 from /home/angus/pytorch/torch/csrc/jit/ir/attributes.h:2,
                 from /home/angus/pytorch/torch/csrc/jit/ir/ir.h:3,
                 from /home/angus/pytorch/torch/csrc/jit/ir/ir.cpp:1:
/home/angus/pytorch/aten/src/ATen/core/interned_strings.h:317:11: note: ‘c10::cuda::set_stream’ declared here
  317 |   _(cuda, set_stream)                \
      |           ^~~~~~~~~~
/home/angus/pytorch/aten/src/ATen/core/interned_strings.h:595:35: note: in definition of macro ‘DEFINE_SYMBOL’
  595 |   namespace ns { constexpr Symbol s(static_cast<unique_t>(_keys::ns##_##s)); }
      |                                   ^
/home/angus/pytorch/aten/src/ATen/core/interned_strings.h:596:1: note: in expansion of macro ‘FORALL_NS_SYMBOLS’
  596 | FORALL_NS_SYMBOLS(DEFINE_SYMBOL)

I tried adding #include <torch/script.h> to the top of /home/angus/pytorch/torch/csrc/jit/ir/ir, but the error persists.

Could someone explain what is going on here?
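For context on what the compiler is reporting: this is an unqualified-name-lookup problem, not a missing header. A minimal, self-contained sketch (with an assumed namespace layout chosen only for illustration, not PyTorch's actual headers) reproduces the same class of diagnostic:

// Minimal, self-contained sketch (assumed namespace layout for illustration;
// not PyTorch's real headers). Inside torch::jit, the unqualified name `cuda`
// finds the nested torch::jit::cuda namespace first, so lookup never reaches
// c10::cuda, even with a using-directive for c10 in scope.
namespace c10 {
namespace cuda {
constexpr int set_stream = 1;  // stand-in for the interned symbol
}  // namespace cuda
}  // namespace c10

namespace torch {
namespace jit {
namespace cuda {
// intentionally empty: its mere existence shadows c10::cuda for
// unqualified lookup from inside torch::jit
}  // namespace cuda

using namespace ::c10;

int lookupDemo() {
  // return cuda::set_stream;  // error: 'set_stream' is not a member of
  //                           // 'torch::jit::cuda' -- the same shape of
  //                           // error as in the log above
  return 0;
}
}  // namespace jit
}  // namespace torch

If the real build hits this kind of shadowing, adding another #include would not change anything, which is consistent with the observation above that including torch/script.h had no effect.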

cc @malfet @seemethere @walterddr

walterddr commented 3 years ago

Could you please run torch/utils/collect_env.py and paste the results here?

hypevr-z commented 2 years ago

Any update on this? I have encountered the same issue when building PyTorch from source.

HazelSh commented 2 years ago

I have the same issue. Build fails with this error:

[5418/6341] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/ir.cpp.o
FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/ir.cpp.o 
/usr/bin/ccache /usr/bin/c++ -DAT_PER_OPERATOR_HEADERS -DBUILD_ONEDNN_GRAPH -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/home/hazel/code/torch/pytorch/build/aten/src -I/home/hazel/code/torch/pytorch/aten/src -I/home/hazel/code/torch/pytorch/build -I/home/hazel/code/torch/pytorch -I/home/hazel/code/torch/pytorch/cmake/../third_party/benchmark/include -I/home/hazel/code/torch/pytorch/third_party/onnx -I/home/hazel/code/torch/pytorch/build/third_party/onnx -I/home/hazel/code/torch/pytorch/third_party/foxi -I/home/hazel/code/torch/pytorch/build/third_party/foxi -I/home/hazel/code/torch/pytorch/torch/csrc/api -I/home/hazel/code/torch/pytorch/torch/csrc/api/include -I/home/hazel/code/torch/pytorch/caffe2/aten/src/TH -I/home/hazel/code/torch/pytorch/build/caffe2/aten/src/TH -I/home/hazel/code/torch/pytorch/build/caffe2/aten/src -I/home/hazel/code/torch/pytorch/build/caffe2/../aten/src -I/home/hazel/code/torch/pytorch/torch/csrc -I/home/hazel/code/torch/pytorch/third_party/miniz-2.1.0 -I/home/hazel/code/torch/pytorch/third_party/kineto/libkineto/include -I/home/hazel/code/torch/pytorch/third_party/kineto/libkineto/src -I/home/hazel/code/torch/pytorch/aten/../third_party/catch/single_include -I/home/hazel/code/torch/pytorch/aten/src/ATen/.. -I/home/hazel/code/torch/pytorch/third_party/FXdiv/include -I/home/hazel/code/torch/pytorch/c10/.. -I/home/hazel/code/torch/pytorch/third_party/pthreadpool/include -I/home/hazel/code/torch/pytorch/third_party/cpuinfo/include -I/home/hazel/code/torch/pytorch/third_party/QNNPACK/include -I/home/hazel/code/torch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/home/hazel/code/torch/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/home/hazel/code/torch/pytorch/third_party/cpuinfo/deps/clog/include -I/home/hazel/code/torch/pytorch/third_party/NNPACK/include -I/home/hazel/code/torch/pytorch/third_party/fbgemm/include -I/home/hazel/code/torch/pytorch/third_party/fbgemm -I/home/hazel/code/torch/pytorch/third_party/fbgemm/third_party/asmjit/src -I/home/hazel/code/torch/pytorch/third_party/ittapi/src/ittnotify -I/home/hazel/code/torch/pytorch/third_party/FP16/include -I/home/hazel/code/torch/pytorch/third_party/tensorpipe -I/home/hazel/code/torch/pytorch/build/third_party/tensorpipe -I/home/hazel/code/torch/pytorch/third_party/tensorpipe/third_party/libnop/include -I/home/hazel/code/torch/pytorch/third_party/fmt/include -I/home/hazel/code/torch/pytorch/build/third_party/ideep/mkl-dnn/third_party/oneDNN/include -I/home/hazel/code/torch/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/src/../include -I/home/hazel/code/torch/pytorch/third_party/flatbuffers/include -isystem /home/hazel/code/torch/pytorch/build/third_party/gloo -isystem /home/hazel/code/torch/pytorch/cmake/../third_party/gloo -isystem /home/hazel/code/torch/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/hazel/code/torch/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/hazel/code/torch/pytorch/third_party/protobuf/src -isystem /home/hazel/code/torch/pytorch/third_party/gemmlowp -isystem /home/hazel/code/torch/pytorch/third_party/neon2sse -isystem 
/home/hazel/code/torch/pytorch/third_party/XNNPACK/include -isystem /home/hazel/code/torch/pytorch/third_party/ittapi/include -isystem /home/hazel/code/torch/pytorch/cmake/../third_party/eigen -isystem /home/hazel/code/torch/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN/include -isystem /home/hazel/code/torch/pytorch/third_party/ideep/include -isystem /home/hazel/code/torch/pytorch/third_party/ideep/mkl-dnn/include -isystem /home/hazel/code/torch/pytorch/build/include -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -DHAVE_AVX2_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -fPIC -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-missing-field-initializers -Wno-write-strings -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-sign-compare -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-missing-braces -Wno-maybe-uninitialized -fvisibility=hidden -O2 -fopenmp -DCAFFE2_BUILD_MAIN_LIB -pthread -DASMJIT_STATIC -std=gnu++14 -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/ir.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/ir.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/ir/ir.cpp.o -c /home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.cpp
/home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.cpp: In member function ‘bool torch::jit::Node::hasSideEffects() const’:
/home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.cpp:1189:16: error: ‘set_stream’ is not a member of ‘torch::jit::cuda’; did you mean ‘c10::cuda::set_stream’?
 1189 |     case cuda::set_stream:
      |                ^~~~~~~~~~
In file included from /home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.h:18,
                 from /home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.cpp:1:
/home/hazel/code/torch/pytorch/aten/src/ATen/core/interned_strings.h:221:11: note: ‘c10::cuda::set_stream’ declared here
  221 |   _(cuda, set_stream)                \
      |           ^~~~~~~~~~
/home/hazel/code/torch/pytorch/aten/src/ATen/core/interned_strings.h:344:35: note: in definition of macro ‘DEFINE_SYMBOL’
  344 |   namespace ns { constexpr Symbol s(static_cast<unique_t>(_keys::ns##_##s)); }
      |                                   ^
/home/hazel/code/torch/pytorch/aten/src/ATen/core/interned_strings.h:345:1: note: in expansion of macro ‘FORALL_NS_SYMBOLS’
  345 | FORALL_NS_SYMBOLS(DEFINE_SYMBOL)
      | ^~~~~~~~~~~~~~~~~
/home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.cpp:1190:16: error: ‘_set_device’ is not a member of ‘torch::jit::cuda’; did you mean ‘c10::cuda::_set_device’?
 1190 |     case cuda::_set_device:
      |                ^~~~~~~~~~~
In file included from /home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.h:18,
                 from /home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.cpp:1:
/home/hazel/code/torch/pytorch/aten/src/ATen/core/interned_strings.h:220:11: note: ‘c10::cuda::_set_device’ declared here
  220 |   _(cuda, _set_device)               \
      |           ^~~~~~~~~~~
/home/hazel/code/torch/pytorch/aten/src/ATen/core/interned_strings.h:344:35: note: in definition of macro ‘DEFINE_SYMBOL’
  344 |   namespace ns { constexpr Symbol s(static_cast<unique_t>(_keys::ns##_##s)); }
      |                                   ^
/home/hazel/code/torch/pytorch/aten/src/ATen/core/interned_strings.h:345:1: note: in expansion of macro ‘FORALL_NS_SYMBOLS’
  345 | FORALL_NS_SYMBOLS(DEFINE_SYMBOL)
      | ^~~~~~~~~~~~~~~~~
/home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.cpp:1191:16: error: ‘_current_device’ is not a member of ‘torch::jit::cuda’; did you mean ‘c10::cuda::_current_device’?
 1191 |     case cuda::_current_device:
      |                ^~~~~~~~~~~~~~~
In file included from /home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.h:18,
                 from /home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.cpp:1:
/home/hazel/code/torch/pytorch/aten/src/ATen/core/interned_strings.h:222:11: note: ‘c10::cuda::_current_device’ declared here
  222 |   _(cuda, _current_device)           \
      |           ^~~~~~~~~~~~~~~
/home/hazel/code/torch/pytorch/aten/src/ATen/core/interned_strings.h:344:35: note: in definition of macro ‘DEFINE_SYMBOL’
  344 |   namespace ns { constexpr Symbol s(static_cast<unique_t>(_keys::ns##_##s)); }
      |                                   ^
/home/hazel/code/torch/pytorch/aten/src/ATen/core/interned_strings.h:345:1: note: in expansion of macro ‘FORALL_NS_SYMBOLS’
  345 | FORALL_NS_SYMBOLS(DEFINE_SYMBOL)
      | ^~~~~~~~~~~~~~~~~
/home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.cpp:1192:16: error: ‘synchronize’ is not a member of ‘torch::jit::cuda’; did you mean ‘c10::cuda::synchronize’?
 1192 |     case cuda::synchronize:
      |                ^~~~~~~~~~~
In file included from /home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.h:18,
                 from /home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.cpp:1:
/home/hazel/code/torch/pytorch/aten/src/ATen/core/interned_strings.h:223:11: note: ‘c10::cuda::synchronize’ declared here
  223 |   _(cuda, synchronize)               \
      |           ^~~~~~~~~~~
/home/hazel/code/torch/pytorch/aten/src/ATen/core/interned_strings.h:344:35: note: in definition of macro ‘DEFINE_SYMBOL’
  344 |   namespace ns { constexpr Symbol s(static_cast<unique_t>(_keys::ns##_##s)); }
      |                                   ^
/home/hazel/code/torch/pytorch/aten/src/ATen/core/interned_strings.h:345:1: note: in expansion of macro ‘FORALL_NS_SYMBOLS’
  345 | FORALL_NS_SYMBOLS(DEFINE_SYMBOL)
      | ^~~~~~~~~~~~~~~~~
[5423/6341] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/frontend/ir_emitter.cpp.o
ninja: build stopped: subcommand failed.

So there are errors that look like this:

/home/hazel/code/torch/pytorch/torch/csrc/jit/ir/ir.cpp:1192:16: error: ‘synchronize’ is not a member of ‘torch::jit::cuda’; did you mean ‘c10::cuda::synchronize’?

... but with different symbols (synchronize, _current_device, set_stream, _set_device).

I'm assuming it's some kind of namespace clash in that file. I know just barely enough C++ that I could maybe try qualifying the names more fully there, but honestly I'm hesitant to touch a project this large in a language I don't really speak.
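As a purely hypothetical illustration of what "qualifying the names more fully" could look like (not a verified patch to PyTorch's ir.cpp; the namespace layout is assumed, mirroring the sketch earlier in this thread), qualified lookup starts in the named namespace and is not intercepted by a nested namespace of the same name:

// Hypothetical illustration only -- not a verified patch to PyTorch's ir.cpp.
// The namespace layout is assumed for the example.
namespace c10 { namespace cuda { constexpr int set_stream = 1; } }

namespace torch { namespace jit {
namespace cuda {}  // nested namespace that shadows c10::cuda

bool hasSideEffectsDemo(int sym) {
  switch (sym) {
    // case cuda::set_stream:      // unqualified: resolves via torch::jit::cuda and fails
    case ::c10::cuda::set_stream:  // fully qualified: resolves directly in c10::cuda
      return true;
    default:
      return false;
  }
}
}}  // namespace torch::jit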

Output of torch/utils/collect_env.py is:

PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Linux Mint 21 (x86_64)
GCC version: (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Clang version: Could not collect
CMake version: version 3.24.1
Libc version: glibc-2.35

Python version: 3.10.6 (main, Aug 10 2022, 11:40:04) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-48-generic-x86_64-with-glibc2.35
Is CUDA available: N/A
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.5
[conda] Could not collect

If it's relevant, I'm trying for a ROCm build and ran tools/amd_build/build_amd.py first.

HazelSh commented 2 years ago

I think this and https://github.com/pytorch/pytorch/issues/69286 might be duplicates, by the way, which makes at least four separate reports of this error.

nicholas-sly commented 1 year ago

I have also hit this issue a number of times when trying to build a containerized PyTorch on AlmaLinux with ROCm support. Any guidance here would be a huge help.