visionml / pytracking

Visual tracking library based on PyTorch.
GNU General Public License v3.0

Error building extension '_prroi_pooling' #420

Open hhhyyyqqq opened 7 months ago

hhhyyyqqq commented 7 months ago

The full error output is as follows:

/home/ai1015/anaconda3/envs/pytracking/bin/python /home/ai1015/pytracking/ltr/run_training.py dimp dimp18
Training: dimp dimp18
WARNING: You are using tensorboardX instead sis you have a too old pytorch version.
No matching checkpoint file found
Using /home/ai1015/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ai1015/.cache/torch_extensions/_prroi_pooling/build.ninja...
Building extension module _prroi_pooling...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/bin/nvcc -DTORCH_EXTENSION_NAME=_prroi_pooling -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include -isystem /home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/TH -isystem /home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/THC -isystem /home/ai1015/anaconda3/envs/pytracking/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++14 -c /home/ai1015/pytracking/ltr/external/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu_impl.cu -o prroi_pooling_gpu_impl.cuda.o
FAILED: prroi_pooling_gpu_impl.cuda.o
/usr/bin/nvcc -DTORCH_EXTENSION_NAME=_prroi_pooling -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include -isystem /home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/TH -isystem /home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/THC -isystem /home/ai1015/anaconda3/envs/pytracking/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -DCUDA_NO_HALF_OPERATORS -DCUDA_NO_HALF_CONVERSIONS -DCUDA_NO_HALF2_OPERATORS --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++14 -c /home/ai1015/pytracking/ltr/external/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu_impl.cu -o prroi_pooling_gpu_impl.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
[2/3] c++ -MMD -MF prroi_pooling_gpu.o.d -DTORCH_EXTENSION_NAME=_prroi_pooling -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include -isystem /home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/TH -isystem /home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/include/THC -isystem /home/ai1015/anaconda3/envs/pytracking/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /home/ai1015/pytracking/ltr/external/PreciseRoIPooling/pytorch/prroi_pool/src/prroi_pooling_gpu.c -o prroi_pooling_gpu.o
ninja: build stopped: subcommand failed.
Training crashed at epoch 1
Traceback for the error!
Traceback (most recent call last):
  File "/home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1539, in _run_ninja_build
    env=env)
  File "/home/ai1015/anaconda3/envs/pytracking/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ai1015/pytracking/ltr/trainers/base_trainer.py", line 70, in train
    self.train_epoch()
  File "/home/ai1015/pytracking/ltr/trainers/ltr_trainer.py", line 97, in train_epoch
    self.cycle_dataset(loader)
  File "/home/ai1015/pytracking/ltr/trainers/ltr_trainer.py", line 75, in cycle_dataset
    loss, stats = self.actor(data)
  File "/home/ai1015/pytracking/ltr/actors/tracking.py", line 28, in __call__
    test_proposals=data['test_proposals'])
  File "/home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ai1015/pytracking/ltr/models/tracking/dimpnet.py", line 60, in forward
    target_scores = self.classifier(train_feat_clf, test_feat_clf, train_bb, *args, **kwargs)
  File "/home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ai1015/pytracking/ltr/models/target_classifier/linear_filter.py", line 57, in forward
    filter, filter_iter, losses = self.get_filter(train_feat, train_bb, *args, **kwargs)
  File "/home/ai1015/pytracking/ltr/models/target_classifier/linear_filter.py", line 94, in get_filter
    weights = self.filter_initializer(feat, bb)
  File "/home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ai1015/pytracking/ltr/models/target_classifier/initializer.py", line 164, in forward
    weights = self.filter_pool(feat, bb)
  File "/home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ai1015/pytracking/ltr/models/target_classifier/initializer.py", line 45, in forward
    return self.prroi_pool(feat, roi1)
  File "/home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ai1015/pytracking/ltr/external/PreciseRoIPooling/pytorch/prroi_pool/prroi_pool.py", line 28, in forward
    return prroi_pool2d(features, rois, self.pooled_height, self.pooled_width, self.spatial_scale)
  File "/home/ai1015/pytracking/ltr/external/PreciseRoIPooling/pytorch/prroi_pool/functional.py", line 44, in forward
    _prroi_pooling = _import_prroi_pooling()
  File "/home/ai1015/pytracking/ltr/external/PreciseRoIPooling/pytorch/prroi_pool/functional.py", line 33, in _import_prroi_pooling
    verbose=True
  File "/home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 997, in load
    keep_intermediates=keep_intermediates)
  File "/home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1202, in _jit_compile
    with_cuda=with_cuda)
  File "/home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1300, in _write_ninja_file_and_build_library
    error_prefix="Error building extension '{}'".format(name))
  File "/home/ai1015/anaconda3/envs/pytracking/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1555, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension '_prroi_pooling'

My environment is: GPU 4090, nvcc 10.1, torch 1.7.1+cu110.

I have read the other issues about this error.

Their solutions involve changing the torch version, but I have tried torch 1.7.1, 1.7.0, 1.13.0 and so on, and none of them solved the problem. Because of the 4090 GPU, my CUDA version must be 11.0 or higher. If I try CUDA 10.1, I get a different error message, "CUDA error: no kernel image is available for execution on the device", which is caused by the mismatch between the GPU and CUDA.

The other suggested solution is to git clone PreciseRoIPooling. I have tried that as well, and my PreciseRoIPooling code is a symbolic link.

From asking GPT about my error message, the key line is: nvcc fatal : Unsupported gpu architecture 'compute_86'. I tried several fixes from the internet for this, but none of them worked.
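For anyone comparing their own setup against that message: the log shows the build calling /usr/bin/nvcc, and the environment above lists nvcc 10.1 next to a torch build for CUDA 11.0. The sketch below is only a quick sanity check, not part of pytracking or PyTorch. It parses the text printed by `nvcc --version` and compares the release against the first CUDA toolkit release whose nvcc accepts a given architecture. The function names and the small lookup table are made up for this illustration, and the table covers only a few architectures.

```python
import re
import subprocess

# First CUDA toolkit release whose nvcc accepts each architecture.
# Illustrative subset only; extend it for other GPUs.
MIN_CUDA_FOR_ARCH = {
    "compute_75": (10, 0),
    "compute_80": (11, 0),
    "compute_86": (11, 1),
    "compute_89": (11, 8),
}


def parse_nvcc_release(version_text):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("no 'release X.Y' string found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def nvcc_supports(arch, release):
    """Return True if an nvcc of this release accepts the architecture."""
    return release >= MIN_CUDA_FOR_ARCH[arch]


def installed_nvcc_release(nvcc="nvcc"):
    """Run the given nvcc binary (hypothetical helper) and parse its release."""
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


# Example with the last line an nvcc 10.1 install typically prints.
sample = "Cuda compilation tools, release 10.1, V10.1.243"
release = parse_nvcc_release(sample)
print(release, nvcc_supports("compute_86", release))  # (10, 1) False
```

To check a real install, call `installed_nvcc_release("/usr/bin/nvcc")` with the path that appears in the build log. If the result is older than the architecture needs, the extension build is probably picking up an older system nvcc than the CUDA version torch was built for. As far as I know, torch's extension builder looks at the CUDA_HOME environment variable before falling back to the nvcc found on PATH, so pointing CUDA_HOME at a newer toolkit may be worth trying.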

I really don't know what the problem is or how to solve it. Has anyone had the same problem, and can you give some advice? Thank you very much.

AbdallahOmarAhmed commented 5 months ago

Did you solve it?

hhhyyyqqq commented 5 months ago

No, I haven't solved it.

AbdallahOmarAhmed commented 5 months ago

If you are still interested, I fixed it: https://github.com/visionml/pytracking/issues/404#top