Open amiltonwong opened 2 years ago
I think the fastest fix is to update PyTorch: with CUDA >= 11.0, PyTorch >= 1.7.x is the most suitable version (according to the official website, older versions are not supported). Whether accuracy is affected still needs to be tested.
Hi, @MaxChanger
I then switched to PyTorch 1.7.0. However, I got the following error:
(pytorch1.7.0) root@milton-LabPC:/media/root/mdata/data/code13/MVP_Benchmark/completion# python train.py -c ./cfgs/pcn.yaml
INFO:root:Munch({'batch_size': 32, 'workers': 0, 'nepoch': 100, 'model_name': 'pcn', 'load_model': None, 'start_epoch': 0, 'num_points': 2048, 'work_dir': 'log/', 'flag': 'debug', 'loss': 'cd', 'manual_seed': None, 'use_mean_feature': False, 'step_interval_to_print': 500, 'epoch_interval_to_save': 1, 'epoch_interval_to_val': 1, 'varying_constant': '0.01, 0.1, 0.5, 1', 'varying_constant_epochs': '5, 15, 30', 'lr': 0.0001, 'lr_decay': True, 'lr_decay_interval': 40, 'lr_decay_rate': 0.7, 'lr_step_decay_epochs': None, 'lr_step_decay_rates': None, 'lr_clip': 1e-06, 'optimizer': 'Adam', 'weight_decay': 0, 'betas': '0.9, 0.999', 'save_vis': True, 'eval_emd': False})
(62400, 2048, 3)
(2400, 2048, 3) (62400,)
(41600, 2048, 3)
(1600, 2048, 3) (41600,)
INFO:root:Length of train dataset:62400
INFO:root:Length of test dataset:41600
INFO:root:Random Seed: 3648
Jitting Chamfer 3D
Traceback (most recent call last):
File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1516, in _run_ninja_build
subprocess.run(
File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "train.py", line 213, in <module>
train()
File "train.py", line 48, in train
model_module = importlib.import_module('.%s' % args.model_name, 'models')
File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 783, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/media/root/mdata/data/code13/MVP_Benchmark/completion/models/pcn.py", line 10, in <module>
from model_utils import gen_grid_up, calc_emd, calc_cd
File "/media/root/mdata/data/code13/MVP_Benchmark/completion/model_utils.py", line 20, in <module>
from metrics import cd, fscore, emd
File "../utils/metrics/__init__.py", line 1, in <module>
from .CD import (cd, fscore)
File "../utils/metrics/CD/__init__.py", line 1, in <module>
from .chamfer3D.dist_chamfer_3D import chamfer_3DDist as cd
File "../utils/metrics/CD/chamfer3D/dist_chamfer_3D.py", line 12, in <module>
chamfer_3D = load(name="chamfer_3D",
File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 969, in load
return _jit_compile(
File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1176, in _jit_compile
_write_ninja_file_and_build_library(
File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1280, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1538, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'chamfer_3D': [1/2] /usr/local/cuda-11.0/bin/nvcc -DTORCH_EXTENSION_NAME=chamfer_3D -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.0/include -isystem /root/anaconda3/envs/pytorch1.7.0/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++14 -c /media/root/mdata/data/code13/MVP_Benchmark/utils/metrics/CD/chamfer3D/chamfer3D.cu -o chamfer3D.cuda.o
FAILED: chamfer3D.cuda.o
/usr/local/cuda-11.0/bin/nvcc -DTORCH_EXTENSION_NAME=chamfer_3D -DTORCH_API_INCLUDE_EXTENSION_H -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include/TH -isystem /root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.0/include -isystem /root/anaconda3/envs/pytorch1.7.0/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -std=c++14 -c /media/root/mdata/data/code13/MVP_Benchmark/utils/metrics/CD/chamfer3D/chamfer3D.cu -o chamfer3D.cuda.o
nvcc fatal : Unsupported gpu architecture 'compute_86'
ninja: build stopped: subcommand failed.
(pytorch1.7.0) root@milton-LabPC:/media/root/mdata/data/code13/MVP_Benchmark/completion
Any hints to fix this issue? Thanks~
Hello, I'm not sure whether you are using a 3080 or a 3090. I searched and found some feasible solutions: DeepSpeed/issues/607, pytorch/issues/45021, pytorch/issues/45028. In short, your GPU's compute capability is probably higher than what your current PyTorch/CUDA build supports. Hope these help.
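The linked threads point at the same root cause: the nvcc that ships with CUDA 11.0 predates the sm_86 (RTX 30xx) architecture, so it cannot honor `-gencode=arch=compute_86` regardless of the PyTorch version. A minimal sketch of the relationship (capability/toolkit pairs taken from NVIDIA's release notes):

```python
# Each compute capability has a minimum CUDA toolkit that can compile for it;
# CUDA 11.0's nvcc rejects 'compute_86' because sm_86 support arrived in 11.1.
MIN_CUDA_FOR_ARCH = {
    "compute_75": (10, 0),  # Turing (RTX 20xx)
    "compute_80": (11, 0),  # Ampere (A100)
    "compute_86": (11, 1),  # Ampere (RTX 3080/3090)
}

def toolkit_supports(toolkit: str, arch: str) -> bool:
    """True if the given CUDA toolkit version can compile for `arch`."""
    version = tuple(int(part) for part in toolkit.split("."))
    return version >= MIN_CUDA_FOR_ARCH[arch]

print(toolkit_supports("11.0", "compute_86"))  # False: hence the nvcc error
print(toolkit_supports("11.1", "compute_86"))  # True: a newer toolkit fixes it
```

A commonly cited workaround from those threads (if you want to stay on CUDA 11.0) is to export `TORCH_CUDA_ARCH_LIST="8.0"` before launching training, so `torch.utils.cpp_extension` targets an architecture this nvcc does know; sm_80 binaries are forward binary-compatible within the same major capability and run on an sm_86 card.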
You need to be on the latest 1.9+ PyTorch build; anything older doesn't run on my RTX 3090.
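For the record, a sketch of that upgrade path (the exact version pairing is an assumption; check the official PyTorch install matrix for your setup). The `+cu111` wheels are built against CUDA 11.1, whose nvcc can emit sm_86 code for RTX 30xx cards:

```shell
# Assumed versions; pick the pairing the PyTorch install matrix lists for CUDA 11.1.
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 \
    -f https://download.pytorch.org/whl/torch_stable.html
# Remove the stale failed build so chamfer_3D is recompiled with the new toolchain.
rm -rf ~/.cache/torch_extensions
```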
Hi, @paul007pl ,
According to setup.sh, CUDA 10.1 is used. However, my GPU (RTX 30xx series) only supports CUDA >= 11.0, and running the command
python train.py -c ./cfgs/pcn.yaml
produces the error output shown above, which is related to the CUDA version. Any suggestions to fix this issue?
Thanks~
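For anyone hitting this, a quick way to confirm the toolkit/GPU mismatch before rebuilding is to compare what nvcc, the driver, and PyTorch each report (a sketch; output format varies by driver and PyTorch version):

```shell
nvcc --version    # toolkit release that torch.utils.cpp_extension will invoke
nvidia-smi -L     # GPUs visible to the driver
python -c "import torch; print(torch.version.cuda, torch.cuda.get_device_capability(0))"
```

If `nvcc --version` reports 11.0 while the capability printed is (8, 6), the JIT build will fail exactly as in the log above.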