l9761116 closed this issue 2 years ago
Please install the latest TorchPack:
pip install --upgrade git+https://github.com/zhijian-liu/torchpack.git
This should allow you to run the evaluation without torchpack dist-run. By the way, I noticed that you installed the CPU version of PyTorch. Could you try installing the GPU version instead?
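One quick way to confirm which PyTorch build is active in the environment is to ask PyTorch itself. The helper below is only a sketch: the function name describe_torch_build is made up for illustration, and it relies on torch.version.cuda being None for CPU-only builds.

```python
def describe_torch_build():
    """Summarize the installed PyTorch build, or report that it is missing."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        # torch.version.cuda is None when PyTorch was built without CUDA.
        "built_with_cuda": torch.version.cuda is not None,
        # False if the build is CPU-only or no usable GPU/driver is found.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

If built_with_cuda is False, the CPU-only package is still the one being imported.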
Yes, thanks for the reminder. I reinstalled the GPU version of PyTorch and upgraded torchpack, but now there is a new error:
'''
import torchsparse.backend
ImportError: /home/anaconda3/envs/torch/lib/python3.7/site-packages/torchsparse/backend.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor6deviceEv
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[34422,1],0] Exit code: 1
'''
I have no idea what is causing it.
Thanks for the update! Could you please also reinstall TorchSparse?
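An undefined symbol in a compiled extension usually suggests the extension was built against a different PyTorch build than the one now installed, which would explain why a reinstall is requested. After reinstalling, a small check like the following can confirm whether the extension loads. This is a sketch, and try_import is a hypothetical helper name, not part of TorchSparse.

```python
import importlib


def try_import(module_name):
    """Return (True, "") if the module imports, else (False, error text)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # Missing modules and unresolved symbols in .so files both land here.
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    ok, message = try_import("torchsparse.backend")
    print("torchsparse.backend imports cleanly" if ok else f"import failed: {message}")
```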
When I run the command "torchpack dist-run -np 1 python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs", it gets stuck at "dist.init()" and prints no output. I then tried "python evaluate.py configs/semantic_kitti/default.yaml --name SemanticKITTI_val_SPVNAS@65GMACs --distributed False", but it reports:
'''
File "evaluate.py", line 24, in main
    dist.init()
File "/home/anaconda3/envs/torch/lib/python3.7/site-packages/torchpack/distributed/context.py", line 23, in init
    master_host = 'tcp://' + os.environ['MASTER_HOST']
File "/home/anaconda3/envs/torch/lib/python3.7/os.py", line 681, in __getitem__
    raise KeyError(key) from None
KeyError: 'MASTER_HOST'
'''
Has anyone else run into this problem? Is it an environment problem? Some of my packages are as follows:
'''
cudatoolkit   11.3.1   h2bc3f7f_2
mpi           1.0      openmpi                 conda-forge
mpi4py        3.1.3    pypi_0                  pypi
pytorch       1.7.0    py3.7_cpu_0 [cpuonly]   pytorch
torchpack     0.3.1    pypi_0                  pypi
torchsparse   1.4.0    pypi_0                  pypi
torchvision   0.8.1    py37_cpu [cpuonly]      pytorch
tqdm          4.63.0   pypi_0                  pypi
'''
I think the problem may be related to mpi, mpi4py, or something similar, but I don't know much about them. Does anyone know the solution?
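The traceback shows that dist.init() in this torchpack version reads os.environ['MASTER_HOST'] unconditionally, so a plain "python evaluate.py" run fails when that variable is absent. As a stopgap on an older torchpack, the call could be guarded as sketched below. Both helper names are hypothetical, the sketch assumes MASTER_HOST is set only when the script is started through the distributed launcher, and the thread does not establish whether evaluate.py behaves correctly when dist.init() is skipped.

```python
import os


def launched_with_master_host(env=None):
    """Return True if MASTER_HOST is present in the given environment."""
    env = os.environ if env is None else env
    return "MASTER_HOST" in env


def init_distributed_if_possible(env=None):
    """Call torchpack's dist.init() only when MASTER_HOST is available."""
    if not launched_with_master_host(env):
        # Plain `python evaluate.py` run: skip distributed initialization.
        return False
    # Imported lazily so this module loads even without torchpack installed.
    from torchpack import distributed as dist
    dist.init()
    return True
```

Upgrading torchpack, as suggested in the reply, is the fix proposed in this thread; the guard above only illustrates where the KeyError comes from.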