merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered terminate called after throwing an instance of 'c10::Error'

Chongjie-Si commented 1 year ago

Thanks for your code! I have encountered an error when training:

Traceback (most recent call last):
  File "ddp_train.py", line 116, in <module>
    main()
  File "ddp_train.py", line 107, in main
    trainer.train()
  File "/home/Point_Cloud/CPCM/trainer/base.py", line 190, in train
    self.train_one_epoch()
  File "/home/Point_Cloud/CPCM/trainer/fully_supervised_trainer.py", line 332, in train_one_epoch
    step_ret = self.step(batch)
  File "/home/Point_Cloud/CPCM/trainer/fully_supervised_trainer.py", line 1232, in step
    return self._step_two_and_mask_stream(batch=batch)
  File "/home/Point_Cloud/CPCM/trainer/fully_supervised_trainer.py", line 1199, in _step_two_and_mask_stream
    loss.backward()
  File "/home/.conda/envs/seg/lib/python3.8/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/.conda/envs/seg/lib/python3.8/site-packages/torch/autograd/__init__.py", line 154, in backward
    Variable._execution_engine.run_backward(
RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
terminate called after throwing an instance of 'c10::Error'

How can I fix this? I tried the suggestions in https://github.com/taesungp/contrastive-unpaired-translation/issues/83, but they did not work.
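A general debugging note for errors like this one: CUDA kernels are launched asynchronously, so the Python frame where an illegal memory access is reported (here `loss.backward()`) is not necessarily the call that caused it. A minimal sketch of forcing synchronous launches so the traceback points closer to the failing operation is below. The helper name `enable_blocking_cuda_launches` is hypothetical and not part of this repository; `CUDA_LAUNCH_BLOCKING` is the standard CUDA environment variable.

```python
import os


def enable_blocking_cuda_launches() -> str:
    """Request synchronous CUDA kernel launches (hypothetical helper).

    The variable should be set before CUDA is initialized, so the safest
    place to call this is before `import torch`.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


if __name__ == "__main__":
    enable_blocking_cuda_launches()
    # From here on, import torch and start training as usual.
    # Training will be slower, so use this for debugging only.
```

The same effect can be had without touching the code by prefixing the usual training command with `CUDA_LAUNCH_BLOCKING=1`.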

xiaoxunlong commented 1 year ago

May I know the details of your hardware, such as the processor, RAM, storage, and graphics card? Could you also provide your conda environment and gcc version?
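One way to gather these details in a single step is sketched below. It is only an illustration: the function name `collect_env` and the dictionary keys are made up for this example, and it reports just the Python, gcc, PyTorch, and GPU information that PyTorch itself exposes. PyTorch also ships a more complete report via `python -m torch.utils.collect_env`.

```python
import platform
import shutil
import subprocess


def collect_env() -> dict:
    """Collect basic environment details for a bug report (illustrative sketch)."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }

    # gcc version: first line of `gcc --version`, if gcc is on PATH.
    gcc = shutil.which("gcc")
    if gcc is None:
        info["gcc"] = "not found"
    else:
        try:
            out = subprocess.run(
                [gcc, "--version"], capture_output=True, text=True, check=False
            )
            lines = out.stdout.splitlines()
            info["gcc"] = lines[0] if lines else "unknown"
        except OSError:
            info["gcc"] = "unknown"

    # PyTorch and GPU details, only if torch is installed.
    try:
        import torch
    except ImportError:
        info["torch"] = "not installed"
    else:
        info["torch"] = torch.__version__
        info["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with
        info["cuda_available"] = torch.cuda.is_available()
        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
            info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Note that `torch.version.cuda` is the CUDA version PyTorch was built against, which can differ from the system-wide CUDA toolkit reported by `nvcc --version`.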

Chongjie-Si commented 1 year ago

CUDA 11.1, torch 1.10.1, RTX 3090 24GB, python 3.8.16, gcc 9.4.0

xiaoxunlong commented 1 year ago

I followed the instructions of me054 to set up the environment, downloaded the preprocessed S3DIS dataset provided by the authors in README.md, and ran the experiment with the command given there. Everything went smoothly. Have you followed all the instructions provided by the authors? My system environment is CUDA 11.7, RTX 3090 24GB, gcc 11.3.0.

Chongjie-Si commented 1 year ago

Thank you for your comments. I think something was wrong with my environment. I installed torch 1.9.0 instead, and everything works fine for me now.