Closed sylyt62 closed 2 years ago
I downgraded CUDA to 11.1, but then ran into different errors while running the tests:
test_backward (test_octree2col.Octree2ColTest) ... /media/yangtian/SATA3/PyEnvs/ubuntu/ocnn-py37-env/lib/python3.7/site-packages/torch/autograd/gradcheck.py:633: UserWarning: Input #0 requires gradient and is not a double precision floating point or complex. This check will likely fail if all the inputs are not of double precision floating point or complex.
  f'Input #{idx} requires gradient and '
ok
test_forward (test_octree2col.Octree2ColTest) ... ok
test_forwardP1 (test_octree2col.Octree2ColTest) ... ok
test_octree2colP (test_octree2col.Octree2ColTest) ... ok
test_forward_backward1 (test_octree_align.OctreeAlignTest) ... ok
test_forward_backward2 (test_octree_align.OctreeAlignTest) ... ok
test_forward_backward3 (test_octree_align.OctreeAlignTest) ... ok
test_forward_and_backward (test_octree_conv.OctreeConvTest) ... ERROR
test_forward_and_backward (test_octree_deconv.OctreeDeconvTest) ... ERROR
test_decode_encode_key (test_octree_key.OctreeKeyTest) ... ERROR
test_search_key (test_octree_key.OctreeKeyTest) ... ERROR
test_xyz_key (test_octree_key.OctreeKeyTest) ... ERROR
test_xyz_key_64 (test_octree_key.OctreeKeyTest) ... ERROR
test_forward_and_backward_avg_pool (test_octree_pool.OctreePoolTest) ... ERROR
test_forward_and_backward_max_pool (test_octree_pool.OctreePoolTest) ... ERROR
test_forward_and_backward_max_unpool (test_octree_pool.OctreePoolTest) ... ERROR
test_octree_property (test_octree_property.OctreePropertyTest) ... ERROR
test_forward1 (test_octree_trilinear.OctreeTrilinearTest) ... ERROR
test_points_property (test_points_property.PointsPropertyTest) ... ok
======================================================================
ERROR: test_forward_and_backward (test_octree_conv.OctreeConvTest)
Traceback (most recent call last):
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_conv.py", line 90, in test_forward_and_backward
    self.forward_and_backward(kernel_size[j], stride[i])
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_conv.py", line 61, in forward_and_backward
    out3.backward(pesudo_grad2)
  File "/media/yangtian/SATA3/PyEnvs/ubuntu/ocnn-py37-env/lib/python3.7/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/media/yangtian/SATA3/PyEnvs/ubuntu/ocnn-py37-env/lib/python3.7/site-packages/torch/autograd/__init__.py", line 149, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: merge_sort: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered
======================================================================
ERROR: test_forward_and_backward (test_octree_deconv.OctreeDeconvTest)
Traceback (most recent call last):
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_deconv.py", line 91, in test_forward_and_backward
    self.forward_and_backward(kernel_size[j], stride[i])
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_deconv.py", line 31, in forward_and_backward
    octree = octree.cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
======================================================================
ERROR: test_decode_encode_key (test_octree_key.OctreeKeyTest)
Traceback (most recent call last):
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_key.py", line 10, in test_decode_encode_key
    octree = ocnn.octree_batch(samples).cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
======================================================================
ERROR: test_search_key (test_octree_key.OctreeKeyTest)
Traceback (most recent call last):
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_key.py", line 33, in test_search_key
    octree = ocnn.octree_batch(samples).cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
======================================================================
ERROR: test_xyz_key (test_octree_key.OctreeKeyTest)
Traceback (most recent call last):
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_key.py", line 25, in test_xyz_key
    octree = ocnn.octree_batch(samples).cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
======================================================================
ERROR: test_xyz_key_64 (test_octree_key.OctreeKeyTest)
Traceback (most recent call last):
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_key.py", line 47, in test_xyz_key_64
    xyz = torch.cuda.ShortTensor([[2049, 4095, 8011, 1], [511, 4095, 8011, 0]])
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
======================================================================
ERROR: test_forward_and_backward_avg_pool (test_octree_pool.OctreePoolTest)
Traceback (most recent call last):
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_pool.py", line 91, in test_forward_and_backward_avg_pool
    octree = octree.to('cuda')
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
======================================================================
ERROR: test_forward_and_backward_max_pool (test_octree_pool.OctreePoolTest)
Traceback (most recent call last):
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_pool.py", line 28, in test_forward_and_backward_max_pool
    octree = octree.to('cuda')
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
======================================================================
ERROR: test_forward_and_backward_max_unpool (test_octree_pool.OctreePoolTest)
Traceback (most recent call last):
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_pool.py", line 59, in test_forward_and_backward_max_unpool
    octree = octree.to('cuda')
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
======================================================================
ERROR: test_octree_property (test_octree_property.OctreePropertyTest)
Traceback (most recent call last):
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_property.py", line 60, in test_octree_property
    self.octree_property(on_cuda=True)
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_property.py", line 13, in octree_property
    octree = octree.cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
======================================================================
ERROR: test_forward1 (test_octree_trilinear.OctreeTrilinearTest)
Traceback (most recent call last):
  File "/media/yangtian/SATA3/Workspace/O-CNN-master/pytorch/test/test_octree_trilinear.py", line 11, in test_forward1
    octree = ocnn.octree_batch(ocnn.octree_samples(['octree_1', 'octree_1'])).cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Ran 19 tests in 2.602s
FAILED (errors=11)
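The error messages in the log suggest rerunning with CUDA_LAUNCH_BLOCKING=1, since asynchronous reporting can make the traceback point at the wrong call. One way to do that from Python is sketched below. The test directory and file pattern are assumptions taken from the paths in the tracebacks, and `build_blocking_env` and `run_tests_blocking` are hypothetical helper names, not part of O-CNN.

```python
import os
import subprocess
import sys


def build_blocking_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # With this variable set, kernel launches run synchronously, so an illegal
    # memory access should surface at the call that triggered it.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_tests_blocking(test_dir="pytorch/test", pattern="test_*.py"):
    """Run unittest discovery in a subprocess with CUDA_LAUNCH_BLOCKING=1.

    test_dir and pattern are assumed defaults; adjust them to your checkout.
    """
    cmd = [sys.executable, "-m", "unittest", "discover",
           "-s", test_dir, "-p", pattern, "-v"]
    return subprocess.run(cmd, env=build_blocking_env()).returncode
```

Calling `run_tests_blocking()` from the repository root returns the exit code of the test run. The variable is set in a child process because it needs to be in place before CUDA is initialized.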
Thanks for your interest in our project.
This is a known issue that I have also run into. The error is indeed caused by the cuBLAS version; please try building the code with CUDA 10.1 or 10.2. I currently have no bandwidth to fix it, so if you are interested, a fix would be very welcome.
For your reference, the code is tested with the following pytorch versions:
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
conda install pytorch==1.7.0 torchvision==0.8.0 cudatoolkit=10.2 -c pytorch
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=10.2 -c pytorch
conda install pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=10.2 -c pytorch
docker pull pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel
docker pull pytorch/pytorch:1.8.1-cuda10.2-cudnn7-devel
docker pull pytorch/pytorch:1.9.0-cuda10.2-cudnn7-devel
In my own experiments, the unit tests failed with the following PyTorch versions:
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=11.1 -c pytorch
docker pull pytorch/pytorch:1.7.0-cuda11.0-cudnn8-devel
docker pull pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel
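For quick reference, the two lists above can be turned into a small lookup. This is only a sketch: the version pairs are copied from the lists in this comment, `classify` is a hypothetical helper name, and any pair not listed is reported as untested rather than guessed.

```python
# (pytorch version, CUDA version) pairs reported as passing the unit tests.
TESTED_OK = {
    ("1.6.0", "10.1"), ("1.7.0", "10.2"), ("1.7.1", "10.1"),
    ("1.8.1", "10.2"), ("1.9.0", "10.2"), ("1.9.1", "10.2"),
}

# Pairs reported as failing the unit tests.
KNOWN_FAILING = {("1.8.0", "11.1"), ("1.7.0", "11.0"), ("1.7.1", "11.0")}


def classify(torch_version, cuda_version):
    """Classify a (torch, CUDA) pair against the lists from this thread."""
    # Drop a local version suffix such as "+cu111" before the lookup.
    key = (torch_version.split("+")[0], cuda_version)
    if key in TESTED_OK:
        return "tested"
    if key in KNOWN_FAILING:
        return "known-failing"
    return "untested"


print(classify("1.9.1", "10.2"))
print(classify("1.8.0", "11.1"))
print(classify("1.9.1+cu111", "11.1"))
```

With PyTorch installed, the arguments can be taken from `torch.__version__` and `torch.version.cuda`.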
Appreciate your info :) I'll dig into it a little bit when free.
Hi author,
First, thank you for sharing such a nice project. I thought the build was successful, but I still hit a cuBLAS error when running the tests after building octree with PyTorch. I searched around but found no clues about this cuBLAS error in THGpuGemm, so I am asking for help here.
My environment is:
Ubuntu18, Python 3.7.13, Cuda 11.2, cudnn 8.1.1
torch 1.9.1+cu111 torchvision 0.10.1+cu111
Test log:
Some build log:
I actually tried Python 3.8 with torch 1.10+cu113 first and hit the same error with slightly different output:
So I downgraded Python and PyTorch, but it still does not work. I wonder whether the CUDA version is causing this error, even though the versions differ only slightly. I'm not sure.
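The environment described above pairs a CUDA 11.2 toolkit with a PyTorch wheel tagged cu111. Whether that gap matters here is not established in this thread, but it is easy to make visible. The sketch below compares the release reported by `nvcc --version` with the CUDA tag in the PyTorch version string. The helper names are hypothetical, and the parsing assumes the usual "release X.Y" wording in nvcc's output and a "+cuXYZ" suffix on the wheel version.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract 'major.minor' from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_tag_to_version(torch_version):
    """Map a version string such as '1.9.1+cu111' to '11.1', or None."""
    # The last digit of the tag is treated as the minor version.
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def local_nvcc_release():
    """Return the release of the nvcc found on PATH, or None if absent."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def versions_match(torch_version, nvcc_release):
    """True only when both versions are known and equal."""
    wheel_cuda = cuda_tag_to_version(torch_version)
    return wheel_cuda is not None and wheel_cuda == nvcc_release
```

For the setup in this post, `versions_match("1.9.1+cu111", "11.2")` is false, which at least confirms that the extension is compiled with a different toolkit release than the one the wheel was built against.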