Open patmjen opened 3 years ago
Weird, it works for me, using CUDA 11.1. Does running with CUDA_LAUNCH_BLOCKING=1 give you a more reasonable error message? Is it possible for you to determine which call in spspmm_cuda.cu accesses illegal memory?
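For anyone following along: CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so an error is reported at the call that triggered it rather than at some later point. A minimal sketch of one way to apply it without touching the shell is to start the failing script in a child process with the variable set. The helper name run_with_blocking_launches is hypothetical, and the script it runs is whatever reproduces the error for you.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(python_args):
    """Run `python <python_args>` in a child process with CUDA_LAUNCH_BLOCKING=1.

    The variable is set in the child's environment before the interpreter
    starts, so it is in place before any CUDA context is created there.
    """
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(
        [sys.executable, *python_args],
        env=env,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in for the real reproduction script: just show the variable is set.
    result = run_with_blocking_launches(
        ["-c", "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"]
    )
    print(result.stdout.strip())
```

Replacing the `-c` arguments with the path to the reproduction script and printing `result.stderr` should show the traceback produced under blocking launches.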
Unfortunately no, adding CUDA_LAUNCH_BLOCKING=1 does not change the error (except that it doesn't suggest using CUDA_LAUNCH_BLOCKING=1 now).
Is there a way I could determine what call accesses illegal memory without recompiling etc.? I suspect no, but no harm in asking.
What graphics card are you using? I once had to deal with a bug that only showed up on newer cards (despite using the same CUDA version), because the newer cards handled some illegal operations differently. On the old cards, the illegal operation was silently ignored (so I did not discover it), but on the newer ones it raised an error, which is how the bug surfaced. Maybe it's something similar here?
I think you have to re-compile to perform some further debugging. I have tested it on 1080Ti, 2080Ti and Titan RTX and they all work fine.
@JiaxuanYou, @RexYing: Can you also check if you can reproduce this issue?
I also just tested it on an NVIDIA GeForce RTX 2070 Super card on my Windows 10 machine. Here, the bug does not show up. So maybe it is dependent on the card.
Unfortunately, I don't have time to do further debugging in the near future. Sorry! I know this makes it hard to proceed, so if you want you can close the issue.
Thanks for reporting. I'm still leaving this issue open. If someone else has the same problem and is willing to debug, we can hopefully fix this.
Anybody still working on this? Ran into the same issue whilst deploying Graph-UNET, which relies on spspmm. Could perhaps try and debug.
It would be of much help if you can try to debug :)
I hit the same error. Can anyone address this issue?
Does this mean that https://github.com/rusty1s/pytorch_sparse/issues/228 is resolved for you?
@thijssnelleman how did you solve the issue?
I believe I replaced the layer that made use of this function with another layer. Not much of a solution, but it worked in my situation.
Summary
Running spspmm two times with the same inputs gives RuntimeError: CUDA error: an illegal memory access was encountered. The following snippet shows the issue for me:
When I run the above code, I get the following error:
Sorry if this is just me using the library wrongly! Is there something I should be doing in between calls to spspmm? Or any other way to fix it?
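A rough sketch of the scenario described in the summary, two spspmm calls with the same inputs. This is not the reporter's original snippet: the input tensors are small illustrative values, call_twice is a hypothetical helper, and the explicit torch.cuda.synchronize() is added only so that an asynchronous CUDA error would surface at the call that caused it. Whether these particular inputs trigger the error on an affected card is not known.

```python
def call_twice(fn, *args):
    """Call fn(*args) twice with identical arguments and return both results."""
    first = fn(*args)
    second = fn(*args)
    return first, second


def main():
    try:
        import torch
        from torch_sparse import spspmm
    except ImportError:
        print("torch / torch_sparse not installed; skipping")
        return
    if not torch.cuda.is_available():
        print("no CUDA device available; skipping")
        return

    device = torch.device("cuda")
    # Illustrative inputs (assumption): A is 3x3 and B is 3x2, both in COO form.
    index_a = torch.tensor([[0, 0, 1, 2, 2], [1, 2, 0, 0, 1]], device=device)
    value_a = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0], device=device)
    index_b = torch.tensor([[0, 2], [1, 0]], device=device)
    value_b = torch.tensor([2.0, 4.0], device=device)

    def multiply():
        out = spspmm(index_a, value_a, index_b, value_b, 3, 3, 2)
        torch.cuda.synchronize()  # wait for the kernels so errors show up here
        return out

    (index_1, value_1), (index_2, value_2) = call_twice(multiply)
    # On a working setup both calls should produce the same sparse result.
    print(torch.equal(index_1, index_2), torch.equal(value_1, value_2))


if __name__ == "__main__":
    main()
```

If the second call fails while the first succeeds, that narrows the problem to state left behind by the first call rather than to the inputs themselves.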
Environment