kvndhrty closed this issue 8 months ago
I should note that swapping every SplineConv layer for a GCNConv layer eliminates the error, so the problem traces back to SplineConv, but I'm not sure how.
Do you have a reproducible example for me?
I'll put together a barebones example today or tomorrow. In the meantime, I'm hoping that running this on a Linux machine resolves my issue in the short term.
Fixed: some of the values in my edge_attr were small negative numbers. It was hard to tell from the segfault / illegal memory access errors where the problem came from.
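For anyone hitting the same crash, a check along these lines might surface the problem before it reaches the SplineConv kernel. This is only a sketch: it assumes the pseudo-coordinates in edge_attr are expected to lie in [0, 1], and `check_pseudo_range` and the sample tensor are hypothetical, not part of any library.

```python
import torch


def check_pseudo_range(edge_attr: torch.Tensor) -> None:
    """Raise a readable error if pseudo-coordinates leave [0, 1] (assumed range)."""
    if not torch.isfinite(edge_attr).all():
        raise ValueError("edge_attr contains NaN or Inf values")
    lo, hi = edge_attr.min().item(), edge_attr.max().item()
    if lo < 0.0 or hi > 1.0:
        raise ValueError(
            f"edge_attr must lie in [0, 1] for SplineConv, got range [{lo}, {hi}]"
        )


# Hypothetical edge attributes with one small negative value (e.g. rounding error).
edge_attr = torch.tensor([[0.25, 0.50], [-1e-7, 1.00]])

try:
    check_pseudo_range(edge_attr)
except ValueError as err:
    print(err)  # a Python-level error instead of an illegal memory access

# If the out-of-range values are only numerical noise, clamping is one option.
edge_attr = edge_attr.clamp(min=0.0, max=1.0)
check_pseudo_range(edge_attr)  # passes silently after clamping
```

Whether clamping is appropriate depends on where the negative values come from; if they indicate a bug in how the edge attributes are computed, fixing the normalization is probably the better route.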
This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?
Closing this
I'm running a Torch 2.0.1 environment (tried with both CUDA 11.7 and 11.8) with PyTorch Lightning. A fairly simple model fails during backprop and throws:
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
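The error message suggests CUDA_LAUNCH_BLOCKING=1. One way to apply it from inside a script is sketched below; it assumes the variable is set before the first CUDA call, which is easiest to guarantee by setting it before importing torch.

```python
import os

# Force synchronous kernel launches so the stack trace points at the
# operation that actually failed. Set this before any CUDA work starts.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch and build the model only after this point
```

This slows training down noticeably, so it is probably best kept for debugging runs only.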
The same model also fails on the CPU, but without any error message: the process just quietly exits at the first backprop stage.
I'm on a Windows x86 machine.
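For the silent exit on the CPU, Python's built-in faulthandler module may help show where the process dies. A minimal sketch, assuming the exit is caused by a native crash such as a segfault; the log file name is arbitrary:

```python
import faulthandler

# Keep the file open for the lifetime of the process; faulthandler writes
# the Python traceback to it if the interpreter hits a fatal error.
crash_log = open("crash_traceback.log", "w")
faulthandler.enable(file=crash_log, all_threads=True)

# ... build the model and run training here ...
```

After a crash, the log should contain the Python-level call stack at the moment of the fault, which can narrow things down to a specific layer or call.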