I have converted a C++ codebase into a PyTorch extension, and it runs perfectly on GPUs with Compute Capability 8.6, specifically the RTX 3090 and A4500. However, on a Quadro RTX 6000 (Compute Capability 7.5), the FastSortFusedNew function hangs: it stalls as soon as it is first called.
Details:
PyTorch Version: 2.1.0
CUDA Version: 11.8
Operating System (working): Ubuntu 20.04
Operating System (failing): Ubuntu 18.04 and 22.04
I don't think the issue is related to the OS version, since I get the same hang on both Ubuntu 18.04 and 22.04. The same codebase runs without issues on GPUs with Compute Capability 8.6.
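A small script like the following can gather these details on each machine so they can be compared side by side. It is only a sketch: `collect_env` is a hypothetical helper name, and it relies only on standard `torch` calls. Besides the versions listed above, it reports each GPU's compute capability and the CUDA architectures the installed PyTorch binary was compiled for. That architecture list describes PyTorch itself, not the extension; the extension's own targets are fixed when it is built (for example through the `TORCH_CUDA_ARCH_LIST` environment variable).

```python
import platform

import torch


def collect_env() -> dict:
    """Gather OS, PyTorch/CUDA versions and per-GPU compute capability."""
    info = {
        "os": platform.platform(),
        "torch": torch.__version__,
        # CUDA runtime version PyTorch was built with (None for CPU-only builds).
        "cuda_runtime": torch.version.cuda,
        # Architectures the PyTorch binary targets; empty if CUDA is unavailable.
        "torch_arch_list": torch.cuda.get_arch_list(),
        "devices": [],
    }
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            info["devices"].append({
                "name": torch.cuda.get_device_name(i),
                "capability": f"{major}.{minor}",
            })
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

PyTorch also ships a more complete report via `python -m torch.utils.collect_env`.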
Has anyone experienced a similar issue, or does anyone have insights into why this might be happening?