mrakgr / The-Spiral-Language

Functional language with intensional polymorphism and first-class staging.
Mozilla Public License 2.0
920 stars, 27 forks

CUDA `--Ofast-compile=max` fast compilation option erroring out #34

Open mrakgr opened 2 hours ago

mrakgr commented 2 hours ago

Without the compiler option: https://github.com/mrakgr/Spiral-s-ML-Library/blob/c5d8a529b210f84dc955a017aeff455c2d27affd/game/leduc/fast_compile.py
With `--Ofast-compile=max`: https://github.com/mrakgr/Spiral-s-ML-Library/blob/3e0a35bf91b9cb687afcfa63700ed1aadd5856ef/game/leduc/fast_compile.py

I tried it out, but it errors out. Maybe without the optimizations it simply lacks the memory to run? If so, that defeats the point of having this compiler option in the first place.

Here is what it looks like in the terminal for me.

PS C:\Spiral_s_ML_Library>  c:; cd 'c:\Spiral_s_ML_Library'; & 'c:\Users\mrakg\AppData\Local\pypoetry\Cache\virtualenvs\ui-EoO7T__V-py3.11\Scripts\python.exe' 'c:\Users\mrakg\.vscode\extensions\ms-python.debugpy-2024.10.0-win32-x64\bundled\libs\debugpy\adapter/../..\debugpy\launcher' '61274' '--' 'C:\Spiral_s_ML_Library\game\leduc\fast_compile.py' 
Going to run the Leduc full kernel.
DEBUG MODE. Threads per block, blocks per grid: 256, 24
Traceback (most recent call last):
  File "C:\Spiral_s_ML_Library\game\leduc\fast_compile.py", line 10166, in <module>
    if __name__ == '__main__': print(main())
                                     ^^^^^^
  File "C:\Spiral_s_ML_Library\game\leduc\fast_compile.py", line 10162, in main
    r = main_body()
        ^^^^^^^^^^^
  File "C:\Spiral_s_ML_Library\game\leduc\fast_compile.py", line 10117, in main_body
    v73 = v72.get()
          ^^^^^^^^^
  File "cupy\_core\core.pyx", line 1767, in cupy._core.core._ndarray_base.get
  File "cupy\_core\core.pyx", line 1854, in cupy._core.core._ndarray_base.get
  File "cupy\cuda\memory.pyx", line 586, in cupy.cuda.memory.MemoryPointer.copy_to_host_async
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorLaunchFailure: unspecified launch failure
Traceback (most recent call last):
  File "cupy_backends\cuda\api\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\cuda\api\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\cuda\api\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\cuda\api\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
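
A possible way to narrow this down (a sketch, not something taken from the linked repository): the error above is raised at `v72.get()`, but CUDA kernel launches are asynchronous, so the failure may come from a kernel launched earlier and only be reported at the next synchronizing call. Setting the CUDA environment variable `CUDA_LAUNCH_BLOCKING=1` makes launches synchronous, which should move the reported error closer to the launch that caused it. The helper names `blocking_launch_env` and `debug_command` below are hypothetical and exist only for illustration.

```python
import os
import sys


def blocking_launch_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_LAUNCH_BLOCKING=1 makes each kernel launch wait for the kernel to
    # finish, so a launch failure is reported near the launch itself rather
    # than at a later device-to-host copy.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def debug_command(script_path, python_exe=sys.executable):
    """Build the argv for rerunning a script; the caller supplies the path."""
    return [python_exe, script_path]


# Example (not executed here; needs a CUDA-capable machine):
#   import subprocess
#   subprocess.run(debug_command(r"game\leduc\fast_compile.py"),
#                  env=blocking_launch_env())
```

If the traceback then points at the kernel launch instead of `get()`, that would suggest the kernel itself faults when built with `--Ofast-compile=max`, rather than the copy back to the host.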

mrakgr commented 2 hours ago

I meant to post the above as a comment, but I got this error when I tried. I'll try opening a new bug report instead.

image