CUDA `--Ofast-compile=max` fast compilation option erroring out

Without the compiler option: https://github.com/mrakgr/Spiral-s-ML-Library/blob/c5d8a529b210f84dc955a017aeff455c2d27affd/game/leduc/fast_compile.py

With `--Ofast-compile=max`: https://github.com/mrakgr/Spiral-s-ML-Library/blob/3e0a35bf91b9cb687afcfa63700ed1aadd5856ef/game/leduc/fast_compile.py

I tried the option out, but the program errors out when it runs. Maybe without the optimizations it simply lacks the memory to run? If so, that defeats the point of having this compiler option in the first place.
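One way to probe the memory hypothesis would be to record the device's free memory and per-thread stack limit just before the launch in both builds and compare them. The helper below is only a sketch: `device_memory_report` is a hypothetical name, and it assumes the object passed in behaves like `cupy.cuda.runtime` in recent CuPy releases, exposing `memGetInfo`, `deviceGetLimit` and the `cudaLimitStackSize` constant.

```python
def device_memory_report(runtime):
    """Collect free/total device memory and the per-thread stack limit, in bytes.

    `runtime` is assumed to expose memGetInfo(), deviceGetLimit() and the
    cudaLimitStackSize constant, the way cupy.cuda.runtime is expected to.
    """
    free_bytes, total_bytes = runtime.memGetInfo()
    stack_bytes = runtime.deviceGetLimit(runtime.cudaLimitStackSize)
    return {
        "free": free_bytes,
        "total": total_bytes,
        "stack_per_thread": stack_bytes,
    }
```

With CuPy installed, this could be called as `device_memory_report(cupy.cuda.runtime)` right before the kernel launch. If the numbers are the same in both builds, the difference more likely lies in how much memory the compiled kernel itself needs.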

Here is what it looks like in the terminal for me.

PS C:\Spiral_s_ML_Library>  c:; cd 'c:\Spiral_s_ML_Library'; & 'c:\Users\mrakg\AppData\Local\pypoetry\Cache\virtualenvs\ui-EoO7T__V-py3.11\Scripts\python.exe' 'c:\Users\mrakg\.vscode\extensions\ms-python.debugpy-2024.10.0-win32-x64\bundled\libs\debugpy\adapter/../..\debugpy\launcher' '61274' '--' 'C:\Spiral_s_ML_Library\game\leduc\fast_compile.py' 
Going to run the Leduc full kernel.
DEBUG MODE. Threads per block, blocks per grid: 256, 24
Traceback (most recent call last):
  File "C:\Spiral_s_ML_Library\game\leduc\fast_compile.py", line 10166, in <module>
    if __name__ == '__main__': print(main())
                                     ^^^^^^
  File "C:\Spiral_s_ML_Library\game\leduc\fast_compile.py", line 10162, in main
    r = main_body()
        ^^^^^^^^^^^
  File "C:\Spiral_s_ML_Library\game\leduc\fast_compile.py", line 10117, in main_body
    v73 = v72.get()
          ^^^^^^^^^
  File "cupy\_core\core.pyx", line 1767, in cupy._core.core._ndarray_base.get
  File "cupy\_core\core.pyx", line 1854, in cupy._core.core._ndarray_base.get
  File "cupy\cuda\memory.pyx", line 586, in cupy.cuda.memory.MemoryPointer.copy_to_host_async
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
  File "cupy_backends\cuda\api\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorLaunchFailure: unspecified launch failure
Traceback (most recent call last):
  File "cupy_backends\cuda\api\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\cuda\api\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends\cuda\api\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends\cuda\api\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
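Note that the traceback points at `v72.get()`, which is a device-to-host copy and not the kernel launch itself. CUDA kernel launches are asynchronous, so a fault inside the kernel is usually reported by the next blocking call. A minimal sketch of how the failure could be pinned to a specific launch is below. `launch_and_check` and its parameters are hypothetical names, not something taken from `fast_compile.py`.

```python
from typing import Callable


def launch_and_check(
    launch: Callable[[], None],
    synchronize: Callable[[], None],
    label: str,
) -> None:
    """Run a launch, then block so an asynchronous kernel fault surfaces here."""
    launch()
    try:
        # Blocks until the device finishes; errors from the kernel are raised here
        # instead of at some later, unrelated memory copy.
        synchronize()
    except Exception as exc:
        raise RuntimeError(
            f"kernel '{label}' failed during execution: {exc}"
        ) from exc
```

With CuPy, `synchronize` could be `cupy.cuda.runtime.deviceSynchronize`, and `launch` a small lambda wrapping the existing kernel call. If the error is then raised inside `launch_and_check`, that confirms the kernel itself is faulting and not the copy back to the host.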
