I tried it out, but it errors out. Maybe without the optimizations the kernel simply lacks the memory to run? If so, that defeats the point of having this compiler option in the first place.
Here is what it looks like in the terminal for me.
PS C:\Spiral_s_ML_Library> c:; cd 'c:\Spiral_s_ML_Library'; & 'c:\Users\mrakg\AppData\Local\pypoetry\Cache\virtualenvs\ui-EoO7T__V-py3.11\Scripts\python.exe' 'c:\Users\mrakg\.vscode\extensions\ms-python.debugpy-2024.10.0-win32-x64\bundled\libs\debugpy\adapter/../..\debugpy\launcher' '61274' '--' 'C:\Spiral_s_ML_Library\game\leduc\fast_compile.py'
Going to run the Leduc full kernel.
DEBUG MODE. Threads per block, blocks per grid: 256, 24
Traceback (most recent call last):
File "C:\Spiral_s_ML_Library\game\leduc\fast_compile.py", line 10166, in <module>
if __name__ == '__main__': print(main())
^^^^^^
File "C:\Spiral_s_ML_Library\game\leduc\fast_compile.py", line 10162, in main
r = main_body()
^^^^^^^^^^^
File "C:\Spiral_s_ML_Library\game\leduc\fast_compile.py", line 10117, in main_body
v73 = v72.get()
^^^^^^^^^
File "cupy\_core\core.pyx", line 1767, in cupy._core.core._ndarray_base.get
File "cupy\_core\core.pyx", line 1854, in cupy._core.core._ndarray_base.get
File "cupy\cuda\memory.pyx", line 586, in cupy.cuda.memory.MemoryPointer.copy_to_host_async
File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
File "cupy_backends\cuda\api\runtime.pyx", line 606, in cupy_backends.cuda.api.runtime.memcpyAsync
File "cupy_backends\cuda\api\runtime.pyx", line 146, in cupy_backends.cuda.api.runtime.check_status
cupy_backends.cuda.api.runtime.CUDARuntimeError: cudaErrorLaunchFailure: unspecified launch failure
Traceback (most recent call last):
File "cupy_backends\cuda\api\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
File "cupy_backends\cuda\api\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
File "cupy_backends\cuda\api\driver.pyx", line 234, in cupy_backends.cuda.api.driver.moduleUnload
File "cupy_backends\cuda\api\driver.pyx", line 63, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_LAUNCH_FAILED: unspecified launch failure
Without the compiler option: https://github.com/mrakgr/Spiral-s-ML-Library/blob/c5d8a529b210f84dc955a017aeff455c2d27affd/game/leduc/fast_compile.py
With --Ofast-compile=max: https://github.com/mrakgr/Spiral-s-ML-Library/blob/3e0a35bf91b9cb687afcfa63700ed1aadd5856ef/game/leduc/fast_compile.py
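Since CUDA kernel launches are asynchronous, the `v72.get()` frame in the traceback is probably only where the failure gets reported, not where it happens. A minimal sketch of how one could confirm that, by synchronizing right after the launch. The helper name `launch_checked` and its two callables are hypothetical and not part of the generated script; in the real script the launch would be the kernel call and the synchronize would be something like `cupy.cuda.Device().synchronize`.

```python
def launch_checked(launch, synchronize):
    """Run launch(), then synchronize() so an asynchronous kernel
    failure surfaces here instead of at a later device-to-host copy.

    Both arguments are zero-argument callables (hypothetical helper).
    With CuPy one might pass, for example:
        launch      = lambda: kernel((24,), (256,), args)
        synchronize = cupy.cuda.Device().synchronize
    """
    launch()
    try:
        synchronize()
    except Exception as exc:
        # Re-raise with a message that makes clear the kernel itself failed.
        raise RuntimeError(f"kernel failed before any copy to host: {exc}") from exc
```

If the error already shows up at the synchronize call, the copy in `get()` is not the culprit and the kernel compiled with `--Ofast-compile=max` is failing at run time.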