taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

CUDA out of memory using ndarray with device_memory_fraction #2920

Closed. maajor closed this issue 3 years ago.

maajor commented 3 years ago

Describe the bug

CUDA runs out of memory when allocating ndarrays after setting device_memory_fraction.

To Reproduce

import taichi as ti

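ti.init(ti.cuda, device_memory_fraction=0.8, debug=True)  # Taichi preallocates 80% of GPU memory

arrs = []

for i in range(180):  # 180 arrays, about 2.8 GiB in total
    field = ti.lang.ndarray(dtype=ti.f32, shape=(2048, 2048))  # 2048 x 2048 f32 = 16 MiB each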
    arrs.append(field)
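For reference, a quick back-of-the-envelope check (plain Python, no Taichi needed) of how much memory the reproducer requests. It assumes 4 bytes per `f32` element and ignores any per-allocation overhead in the allocator.

```python
# Back-of-the-envelope size of the ndarrays allocated by the reproducer.
# Assumes 4 bytes per f32 element and no per-allocation overhead.
MIB = 1024 ** 2

def ndarray_bytes(shape, itemsize=4):
    """Return the raw byte size of a dense array with the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize

per_array = ndarray_bytes((2048, 2048))
total = 180 * per_array

print(f"per array: {per_array / MIB:.2f} MiB")  # per array: 16.00 MiB
print(f"all 180 arrays: {total / MIB:.0f} MiB ({total / 1024 ** 3:.2f} GiB)")
# all 180 arrays: 2880 MiB (2.81 GiB)
```

The 16 MiB per array matches the "Tried to allocate 16.00 MiB" figure in the traceback.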

Log/Screenshots

$ python my_sample_code.py
[Taichi] version 0.7.32, llvm 10.0.0, commit 6652f94f, win, python 3.8.10
[I 09/13/21 15:55:21.354 3464] [shell.py:_shell_pop_print@35] Graphical python shell detected, using wrapped sys.stdout
[Taichi] Starting on arch=cuda
Traceback (most recent call last):
  File "test.py", line 8, in <module>
    field = ti.lang.ndarray(dtype=ti.f32, shape=(2048, 2048))
  File "C:\Users\yidon\anaconda3\envs\taichi\lib\site-packages\taichi\lang\util.py", line 207, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\yidon\anaconda3\envs\taichi\lib\site-packages\taichi\lang\impl.py", line 617, in ndarray
    return ScalarNdarray(dtype, shape)
  File "C:\Users\yidon\anaconda3\envs\taichi\lib\site-packages\taichi\lang\ndarray.py", line 111, in __init__
    super().__init__(dtype, shape)
  File "C:\Users\yidon\anaconda3\envs\taichi\lib\site-packages\taichi\lang\ndarray.py", line 23, in __init__
    self.arr = torch.zeros(shape,
RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 24.00 GiB total capacity; 2.55 GiB already allocated; 2.99 MiB free; 2.55 GiB reserved in total by PyTorch)
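The numbers in the RuntimeError message also show roughly how far the loop got. A small sketch, assuming every byte PyTorch reports as allocated belongs to these 16 MiB ndarrays (the 2.55 GiB figure is rounded by PyTorch, so the count is approximate):

```python
# Estimate how many 16 MiB ndarrays were allocated before the OOM,
# using the figures printed in the RuntimeError message.
# Assumption: all of PyTorch's allocated memory belongs to these ndarrays.
ALREADY_ALLOCATED_GIB = 2.55  # "2.55 GiB already allocated" (rounded)
PER_ARRAY_MIB = 16.0          # "Tried to allocate 16.00 MiB"

allocated_mib = ALREADY_ALLOCATED_GIB * 1024
arrays_done = int(allocated_mib // PER_ARRAY_MIB)

print(f"~{arrays_done} of 180 arrays allocated before the failure")
# ~163 of 180 arrays allocated before the failure
```

So the loop fails partway through, with only 2.99 MiB reported free on a 24 GiB card even though PyTorch itself holds about 2.55 GiB.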

Additional comments: output of ti diagnose.

[Taichi] version 0.7.32, llvm 10.0.0, commit 6652f94f, win, python 3.8.10
[I 09/13/21 15:58:49.187 19240] [shell.py:_shell_pop_print@35] Graphical python shell detected, using wrapped sys.stdout

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://taichi.rtfd.io/zh_CN/latest
GitHub: https://github.com/taichi-dev/taichi
Forum:  https://forum.taichi.graphics

Taichi system diagnose:

python: 3.8.10 (default, May 19 2021, 13:12:57) [MSC v.1916 64 bit (AMD64)]
system: win32
executable: c:\users\yidon\anaconda3\envs\taichi\python.exe
platform: Windows-10-10.0.19043-SP0
architecture: 64bit WindowsPE
uname: uname_result(system='Windows', node='DESKTOP-EMD6O0E', release='10', version='10.0.19043', machine='AMD64', processor='Intel64 Family 6 Model 167 Stepping 1, GenuineIntel')
locale: zh_CN.cp936
PATH: C:\Users\yidon\anaconda3\envs\taichi;C:\Users\yidon\anaconda3\envs\taichi\Library\mingw-w64\bin;C:\Users\yidon\anaconda3\envs\taichi\Library\usr\bin;C:\Users\yidon\anaconda3\envs\taichi\Library\bin;C:\Users\yidon\anaconda3\envs\taichi\Scripts;C:\Users\yidon\anaconda3\envs\taichi\bin;C:\Users\yidon\anaconda3\condabin;C:\Program Files (x86)\Common Files\Intel\Shared Libraries\redist\intel64_win\compiler;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;C:\Program Files\NVIDIA Corporation\Nsight Compute 2020.3.1;C:\Program Files\Docker\Docker\resources\bin;C:\ProgramData\DockerDesktop\version-bin;C:\ProgramData\chocolatey\bin;C:\Program Files\nodejs;C:\Program Files\PuTTY;C:\Program Files\RLM;C:\Program Files\3Delight\bin;C:\Program Files\ffmpeg\bin;C:\Users\yidon\AppData\Local\Programs\Python\Python39\Scripts;C:\Users\yidon\AppData\Local\Programs\Python\Python39;C:\Users\yidon\AppData\Local\Microsoft\WindowsApps;.;C:\Users\yidon\AppData\Local\Programs\Microsoft VS Code\bin;C:\Users\yidon\AppData\Roaming\npm;C:\Users\yidon\AppData\Local\Programs\Fiddler;C:\Users\yidon\anaconda3\envs\taichi\Lib\site-packages\taichi\core\../lib
PYTHONPATH: ['C:\\Users\\yidon\\anaconda3\\envs\\taichi\\Scripts\\ti.exe', 'c:\\users\\yidon\\anaconda3\\envs\\taichi\\python38.zip', 'c:\\users\\yidon\\anaconda3\\envs\\taichi\\DLLs', 'c:\\users\\yidon\\anaconda3\\envs\\taichi\\lib', 'c:\\users\\yidon\\anaconda3\\envs\\taichi', 'c:\\users\\yidon\\anaconda3\\envs\\taichi\\lib\\site-packages', 'C:\\Users\\yidon\\anaconda3\\envs\\taichi\\Lib\\site-packages\\taichi\\core\\../lib']

`lsb_release` not available: [WinError 2] The system cannot find the file specified.

import: <module 'taichi' from 'c:\\users\\yidon\\anaconda3\\envs\\taichi\\lib\\site-packages\\taichi\\__init__.py'>

cc: False
cpu: True
metal: False
opengl: True
cuda: True

`glewinfo` not available: [WinError 2] The system cannot find the file specified.

Mon Sep 13 15:59:00 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 461.92       Driver Version: 461.92       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 3090   WDDM  | 00000000:01:00.0  On |                  N/A |
| 44%   57C    P2   110W / 350W |   1612MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1612    C+G   Insufficient Permissions        N/A      |
|    0   N/A  N/A      1868    C+G   ...lack\app-4.19.3\slack.exe    N/A      |
|    0   N/A  N/A      5716    C+G   ...bBrowser\AcWebBrowser.exe    N/A      |
|    0   N/A  N/A      6436    C+G   ...b3d8bbwe\WinStore.App.exe    N/A      |
|    0   N/A  N/A      8216    C+G   ...sk\baidunetdiskrender.exe    N/A      |
|    0   N/A  N/A      8920    C+G   ...lPanel\SystemSettings.exe    N/A      |
|    0   N/A  N/A      9056    C+G   ...kyb3d8bbwe\Calculator.exe    N/A      |
|    0   N/A  N/A      9684    C+G   C:\Windows\explorer.exe         N/A      |
|    0   N/A  N/A     11788    C+G   ...TeamViewer\TeamViewer.exe    N/A      |
|    0   N/A  N/A     12132    C+G   ...8bbwe\Microsoft.Notes.exe    N/A      |
|    0   N/A  N/A     12988    C+G   ...5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A     13536    C+G   ...5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A     13692    C+G   ...ekyb3d8bbwe\YourPhone.exe    N/A      |
|    0   N/A  N/A     13700    C+G   ...s\Win64\EpicWebHelper.exe    N/A      |
|    0   N/A  N/A     14644    C+G   ...cw5n1h2txyewy\LockApp.exe    N/A      |
|    0   N/A  N/A     16144    C+G   ...nputApp\TextInputHost.exe    N/A      |
|    0   N/A  N/A     17768    C+G   ...n64\EpicGamesLauncher.exe    N/A      |
|    0   N/A  N/A     20024    C+G   ...cal\Feishu\app\Feishu.exe    N/A      |
|    0   N/A  N/A     20952    C+G   ...\app-3.4.5\SourceTree.exe    N/A      |
|    0   N/A  N/A     21424    C+G   ...8wekyb3d8bbwe\Cortana.exe    N/A      |
|    0   N/A  N/A     22092    C+G   ...bbwe\Microsoft.Photos.exe    N/A      |
|    0   N/A  N/A     22672    C+G   ...ge\Application\msedge.exe    N/A      |
|    0   N/A  N/A     22852    C+G   ...y\ShellExperienceHost.exe    N/A      |
|    0   N/A  N/A     25400    C+G   ...in7x64\steamwebhelper.exe    N/A      |
|    0   N/A  N/A     25668    C+G   ...wekyb3d8bbwe\Video.UI.exe    N/A      |
|    0   N/A  N/A     25944    C+G   ...icrosoft VS Code\Code.exe    N/A      |
+-----------------------------------------------------------------------------+

[Taichi] version 0.7.32, llvm 10.0.0, commit 6652f94f, win, python 3.8.10

[Taichi] version 0.7.32, llvm 10.0.0, commit 6652f94f, win, python 3.8.10
[Taichi] Starting on arch=x64

[Taichi] version 0.7.32, llvm 10.0.0, commit 6652f94f, win, python 3.8.10
[Taichi] Starting on arch=opengl

[Taichi] version 0.7.32, llvm 10.0.0, commit 6652f94f, win, python 3.8.10
[Taichi] Starting on arch=cuda

[Taichi] version 0.7.32, llvm 10.0.0, commit 6652f94f, win, python 3.8.10

*******************************************
**      Taichi Programming Language      **
*******************************************

Docs:   https://taichi.rtfd.io/zh_CN/latest
GitHub: https://github.com/taichi-dev/taichi
Forum:  https://forum.taichi.graphics

Running example minimal ...
[Taichi] Starting on arch=x64
>>> Running time: 0.34s
42

Consider attaching this log when maintainers ask about system information.
>>> Running time: 20.14s
k-ye commented 3 years ago

NDArray is backed by a Torch tensor, so Taichi's device_memory_fraction cannot control it yet. (If anything, we should probably minimize the value of device_memory_fraction so that PyTorch can grab more GPU memory 😓)
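A rough budget makes the explanation concrete. This sketch assumes (not confirmed in this thread) that Taichi's CUDA backend preallocates device_memory_fraction × total device memory at ti.init, and it reuses the 1612 MiB that nvidia-smi showed in the diagnose output, which was captured separately from the failing run. CUDA context overhead for Taichi and PyTorch is not modeled, so real headroom is smaller than computed.

```python
# Rough GPU memory budget for the 24 GiB RTX 3090 in this report.
# Assumptions (not verified here): Taichi preallocates
# device_memory_fraction * total memory at ti.init, and the 1612 MiB
# shown by nvidia-smi is used by other processes. CUDA context overhead
# is ignored, so the real headroom is smaller than computed.
TOTAL_MIB = 24576
OTHER_USED_MIB = 1612
NEEDED_MIB = 180 * 16  # 180 ndarrays of 16 MiB each

def torch_headroom_mib(fraction, total=TOTAL_MIB, other=OTHER_USED_MIB):
    """Memory left for PyTorch after Taichi's preallocation and other processes."""
    return total - fraction * total - other

for fraction in (0.8, 0.5):
    room = torch_headroom_mib(fraction)
    print(f"fraction={fraction}: ~{room:.0f} MiB left, {NEEDED_MIB} MiB needed")
# fraction=0.8: ~3303 MiB left, 2880 MiB needed
# fraction=0.5: ~10676 MiB left, 2880 MiB needed
```

At 0.8 the naive margin is only about 420 MiB, small enough that unmodeled overhead such as the CUDA contexts can plausibly consume it; at 0.5 the margin is several GiB, which is consistent with lowering the fraction making the script work.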

k-ye commented 3 years ago

FYI @strongoier @qiao-bo

maajor commented 3 years ago

Interesting 😅 Decreasing device_memory_fraction does make the sample script work.

strongoier commented 3 years ago

> NDArray is backed by a Torch tensor, so Taichi's device_memory_fraction cannot control it yet. (If anything, we should probably minimize the value of device_memory_fraction so that PyTorch can grab more GPU memory 😓)

I think this is a reasonable explanation.