heavyai / heavydb

HeavyDB (formerly OmniSciDB)
https://heavy.ai
Apache License 2.0

BufferMgr.cpp:416 ALLOCATION failed to find 2452160512B throwing out of memory GPU_MGR:0 #733

Open jieguolove opened 2 years ago

jieguolove commented 2 years ago


heavysql> \memory_summary
HeavyDB Server CPU Memory Summary:
            MAX            USE      ALLOCATED           FREE
    98304.00 MB      922.65 MB     4096.00 MB     3173.35 MB

HeavyDB Server GPU Memory Summary:
[GPU]            MAX            USE      ALLOCATED           FREE
  [0]    32768.00 MB      922.65 MB     3072.00 MB     2149.35 MB

heavysql> \q
User admin disconnected from database heavyai
omnisky@omnisky-Super-Server:/var/lib/heavyai$ sudo systemctl status heavydb
[sudo] password for omnisky:
Sorry, try again.
[sudo] password for omnisky:
● heavydb.service - HEAVY.AI HeavyDB database server
   Loaded: loaded (/lib/systemd/system/heavydb.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2022-04-21 21:59:26 EDT; 20min ago
 Main PID: 18410 (heavydb)
    Tasks: 114 (limit: 4915)
   CGroup: /system.slice/heavydb.service
           ├─18410 /opt/heavyai/bin/heavydb --config /var/lib/heavyai/heavy.conf
           └─18426 -Xmx1024m -DLOG_DIR=/var/lib/heavyai/log/ -jar /opt/heavyai-installs/heavyai-ee-6.0.0-20220418-d4d1c2a42c-Linux-x86_64-render/bin/calcite-1.0-SNAPSHOT-jar-with-dependencies

Apr 21 22:04:51 omnisky-Super-Server heavydb[18410]: 2022-04-21T22:04:51.494456 E 18410 0 7 BufferMgr.cpp:416 ALLOCATION failed to find 2452160512B throwing out of memory GPU_MGR:0
Apr 21 22:04:51 omnisky-Super-Server heavydb[18410]: 2022-04-21T22:04:51.494958 E 18410 0 7 RelAlgExecutor.cpp:3866 Query execution failed with error ERR_OUT_OF_GPU_MEM: Query couldn't keep t
Apr 21 22:04:58 omnisky-Super-Server heavydb[18410]: 2022-04-21T22:04:58.523521 E 18410 0 5 BufferMgr.cpp:416 ALLOCATION failed to find 2452160512B throwing out of memory GPU_MGR:0
Apr 21 22:04:58 omnisky-Super-Server heavydb[18410]: 2022-04-21T22:04:58.523990 E 18410 0 5 RelAlgExecutor.cpp:3866 Query execution failed with error ERR_OUT_OF_GPU_MEM: Query couldn't keep t
Apr 21 22:12:52 omnisky-Super-Server heavydb[18410]: 2022-04-21T22:12:52.619967 E 18410 0 10 GlobalRenderContext.cpp:435 GPU 0 has incomplete ID PBO Pool
Apr 21 22:12:52 omnisky-Super-Server heavydb[18410]: 2022-04-21T22:12:52.620136 E 18410 0 10 GlobalRenderContext.cpp:113 Render resources and caches incomplete, unable to render
Apr 21 22:12:52 omnisky-Super-Server heavydb[18410]: 2022-04-21T22:12:52.631988 E 18410 0 10 DBHandler.cpp:4673 OutOfGpuMemoryError: Error allocating device memory: Vulkan Error: VK_ERROR_OUT
Apr 21 22:14:50 omnisky-Super-Server heavydb[18410]: 2022-04-21T22:14:50.911405 E 18410 0 10 GlobalRenderContext.cpp:435 GPU 0 has incomplete ID PBO Pool
Apr 21 22:14:50 omnisky-Super-Server heavydb[18410]: 2022-04-21T22:14:50.911501 E 18410 0 10 GlobalRenderContext.cpp:113 Render resources and caches incomplete, unable to render
Apr 21 22:14:50 omnisky-Super-Server heavydb[18410]: 2022-04-21T22:14:50.924404 E 18410 0 10 DBHandler.cpp:4673 OutOfGpuMemoryError: Error allocating device memory: Vulkan Error: VK_ERROR_OUT
omnisky@omnisky-Super-Server:/var/lib/heavyai$ nvidia-smi
Thu Apr 21 22:21:55 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0  On |                  N/A |
| 27%   38C    P8    26W / 250W |  10558MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2508      G   /usr/lib/xorg/Xorg                206MiB |
|    0   N/A  N/A      2958      G   /usr/bin/gnome-shell              112MiB |
|    0   N/A  N/A      4071      G   /usr/lib/firefox/firefox          144MiB |
|    0   N/A  N/A     11465      G   ...nlogin/bin/sunloginclient        7MiB |
|    0   N/A  N/A     14992    C+G   ...g/create-2022.1.1/kit/kit     6817MiB |
|    0   N/A  N/A     18410    C+G   /opt/heavyai/bin/heavydb         3259MiB |
+-----------------------------------------------------------------------------+
omnisky@omnisky-Super-Server:/var/lib/heavyai$ cat heavy.conf
port = 6274
http-port = 6278
calcite-port = 6279
data = "/var/lib/heavyai"
null-div-by-zero = true
allowed-import-paths = ["/heavyai-data", "/home/omnisky", "/var/lib/heavyai", "/opt/heavyai"]
max-cacheable-hashtable-size-bytes = 10737418240
hashtable-cache-total-bytes = 21474836480

allowed-import-paths = "/"

[web]
port = 6273
frontend = "/opt/heavyai/frontend"

To solve these errors, which GPU memory parameters should be set? Thanks!

cdessanti commented 2 years ago

Hi,

You are getting these errors because NVIDIA Omniverse Create 2022.1 (PID 14992, ...g/create-2022.1.1/kit/kit, 6817MiB in your nvidia-smi output) is using more than half of your GPU memory, leaving the HeavyDB server with less than 4GB.
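The "less than 4GB" figure can be checked against the nvidia-smi output posted in the question. The following is a minimal Python sketch that uses only the numbers from that output; the variable names are illustrative, and the result is an upper bound because it ignores driver overhead.

```python
# Figures copied from the nvidia-smi output in this issue, in MiB.
TOTAL_MIB = 11264

# GPU memory used by every process other than heavydb.
other_processes_mib = {
    "Xorg": 206,
    "gnome-shell": 112,
    "firefox": 144,
    "sunloginclient": 7,
    "Omniverse Create (kit)": 6817,
}

used_by_others = sum(other_processes_mib.values())
left_for_heavydb = TOTAL_MIB - used_by_others

print(f"Used by other processes: {used_by_others} MiB")
print(f"Upper bound left for HeavyDB: {left_for_heavydb} MiB")
```

Omniverse Create alone accounts for 6817 of the 11264 MiB, which is why stopping it frees most of the card.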

The best thing you can do is stop NVIDIA Omniverse and then restart the HeavyDB server; this way, you will have around 10GB for the database cache and rendering.

But if you want to run heavydb with such low memory and with rendering enabled, you have to limit the memory allocated by the BufferManager for the GPU. The parameter is gpu-buffer-mem-bytes.

You can set it to 2.5GB so the renderer will be able to allocate 1GB:

gpu-buffer-mem-bytes=2684354560
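The value 2684354560 is 2.5 GiB expressed in bytes. A small Python sketch of the conversion follows, in case a different budget is needed; the helper name gib_to_bytes is hypothetical, and the printed line simply mirrors the key = value style of the heavy.conf shown in the question.

```python
GIB = 1024 ** 3  # bytes in one GiB


def gib_to_bytes(gib: float) -> int:
    """Convert a size in GiB to a whole number of bytes."""
    return int(gib * GIB)


# 2.5 GiB for the GPU buffer pool, as suggested in this thread.
gpu_buffer_bytes = gib_to_bytes(2.5)

# Line to add to heavy.conf (above the [web] section).
print(f"gpu-buffer-mem-bytes = {gpu_buffer_bytes}")
```

The server has to be restarted after editing heavy.conf for the new limit to take effect.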

Of course, this will limit the number of records you will be able to process on the GPU. Queries that do not fit would fall back to the CPU engine, and the GPU would still render the result, but with a big performance hit.

How many records/columns are you planning to use, and how many point/polygons are you going to render?

Regards, Candido