ika-rwth-aachen / Point-Cloud-Compression

Implements a deep RNN-based point cloud compression approach for Velodyne point clouds. Reference implementation of the corresponding IEEE IV22 paper.
MIT License

CUDA_ERROR_OUT_OF_MEMORY: out of memory #2

Closed Tran97 closed 10 months ago

Tran97 commented 11 months ago

Hello Till and Yuchen,

I am having issues running the Inference and Evaluation node: I appear to be running out of GPU memory. I have tried all of the additive LSTM models and modified the number of iterations for the compression in the params file, going as low as 1, without success. I have also tried to run it on 4 different computers with the following specs:

PC1

CPU: AMD Ryzen 5 3600, 6x 3.60 GHz
GPU: GIGABYTE GeForce RTX 2060 OC, 6 GB GDDR6
RAM: PATRIOT Viper 4 Blackout 32 GB (2x 16 GB) DDR4 3200 MHz
OS: Ubuntu 22.04 LTS

PC2

CPU: Intel® Core™ i7-8750H @ 2.20 GHz × 12
GPU: NVIDIA GeForce GTX 1050, 4096 MB
RAM: 12 GB
OS: Pop!_OS 22.04 LTS 64-bit

PC3

CPU: 8th Gen Intel® Core™ i7
GPU: NVIDIA® Quadro® P520, 2 GB GDDR5
RAM: 8 GB
OS: Ubuntu 22.04

PC4

CPU: Intel® Core™ i7-8565U @ 1.80 GHz × 8
GPU: NVIDIA GeForce GTX 1050 with Max-Q Design
RAM: 16 GB
OS: Ubuntu 18.04.6 LTS

All of them had trouble allocating enough memory. Here is the log from one of the devices, where the GPU flag has been set in the docker_eval.sh file. It also shows that there are no other processes using the GPU prior to running the node:

=== ROS Docker Container =======================================================

Container setup:

Available GPUs: 1

name                                       driver_version  utilization.gpu [%]  utilization.memory [%]  memory.used [MiB]  memory.total [MiB]
NVIDIA GeForce GTX 1050 with Max-Q Design  470.182.03      0 %                  0 %                     428 MiB            4042 MiB

rosuser@steven-PS63:/catkin_ws$ roslaunch pointcloud_to_rangeimage compression.launch ... logging to /home/rosuser/.ros/log/2c4ea1e6-7ca4-11ee-9ed7-185680f44291/roslaunch-steven-PS63-1251.log Checking log directory for disk usage. This may take a while. Press Ctrl-C to interrupt Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://steven-PS63:40255/

SUMMARY

PARAMETERS

NODES / compression_decoder_node (pointcloud_to_rangeimage/compression_decoder.py) compression_encoder_node (pointcloud_to_rangeimage/compression_encoder.py) pointcloud_to_rangeimage_node (pointcloud_to_rangeimage/pointcloud_to_rangeimage_node) rangeimage_to_pointcloud_node (pointcloud_to_rangeimage/rangeimage_to_pointcloud_node) rqt_graph (rqt_graph/rqt_graph) rviz (rviz/rviz) snnrmse (snnrmse/snnrmse_node.py) velodyne_nodelet_manager_ping (nodelet/nodelet) velodyne_nodelet_manager_ping_laserscan (nodelet/nodelet) velodyne_nodelet_manager_ping_transform (nodelet/nodelet) velodyne_rosbag (rosbag/play)

auto-starting new master process[master]: started with pid [1267] ROS_MASTER_URI=http://steven-PS63:11311

setting /run_id to 2c4ea1e6-7ca4-11ee-9ed7-185680f44291 process[rosout-1]: started with pid [1285] started core service [/rosout] process[velodyne_nodelet_manager_ping-2]: started with pid [1292] process[velodyne_nodelet_manager_ping_transform-3]: started with pid [1293] process[velodyne_nodelet_manager_ping_laserscan-4]: started with pid [1294] process[velodyne_rosbag-5]: started with pid [1295] process[pointcloud_to_rangeimage_node-6]: started with pid [1300] process[compression_encoder_node-7]: started with pid [1307] process[compression_decoder_node-8]: started with pid [1313] [ INFO] [1699275506.192867617]: Initializing nodelet with 8 worker threads. process[rangeimage_to_pointcloud_node-9]: started with pid [1329] 0.000 0.000 2.304 2.304 4.608 4.608 6.912 6.912 9.216 9.216 11.520 11.520 13.824 13.824 16.128 16.128 18.432 18.432 20.736 20.736 23.040 23.040 25.344 25.344 27.648 27.648 29.952 29.952 32.256 32.256 34.560 34.560 55.296 55.296 57.600 57.600 59.904 59.904 62.208 62.208 64.512 64.512 66.816 66.816 69.120 69.120 71.424 71.424 73.728 73.728 76.032 76.032 78.336 78.336 80.640 80.640 82.944 82.944 85.248 85.248 87.552 87.552 89.856 89.856 110.592 110.592 112.896 112.896 115.200 115.200 117.504 117.504 119.808 119.808 122.112 122.112 124.416 124.416 126.720 126.720 129.024 129.024 131.328 131.328 133.632 133.632 135.936 135.936 138.240 138.240 140.544 140.544 142.848 142.848 145.152 145.152 165.888 165.888 168.192 168.192 170.496 170.496 172.800 172.800 175.104 175.104 177.408 177.408 179.712 179.712 182.016 182.016 184.320 184.320 186.624 186.624 188.928 188.928 191.232 191.232 193.536 193.536 195.840 195.840 198.144 198.144 200.448 200.448 221.184 221.184 223.488 223.488 225.792 225.792 228.096 228.096 230.400 230.400 232.704 232.704 235.008 235.008 237.312 237.312 239.616 239.616 241.920 241.920 244.224 244.224 246.528 246.528 248.832 248.832 251.136 251.136 253.440 253.440 255.744 255.744 276.480 276.480 278.784 278.784 281.088 281.088 283.392 283.392 285.696 285.696 288.000 288.000 290.304 290.304 292.608 292.608 294.912 294.912 297.216 297.216 299.520 299.520 301.824 301.824 304.128 304.128 306.432 306.432 308.736 308.736 311.040 311.040 331.776 331.776 334.080 334.080 336.384 336.384 338.688 338.688 340.992 340.992 343.296 343.296 345.600 345.600 347.904 347.904 350.208 350.208 352.512 352.512 354.816 354.816 357.120 357.120 359.424 359.424 361.728 361.728 364.032 364.032 366.336 366.336 387.072 387.072 389.376 389.376 391.680 391.680 393.984 393.984 396.288 396.288 398.592 398.592 400.896 400.896 403.200 403.200 405.504 405.504 407.808 407.808 410.112 410.112 412.416 412.416 414.720 414.720 417.024 417.024 419.328 419.328 421.632 421.632 442.368 442.368 444.672 444.672 446.976 446.976 449.280 449.280 451.584 451.584 453.888 453.888 456.192 456.192 458.496 458.496 460.800 460.800 463.104 463.104 465.408 465.408 467.712 467.712 470.016 470.016 472.320 472.320 474.624 474.624 476.928 476.928 497.664 497.664 499.968 499.968 502.272 502.272 504.576 504.576 506.880 506.880 509.184 509.184 511.488 511.488 513.792 513.792 516.096 516.096 518.400 518.400 520.704 520.704 523.008 523.008 525.312 525.312 527.616 527.616 529.920 529.920 532.224 532.224 552.960 552.960 555.264 555.264 557.568 557.568 559.872 559.872 562.176 562.176 564.480 564.480 566.784 566.784 569.088 569.088 571.392 571.392 573.696 573.696 576.000 576.000 578.304 578.304 580.608 580.608 582.912 582.912 585.216 585.216 587.520 587.520 608.256 608.256 610.560 610.560 612.864 612.864 615.168 615.168 617.472 617.472 
619.776 619.776 622.080 622.080 624.384 624.384 626.688 626.688 628.992 628.992 631.296 631.296 633.600 633.600 635.904 635.904 638.208 638.208 640.512 640.512 642.816 642.816 [ INFO] [1699275506.251398722]: correction angles: /catkin_ws/src/velodyne/velodyne_pointcloud/params/VeloView-VLP-32C.yaml [ INFO] [1699275506.256420446]: laser_ring[ 0] = 0, angle = -0.436332 [ INFO] [1699275506.256480593]: laser_ring[ 3] = 1, angle = -0.272952 [ INFO] [1699275506.256530020]: laser_ring[ 4] = 2, angle = -0.197397 [ INFO] [1699275506.256553940]: laser_ring[ 7] = 3, angle = -0.154339 [ INFO] [1699275506.256574909]: laser_ring[ 8] = 4, angle = -0.126606 [ INFO] [1699275506.256595781]: laser_ring[11] = 5, angle = -0.107303 [ INFO] [1699275506.256717768]: laser_ring[12] = 6, angle = -0.093078 [ INFO] [1699275506.256956685]: laser_ring[16] = 7, angle = -0.081455 [ INFO] [1699275506.256982319]: laser_ring[15] = 8, angle = -0.069813 [ INFO] [1699275506.257004136]: laser_ring[19] = 9, angle = -0.064001 [ INFO] [1699275506.257056100]: laser_ring[20] = 10, angle = -0.058172 [ INFO] [1699275506.257194185]: laser_ring[24] = 11, angle = -0.052360 [ INFO] [1699275506.257373467]: laser_ring[23] = 12, angle = -0.046548 [ INFO] [1699275506.257490696]: laser_ring[27] = 13, angle = -0.040719 [ INFO] [1699275506.257514389]: laser_ring[28] = 14, angle = -0.034907 [ INFO] [1699275506.257539919]: laser_ring[ 2] = 15, angle = -0.029095 [ INFO] [1699275506.257565051]: laser_ring[31] = 16, angle = -0.023265 [ INFO] [1699275506.257741328]: laser_ring[ 1] = 17, angle = -0.017453 [ INFO] [1699275506.257839125]: laser_ring[ 6] = 18, angle = -0.011641 [ INFO] [1699275506.257942396]: laser_ring[10] = 19, angle = -0.005812 [ INFO] [1699275506.258018857]: laser_ring[ 5] = 20, angle = +0.000000 [ INFO] [1699275506.258060388]: laser_ring[ 9] = 21, angle = +0.005812 [ INFO] [1699275506.258104109]: laser_ring[14] = 22, angle = +0.011641 [ INFO] [1699275506.258213082]: laser_ring[18] = 23, angle = +0.017453 [ INFO] [1699275506.258349148]: laser_ring[13] = 24, angle = +0.023265 [ INFO] [1699275506.258422004]: laser_ring[17] = 25, angle = +0.029095 [ INFO] [1699275506.258508004]: laser_ring[22] = 26, angle = +0.040719 [ INFO] [1699275506.258591646]: laser_ring[21] = 27, angle = +0.058172 [ INFO] [1699275506.258656441]: laser_ring[26] = 28, angle = +0.081455 [ INFO] [1699275506.258815780]: laser_ring[25] = 29, angle = +0.122173 [ INFO] [1699275506.258847115]: laser_ring[30] = 30, angle = +0.180345 [ INFO] [1699275506.258876791]: laser_ring[29] = 31, angle = +0.261799 [ INFO] [1699275506.259234105]: Number of lasers: 32. [ WARN] [1699275506.260430662]: No Azimuth Cache configured for model 32C [ INFO] [1699275506.269196408]: Reconfigure request. [ INFO] [1699275506.269284344]: Target frame ID now: [ INFO] [1699275506.269327000]: Fixed frame ID now: [ INFO] [1699275506.269367209]: Using the organized cloud format... 
[ INFO] [1699275506.269472469]: Initialized container with min_range: 1, max_range: 200, target_frame: , fixed_frame: , init_width: 32, init_height: 0, is_dense: 0, scans_per_packet: 384 process[snnrmse-10]: started with pid [1331] process[rqt_graph-11]: started with pid [1339] process[rviz-12]: started with pid [1347] [ INFO] [1699275506.428038925]: RPM set to: 600 [ INFO] [1699275506.429268706]: Firing Cycle set to: 5.5296e-05 s [ INFO] [1699275506.435602490]: ang_res_x 0.199066 [ INFO] [1699275506.435674375]: min_range 0 [ INFO] [1699275506.435771609]: max_range 200 [ INFO] [1699275506.435845877]: Frame type : LASER QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-rosuser' [ INFO] [1699275506.493960209]: ang_res_x 0.199066 [ INFO] [1699275506.495018698]: min_range 0 [ INFO] [1699275506.495043182]: max_range 200 [ INFO] [1699275506.495059449]: Frame type : LASER [ INFO] [1699275506.496620363]: RPM set to: 600 [ INFO] [1699275506.496994191]: Firing Cycle set to: 5.5296e-05 s [ INFO] [1699275506.498992488]: Transport raw 2023-11-06 13:58:26.688475: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 2023-11-06 13:58:26.715696: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-rosuser' 2023-11-06 13:58:28.391359: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1 2023-11-06 13:58:28.420668: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.420998: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:02:00.0 name: NVIDIA GeForce GTX 1050 with Max-Q Design computeCapability: 6.1 coreClock: 1.3285GHz coreCount: 5 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 104.43GiB/s 2023-11-06 13:58:28.421024: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 2023-11-06 13:58:28.422541: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1 2023-11-06 13:58:28.424570: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11 2023-11-06 13:58:28.424711: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11 2023-11-06 13:58:28.425976: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10 2023-11-06 13:58:28.426299: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10 2023-11-06 13:58:28.427224: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11 2023-11-06 13:58:28.427992: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11 2023-11-06 13:58:28.428139: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8 2023-11-06 13:58:28.428242: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there 
must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.428609: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.428653: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.429086: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2023-11-06 13:58:28.429201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:02:00.0 name: NVIDIA GeForce GTX 1050 with Max-Q Design computeCapability: 6.1 coreClock: 1.3285GHz coreCount: 5 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 104.43GiB/s 2023-11-06 13:58:28.429233: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 2023-11-06 13:58:28.432634: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11 2023-11-06 13:58:28.432790: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11 2023-11-06 13:58:28.433846: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10 2023-11-06 13:58:28.434206: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10 2023-11-06 13:58:28.435214: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11 2023-11-06 13:58:28.436024: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11 2023-11-06 13:58:28.436251: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8 2023-11-06 13:58:28.436435: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.436880: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.437144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2023-11-06 13:58:28.438110: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 
2023-11-06 13:58:28.438569: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.438868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:02:00.0 name: NVIDIA GeForce GTX 1050 with Max-Q Design computeCapability: 6.1 coreClock: 1.3285GHz coreCount: 5 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 104.43GiB/s 2023-11-06 13:58:28.439011: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.439331: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.439589: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2023-11-06 13:58:28.439632: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 2023-11-06 13:58:28.445694: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-11-06 13:58:28.446468: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.450616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties: pciBusID: 0000:02:00.0 name: NVIDIA GeForce GTX 1050 with Max-Q Design computeCapability: 6.1 coreClock: 1.3285GHz coreCount: 5 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 104.43GiB/s 2023-11-06 13:58:28.450776: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.451283: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.459571: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0 2023-11-06 13:58:28.459622: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 2023-11-06 13:58:28.934979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix: 2023-11-06 13:58:28.935007: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0 2023-11-06 13:58:28.935012: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N 2023-11-06 13:58:28.935260: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.935734: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA 
node zero 2023-11-06 13:58:28.936031: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.936304: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2773 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050 with Max-Q Design, pci bus id: 0000:02:00.0, compute capability: 6.1) 2023-11-06 13:58:28.943547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix: 2023-11-06 13:58:28.943570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0 2023-11-06 13:58:28.943577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N 2023-11-06 13:58:28.943772: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.944109: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.944365: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero 2023-11-06 13:58:28.944593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2769 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050 with Max-Q Design, pci bus id: 0000:02:00.0, compute capability: 6.1) 2023-11-06 13:58:28.962201: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11 2023-11-06 13:58:28.980762: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8 2023-11-06 13:58:29.211330: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11 2023-11-06 13:58:29.219372: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8 2023-11-06 13:58:29.257834: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8100 2023-11-06 13:58:29.494393: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8100 2023-11-06 13:58:29.545316: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11 2023-11-06 13:58:29.789418: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11 2023-11-06 13:58:30.082978: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 1.68G (1802960896 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.083418: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 1.51G (1622664704 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.083765: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 1.36G (1460398336 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.084113: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 1.22G 
(1314358528 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.084456: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 1.10G (1182922752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.084468: W tensorflow/core/common_runtime/bfc_allocator.cc:337] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature. 2023-11-06 13:58:30.121751: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.121781: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.10GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-06 13:58:30.122295: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.122308: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 324.53MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-06 13:58:30.124208: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.124688: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.194065: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 1.96G (2100756480 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.551434: W tensorflow/core/common_runtime/bfc_allocator.cc:337] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature. 2023-11-06 13:58:30.645262: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.46G (2637627392 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.645317: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.16GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 
2023-11-06 13:58:30.661875: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.46G (2637627392 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.661954: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 52.01MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-06 13:58:30.695497: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.46G (2637627392 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.695534: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 32.03MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-06 13:58:30.696525: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.46G (2637627392 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.696546: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 116.02MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-06 13:58:30.697482: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.46G (2637627392 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.697502: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 480.62MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-06 13:58:30.699822: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.46G (2637627392 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:30.700498: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.46G (2637627392 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:40.701104: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.46G (2637627392 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:40.701374: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.46G (2637627392 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:40.701388: W tensorflow/core/common_runtime/bfc_allocator.cc:456] Allocator (GPU_0_bfc) ran out of memory trying to allocate 36.00MiB (rounded to 37748736)requested by op Add If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. Current allocation summary follows. Current allocation summary follows. 2023-11-06 13:58:40.701397: I tensorflow/core/common_runtime/bfc_allocator.cc:991] BFCAllocator dump for GPU_0_bfc 2023-11-06 13:58:40.701402: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (256): Total Chunks: 15, Chunks in use: 14. 3.8KiB allocated for chunks. 3.5KiB in use in bin. 60B client-requested in use in bin. 2023-11-06 13:58:40.701406: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 
0B in use in bin. 0B client-requested in use in bin. 2023-11-06 13:58:40.701410: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin. 2023-11-06 13:58:40.701414: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (2048): Total Chunks: 2, Chunks in use: 1. 4.8KiB allocated for chunks. 2.2KiB in use in bin. 2.2KiB client-requested in use in bin. 2023-11-06 13:58:40.701419: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (4096): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2023-11-06 13:58:40.701425: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2023-11-06 13:58:40.701430: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2023-11-06 13:58:40.701439: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (32768): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2023-11-06 13:58:40.701449: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (65536): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2023-11-06 13:58:40.701463: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (131072): Total Chunks: 4, Chunks in use: 3. 904.5KiB allocated for chunks. 684.0KiB in use in bin. 684.0KiB client-requested in use in bin. 2023-11-06 13:58:40.701477: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (262144): Total Chunks: 4, Chunks in use: 4. 1.78MiB allocated for chunks. 1.78MiB in use in bin. 1.78MiB client-requested in use in bin. 2023-11-06 13:58:40.701490: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (524288): Total Chunks: 1, Chunks in use: 1. 906.2KiB allocated for chunks. 906.2KiB in use in bin. 456.0KiB client-requested in use in bin. 2023-11-06 13:58:40.701504: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (1048576): Total Chunks: 5, Chunks in use: 4. 8.91MiB allocated for chunks. 7.12MiB in use in bin. 7.12MiB client-requested in use in bin. 2023-11-06 13:58:40.701518: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (2097152): Total Chunks: 9, Chunks in use: 7. 29.19MiB allocated for chunks. 21.88MiB in use in bin. 20.06MiB client-requested in use in bin. 2023-11-06 13:58:40.701533: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (4194304): Total Chunks: 4, Chunks in use: 4. 23.69MiB allocated for chunks. 23.69MiB in use in bin. 21.38MiB client-requested in use in bin. 2023-11-06 13:58:40.701549: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (8388608): Total Chunks: 4, Chunks in use: 1. 42.84MiB allocated for chunks. 9.00MiB in use in bin. 9.00MiB client-requested in use in bin. 2023-11-06 13:58:40.701559: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (16777216): Total Chunks: 2, Chunks in use: 1. 35.81MiB allocated for chunks. 18.00MiB in use in bin. 18.00MiB client-requested in use in bin. 2023-11-06 13:58:40.701571: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (33554432): Total Chunks: 3, Chunks in use: 3. 110.00MiB allocated for chunks. 110.00MiB in use in bin. 
108.00MiB client-requested in use in bin. 2023-11-06 13:58:40.701579: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (67108864): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2023-11-06 13:58:40.701589: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (134217728): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2023-11-06 13:58:40.701596: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (268435456): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin. 2023-11-06 13:58:40.701604: I tensorflow/core/common_runtime/bfc_allocator.cc:1014] Bin for 36.00MiB was 32.00MiB, Chunk State: 2023-11-06 13:58:40.701609: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Next region of size 134217728 2023-11-06 13:58:40.701617: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9200000000 of size 37748736 by op RandomUniform action_count 207 step 0 next 50 2023-11-06 13:58:40.701623: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9202400000 of size 18874368 by op Add action_count 141 step 0 next 49 2023-11-06 13:58:40.701629: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9203600000 of size 37748736 by op Mul action_count 209 step 0 next 52 2023-11-06 13:58:40.701635: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9205a00000 of size 39845888 by op Add action_count 167 step 0 next 18446744073709551615 2023-11-06 13:58:40.701639: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Next region of size 67108864 2023-11-06 13:58:40.701645: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free at 7f9210000000 of size 18677760 by op UNUSED action_count 128 step 0 next 44 2023-11-06 13:58:40.701650: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f92111d0000 of size 5701632 by op Mul action_count 126 step 0 next 42 2023-11-06 13:58:40.701656: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9211740000 of size 9437184 by op Add action_count 82 step 0 next 39 2023-11-06 13:58:40.701661: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free at 7f9212040000 of size 3735552 by op UNUSED action_count 203 step 0 next 54 2023-11-06 13:58:40.701667: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f92123d0000 of size 1867776 by op AddV2 action_count 195 step 0 next 55 2023-11-06 13:58:40.701673: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free at 7f9212598000 of size 9338880 by op UNUSED action_count 0 step 0 next 43 2023-11-06 13:58:40.701678: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9212e80000 of size 3735552 by op AddV2 action_count 122 step 0 next 48 2023-11-06 13:58:40.701684: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free at 7f9213210000 of size 14614528 by op UNUSED action_count 202 step 0 next 18446744073709551615 2023-11-06 13:58:40.701689: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Next region of size 33554432 2023-11-06 13:58:40.701694: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9258000000 of size 7471104 by op Fill action_count 18 step 0 next 19 2023-11-06 13:58:40.701700: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9258720000 of size 7471104 by op Fill action_count 19 step 0 next 20 2023-11-06 13:58:40.701706: I 
tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free at 7f9258e40000 of size 1867776 by op UNUSED action_count 206 step 0 next 53 2023-11-06 13:58:40.701711: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9259008000 of size 2850816 by op Mul action_count 199 step 0 next 36 2023-11-06 13:58:40.701717: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f92592c0000 of size 2359296 by op Add action_count 54 step 0 next 35 2023-11-06 13:58:40.701723: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free at 7f9259500000 of size 11534336 by op UNUSED action_count 201 step 0 next 18446744073709551615 2023-11-06 13:58:40.701728: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Next region of size 8388608 2023-11-06 13:58:40.701733: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925a000000 of size 3735552 by op Fill action_count 7 step 0 next 8 2023-11-06 13:58:40.701738: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925a390000 of size 1867776 by op Fill action_count 8 step 0 next 9 2023-11-06 13:58:40.701744: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925a558000 of size 2785280 by op Fill action_count 9 step 0 next 18446744073709551615 2023-11-06 13:58:40.701749: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Next region of size 16777216 2023-11-06 13:58:40.701754: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925a800000 of size 466944 by op Fill action_count 12 step 0 next 12 2023-11-06 13:58:40.701760: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925a872000 of size 466944 by op Fill action_count 13 step 0 next 13 2023-11-06 13:58:40.701766: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925a8e4000 of size 1867776 by op Fill action_count 14 step 0 next 14 2023-11-06 13:58:40.701771: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925aaac000 of size 1867776 by op Fill action_count 15 step 0 next 15 2023-11-06 13:58:40.701776: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925ac74000 of size 3735552 by op Fill action_count 16 step 0 next 16 2023-11-06 13:58:40.701782: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b004000 of size 3735552 by op Fill action_count 17 step 0 next 17 2023-11-06 13:58:40.701787: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b394000 of size 233472 by op ZerosLike action_count 20 step 0 next 21 2023-11-06 13:58:40.701793: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b3cd000 of size 256 by op Sub action_count 21 step 0 next 22 2023-11-06 13:58:40.701799: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b3cd100 of size 256 by op Sub action_count 49 step 0 next 32 2023-11-06 13:58:40.701804: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b3cd200 of size 256 by op Sub action_count 50 step 0 next 34 2023-11-06 13:58:40.701810: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b3cd300 of size 256 by op Sub action_count 77 step 0 next 37 2023-11-06 13:58:40.701815: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b3cd400 of size 256 by op Sub action_count 78 step 0 next 38 2023-11-06 13:58:40.701820: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b3cd500 of size 256 by op Sub action_count 136 step 0 next 47 2023-11-06 13:58:40.701826: I 
tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b3cd600 of size 256 by op Sub action_count 137 step 0 next 46 2023-11-06 13:58:40.701831: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b3cd700 of size 256 by op Sub action_count 162 step 0 next 29 2023-11-06 13:58:40.701836: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b3cd800 of size 256 by op Sub action_count 163 step 0 next 51 2023-11-06 13:58:40.701842: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free at 7f925b3cd900 of size 256 by op UNUSED action_count 210 step 0 next 26 2023-11-06 13:58:40.701847: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b3cda00 of size 256 by op Sub action_count 27 step 0 next 27 2023-11-06 13:58:40.701852: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b3cdb00 of size 256 by op Sub action_count 28 step 0 next 28 2023-11-06 13:58:40.701858: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free at 7f925b3cdc00 of size 2560 by op UNUSED action_count 31 step 0 next 30 2023-11-06 13:58:40.701863: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b3ce600 of size 2304 by op Add action_count 32 step 0 next 31 2023-11-06 13:58:40.701869: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free at 7f925b3cef00 of size 225792 by op UNUSED action_count 0 step 0 next 23 2023-11-06 13:58:40.701874: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b406100 of size 256 by op Mul action_count 23 step 0 next 24 2023-11-06 13:58:40.701879: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f925b406200 of size 233472 by op Mul action_count 24 step 0 next 25 2023-11-06 13:58:40.701887: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free at 7f925b43f200 of size 3935744 by op UNUSED action_count 204 step 0 next 18446744073709551615 2023-11-06 13:58:40.701897: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Next region of size 2097152 2023-11-06 13:58:40.701906: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9291400000 of size 1280 by op ScratchBuffer action_count 1 step 0 next 1 2023-11-06 13:58:40.701912: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9291400500 of size 256 by op AssignVariableOp action_count 2 step 0 next 2 2023-11-06 13:58:40.701918: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9291400600 of size 466944 by op Cast action_count 3 step 0 next 3 2023-11-06 13:58:40.701923: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9291472600 of size 233472 by op Cast action_count 4 step 0 next 4 2023-11-06 13:58:40.701928: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f92914ab600 of size 256 by op Fill action_count 5 step 0 next 5 2023-11-06 13:58:40.701934: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f92914ab700 of size 466944 by op Fill action_count 10 step 0 next 10 2023-11-06 13:58:40.701939: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f929151d700 of size 928000 by op Fill action_count 11 step 0 next 18446744073709551615 2023-11-06 13:58:40.701944: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Next region of size 4194304 2023-11-06 13:58:40.701950: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 7f9291600000 of size 4194304 by op Fill action_count 6 step 0 next 18446744073709551615 2023-11-06 13:58:40.701955: I 
tensorflow/core/common_runtime/bfc_allocator.cc:1051] Summary of in-use Chunks by size: 2023-11-06 13:58:40.701962: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 14 Chunks of size 256 totalling 3.5KiB 2023-11-06 13:58:40.701970: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 1280 totalling 1.2KiB 2023-11-06 13:58:40.701973: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 2304 totalling 2.2KiB 2023-11-06 13:58:40.701976: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 233472 totalling 684.0KiB 2023-11-06 13:58:40.701979: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 4 Chunks of size 466944 totalling 1.78MiB 2023-11-06 13:58:40.701990: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 928000 totalling 906.2KiB 2023-11-06 13:58:40.702001: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 4 Chunks of size 1867776 totalling 7.12MiB 2023-11-06 13:58:40.702011: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 2359296 totalling 2.25MiB 2023-11-06 13:58:40.702015: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 2785280 totalling 2.66MiB 2023-11-06 13:58:40.702019: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 2850816 totalling 2.72MiB 2023-11-06 13:58:40.702022: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 4 Chunks of size 3735552 totalling 14.25MiB 2023-11-06 13:58:40.702025: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 4194304 totalling 4.00MiB 2023-11-06 13:58:40.702028: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 5701632 totalling 5.44MiB 2023-11-06 13:58:40.702031: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 7471104 totalling 14.25MiB 2023-11-06 13:58:40.702034: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 9437184 totalling 9.00MiB 2023-11-06 13:58:40.702037: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 18874368 totalling 18.00MiB 2023-11-06 13:58:40.702040: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 37748736 totalling 72.00MiB 2023-11-06 13:58:40.702043: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 39845888 totalling 38.00MiB 2023-11-06 13:58:40.702047: I tensorflow/core/common_runtime/bfc_allocator.cc:1058] Sum Total of in-use chunks: 193.03MiB 2023-11-06 13:58:40.702050: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] total_region_allocatedbytes: 266338304 memorylimit: 2903965696 available bytes: 2637627392 curr_region_allocationbytes: 4294967296 2023-11-06 13:58:40.702055: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Stats: Limit: 2903965696 InUse: 202405120 MaxInUse: 1320845056 NumAllocs: 127 MaxAllocSize: 1168769024 Reserved: 0 PeakReserved: 0 LargestFreeBlock: 0

2023-11-06 13:58:40.702063: W tensorflow/core/common_runtime/bfc_allocator.cc:467] **x__***__********** 2023-11-06 13:58:40.707029: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 2.46G (2637627392 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory 2023-11-06 13:58:40.707048: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 52.01MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available. 2023-11-06 13:58:40.707445: E tensorflow/stream_executor/cuda/cuda_driver.cc:1067] failed to synchronize the stop event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered 2023-11-06 13:58:40.707457: E tensorflow/stream_executor/gpu/gpu_timer.cc:55] Internal: Error destroying CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered 2023-11-06 13:58:40.707462: E tensorflow/stream_executor/gpu/gpu_timer.cc:60] Internal: Error destroying CUDA event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered 2023-11-06 13:58:40.707477: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered 2023-11-06 13:58:40.707491: E tensorflow/stream_executor/stream.cc:5020] Internal: Failed to enqueue async memset operation: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered 2023-11-06 13:58:40.707543: W tensorflow/core/kernels/gpu_utils.cc:69] Failed to check cudnn convolutions for out-of-bounds reads and writes with an error message: 'Failed to load in-memory CUBIN: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered'; skipping this check. This only means that we won't check cudnn for out-of-bounds reads and writes. This message will only be printed once. 2023-11-06 13:58:40.707556: I tensorflow/stream_executor/cuda/cuda_driver.cc:789] failed to allocate 8B (8 bytes) from device: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered 2023-11-06 13:58:40.707567: E tensorflow/stream_executor/stream.cc:5020] Internal: Failed to enqueue async memset operation: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered 2023-11-06 13:58:40.707600: F tensorflow/stream_executor/cuda/cuda_dnn.cc:214] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream. [compression_encoder_node-7] process has died [pid 1307, exit code -6, cmd /catkin_ws/devel/lib/pointcloud_to_rangeimage/compression_encoder.py name:=compression_encoder_node log:=/home/rosuser/.ros/log/2c4ea1e6-7ca4-11ee-9ed7-185680f44291/compression_encoder_node-7.log]. 
log file: /home/rosuser/.ros/log/2c4ea1e6-7ca4-11ee-9ed7-185680f44291/compression_encoder_node-7.log Traceback (most recent call last): File "/catkin_ws/devel/lib/pointcloud_to_rangeimage/compression_decoder.py", line 15, in exec(compile(fh.read(), python_script, 'exec'), context) File "/catkin_ws/src/pointcloud_to_rangeimage/scripts/compression_decoder.py", line 38, in main() File "/catkin_ws/src/pointcloud_to_rangeimage/scripts/compression_decoder.py", line 21, in main decoder = additive_lstm.MsgDecoder() File "/catkin_ws/src/pointcloud_to_rangeimage/src/architectures/additive_lstm.py", line 394, in init self.decoder.load_weights(weights_path, by_name=True) File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 2323, in load_weights hdf5_format.load_weights_from_hdf5_group_by_name( File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 790, in load_weights_from_hdf5_group_by_name raise ValueError('Layer #' + str(k) +' (named "' + layer.name + ValueError: Layer #1 (named "decoder"), weight <tf.Variable 'decoder_model/decoder/d_conv1/kernel:0' shape=(1, 1, 32, 512) dtype=float32, numpy= array([[[[ 0.08115087, 0.06084418, 0.06869627, ..., -0.0592371 , 0.0433974 , -0.00223392], [-0.01425195, 0.07770648, 0.06064992, ..., -0.08394855, -0.05900468, 0.00295687], [-0.00558364, -0.06454663, 0.00857301, ..., 0.02866896, 0.08478307, 0.0413385 ], ..., [ 0.02279697, 0.06459316, -0.08035986, ..., 0.10030504, 0.04260159, -0.07629389], [-0.01725329, 0.06205062, 0.08351622, ..., 0.02603789, -0.03077619, 0.09059689], [-0.09484547, 0.02096093, -0.09580201, ..., -0.02254062, 0.03195103, -0.03735457]]]], dtype=float32)> has shape (1, 1, 32, 512), but the saved weight has shape (1, 1, 32, 128). [compression_decoder_node-8] process has died [pid 1313, exit code 1, cmd /catkin_ws/devel/lib/pointcloud_to_rangeimage/compression_decoder.py name:=compression_decoder_node log:=/home/rosuser/.ros/log/2c4ea1e6-7ca4-11ee-9ed7-185680f44291/compression_decoder_node-8.log]. log file: /home/rosuser/.ros/log/2c4ea1e6-7ca4-11ee-9ed7-185680f44291/compression_decoder_node-8.log

TillBeemelmanns commented 11 months ago

Hi Tran97,

I just successfully reran the project on an RTX 2080 with 8 GB of VRAM. The model additive_lstm_demo needs about 2822 MB for the encoder and 2438 MB for the decoder.

$ nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 18%   56C    P2    51W / 215W |   5894MiB /  8192MiB |      5%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2063      G   /usr/lib/xorg/Xorg                363MiB |
|    0   N/A  N/A      2225      G   /usr/bin/gnome-shell               44MiB |
|    0   N/A  N/A      2507      G   ...mviewer/tv_bin/TeamViewer        2MiB |
|    0   N/A  N/A      6362      G   ...RendererForSitePerProcess       69MiB |
|    0   N/A  N/A     12949    C+G   ...806019135037578737,262144      140MiB |
|    0   N/A  N/A     21262      C   /usr/bin/python3                 2822MiB |
|    0   N/A  N/A     21271      C   /usr/bin/python3                 2438MiB |
|    0   N/A  N/A     21299      G   .../ros/noetic/lib/rviz/rviz        6MiB |
+-----------------------------------------------------------------------------+

However, I think it is possible to reduce the required GPU memory with some effort. Try running the model in mixed precision. You can set the parameter mixed_precision in this file. I know that there is currently a bug in the model that prevents the use of mixed precision, but I think it is generally possible with some changes to the model. Mixed precision should roughly halve the required memory, since FP16 is used instead of FP32, but it might also have some effect on the accuracy of the model.
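
For reference, here is a minimal sketch of the Keras mixed-precision API; the layer stack below is only a placeholder and not the actual encoder/decoder of this repository, and where exactly to hook the policy into the compression nodes depends on the model code:

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Enable mixed precision globally: computations run in float16,
# while variables are kept in float32.
mixed_precision.set_global_policy("mixed_float16")

# Layers built after setting the policy use float16 compute automatically.
# The final layer is forced back to float32 for numerical stability.
# (Placeholder layer stack, not the repository's actual model.)
inputs = tf.keras.Input(shape=(32, 1824, 1))
x = tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
outputs = tf.keras.layers.Conv2D(1, 1, dtype="float32")(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```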

Another option would be to optimize the model with TF-TensorRT. You would need to export the model as a TF-TRT model, and it would also be necessary to rewrite the Python ROS nodes to load this optimized model.
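
A rough sketch of such a conversion, assuming the encoder has already been exported as a SavedModel (the paths below are placeholders, not files that exist in this repository):

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Convert a SavedModel into a TF-TRT-optimized SavedModel.
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="exported/encoder_saved_model")
converter.convert()  # replaces supported subgraphs with TensorRT ops
converter.save("exported/encoder_tftrt")
```

The rewritten ROS node could then load the optimized model with tf.saved_model.load("exported/encoder_tftrt"); depending on the TensorFlow version, a lower precision mode (e.g. FP16) can additionally be configured during conversion.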

A third option would be to run the model on CPU, which would slow down the inference step.
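
One generic way to do that is to hide the GPU from TensorFlow before the model is built (this is a standard TensorFlow mechanism, not something already wired into the compression nodes):

```python
import tensorflow as tf

# Hide all GPUs from TensorFlow so that inference falls back to the CPU.
# Must be called before any model or op is placed on a device.
tf.config.set_visible_devices([], "GPU")
```

Alternatively, setting CUDA_VISIBLE_DEVICES="" in the environment before launching has the same effect without touching the node code.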

Let me know if this information helps you.

Best regards, Till

TillBeemelmanns commented 11 months ago

I just tried to run everything with TensorFlow 2.11, and surprisingly the new version needs less GPU memory.

You can try to run the ROS environment with the following new image

tillbeemelmanns/pointcloud_compression:tf2.11
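
If the image is not available locally yet, it can be pulled beforehand (assuming Docker and the NVIDIA container runtime are already set up):

```bash
docker pull tillbeemelmanns/pointcloud_compression:tf2.11
```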

And you could use the following script to start the environment

#!/bin/bash

# In order to be able to use this script install:
# pip install docker-run-cli
DIR="$(cd -P "$(dirname "$0")" && pwd)"

# Attach to the running "pcl" container if it already exists,
# otherwise start a new one from the TF 2.11 image.
if docker ps --format '{{.Names}}' | grep -q "pcl"; then
    docker-run --name pcl
else
    docker-run --volume "$(dirname "$DIR")/catkin_ws:/catkin_ws" --image tillbeemelmanns/pointcloud_compression:tf2.11 --workdir="/catkin_ws" --name pcl
fi
Wed Nov  8 13:41:09 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| 18%   46C    P2    51W / 215W |   3897MiB /  8192MiB |     27%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2063      G   /usr/lib/xorg/Xorg                355MiB |
|    0   N/A  N/A      2225      G   /usr/bin/gnome-shell               47MiB |
|    0   N/A  N/A      6362      G   ...RendererForSitePerProcess       89MiB |
|    0   N/A  N/A     12949    C+G   ...806019135037578737,262144      140MiB |
|    0   N/A  N/A     51157      C   /usr/bin/python3                 1754MiB |
|    0   N/A  N/A     51163      C   /usr/bin/python3                 1498MiB |
|    0   N/A  N/A     51190      G   .../ros/noetic/lib/rviz/rviz        6MiB |
+-----------------------------------------------------------------------------+

Maybe this could be helpful too