real-stanford / cloth-funnels

[ICRA 2023] This repository contains code for training and evaluating Cloth Funnels in simulation for Ubuntu 18.04.

memory leak #9

Open zcswdt opened 10 months ago

zcswdt commented 10 months ago

Can you re-upload the code? Thank you very much

Entongsu commented 7 months ago

I fixed it by removing this line: https://github.com/real-stanford/cloth-funnels/blob/1fb231e0633b0603eb940aec130ab903e41c2d03/cloth_funnels/PyFlex/bindings/opengl/shader.cpp#L81

zcswdt commented 7 months ago

That's impressive. I tried for a long time but couldn't solve it. After deleting this line, have you verified that the code still works correctly?

zcswdt commented 7 months ago

I just tried removing that line (https://github.com/real-stanford/cloth-funnels/blob/1fb231e0633b0603eb940aec130ab903e41c2d03/cloth_funnels/PyFlex/bindings/opengl/shader.cpp#L81), but there is still a memory leak. Have you tried it?

ray.exceptions.OutOfMemoryError: Task was killed due to the node running low on memory. Memory on the node (IP: 100.79.61.171, ID: dd64af687cb1299d0339f770b1cf8002a43d25240e4f9a1aea3abe31) where the task (actor ID: 198f27dd8eedd8a529db2fdc01000000, name=SimEnv.init, pid=25063, memory used=2.28GB) was running was 29.74GB / 31.30GB (0.950172), which exceeds the memory usage threshold of 0.95. Ray killed this worker (ID: cfe2287dcee7a2bc8a0bef249ec702f3351b919a353a178bc04fa50b) because it was the most recently scheduled task; to see more information about memory usage on this node, use ray logs raylet.out -ip 100.79.61.171. To see the logs of the worker, use `ray logs worker-cfe2287dcee7a2bc8a0bef249ec702f3351b919a353a178bc04fa50b*out -ip 100.79.61.171. Top 10 memory users: PID MEM(GB) COMMAND
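The numbers in this error can be read off mechanically. The following is a minimal sketch, not part of the repository, that pulls the node usage and kill threshold out of the message text; the helper name `parse_oom` and the regular expression are assumptions made for illustration and rely only on the wording shown in the log above. Note that the killed worker itself reports only 2.28GB, so most of the 29.74GB was presumably held by other processes on the node.

```python
import re

# A decimal number such as "29.74" or "0.95" (no trailing dot captured).
NUM = r"(\d+(?:\.\d+)?)"

# Matches "<used>GB / <total>GB (<fraction>), which exceeds the memory usage threshold of <t>"
PATTERN = re.compile(
    NUM + r"GB / " + NUM + r"GB \(" + NUM
    + r"\), which exceeds the memory usage threshold of " + NUM
)


def parse_oom(message):
    """Extract node memory figures from a Ray OutOfMemoryError message (hypothetical helper)."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like a Ray OutOfMemoryError")
    used, total, reported, threshold = map(float, match.groups())
    return {
        "used_gb": used,
        "total_gb": total,
        "reported_fraction": reported,
        "threshold": threshold,
    }


# Excerpt copied from the error above.
excerpt = (
    "memory used=2.28GB) was running was 29.74GB / 31.30GB (0.950172), "
    "which exceeds the memory usage threshold of 0.95. Ray killed this worker"
)

info = parse_oom(excerpt)
limit_gb = info["threshold"] * info["total_gb"]  # usage at which Ray starts killing workers
print(f"node usage {info['used_gb']:.2f} GB, kill limit about {limit_gb:.2f} GB")
print("over threshold:", info["used_gb"] / info["total_gb"] > info["threshold"])
```

The fraction recomputed from the rounded GB values differs slightly from the 0.950172 in the log, which Ray presumably computes from exact byte counts.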

Entongsu commented 7 months ago

My code works now and can be run directly, although there are many OpenGL warnings. I modified the code so that it does not use any Ray-related functions when running.

zcswdt commented 7 months ago

Can you explain why you deleted this line? Could you also share your machine configuration: the output of nvidia-smi, nvcc -V, and free -h, along with your Ubuntu version? Thanks.
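The diagnostics requested here can be gathered in one pass. Below is a minimal sketch, assuming a Linux machine; the names `COMMANDS`, `run_command`, and `collect_report` are hypothetical, and `lsb_release -d` is used as one way to get the Ubuntu version. Tools that are not installed are reported as missing rather than raising an error.

```python
import shutil
import subprocess

# Label -> command line. The first three are the commands named in the thread;
# lsb_release is an assumed way to report the Ubuntu version.
COMMANDS = {
    "gpu_and_driver": ["nvidia-smi"],
    "cuda_toolkit": ["nvcc", "-V"],
    "memory": ["free", "-h"],
    "ubuntu_version": ["lsb_release", "-d"],
}


def run_command(argv, timeout=30):
    """Run one command and return its output, or a short note if it is unavailable."""
    if shutil.which(argv[0]) is None:
        return f"{argv[0]}: not found on PATH"
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"{argv[0]}: failed ({exc})"
    # Some tools write their report to stderr, so fall back to it when stdout is empty.
    return (result.stdout or result.stderr).strip()


def collect_report(commands=COMMANDS):
    """Return a dict mapping each label to the text output of its command."""
    return {label: run_command(argv) for label, argv in commands.items()}


if __name__ == "__main__":
    for label, output in collect_report().items():
        print(f"== {label} ==\n{output}\n")
```

Pasting the printed report into a comment gives both sides the same set of facts (driver, CUDA toolkit, RAM, OS release) to compare.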

Entongsu commented 7 months ago

I have fixed the warnings now (they were caused by my own mistakes), and the code runs with only this one line removed. I removed the line because I was getting an assertion error on it.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:2B:00.0  On |                  N/A |
|  0%   51C    P8    38W / 420W |   1481MiB / 24576MiB |     54%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
              total        used        free      shared  buff/cache   available
Mem:           62Gi        10Gi        15Gi       1.0Gi        36Gi        50Gi
Swap:          47Mi        47Mi       0.0Ki

zcswdt commented 7 months ago

Thank you very much for sharing this. Can you explain why deleting this line of code solves the problem?

Entongsu commented 7 months ago

Because I got an assertion error from this line. I tried removing it and found that the code works well.

zcswdt commented 7 months ago

Got it. Try training for half an hour and see whether memory still leaks. Maybe our driver and CUDA versions are different; my CUDA is 10.0.
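The half-hour check suggested here can be made less subjective by logging the training process's resident memory and fitting a trend. This is a minimal sketch, assuming Linux (it reads `VmRSS` from `/proc/<pid>/status`); the function names and the 30-second sampling interval are illustrative choices, not anything from the repository. It tracks host RAM only, not GPU memory.

```python
import time
from pathlib import Path


def rss_mib(pid):
    """Return the resident set size of `pid` in MiB, read from /proc (Linux only)."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1]) / 1024  # the kernel reports this value in kB
    raise RuntimeError(f"VmRSS not found for pid {pid}")


def growth_mib_per_min(samples):
    """Least-squares slope of (elapsed_seconds, MiB) samples, in MiB per minute."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        return 0.0
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return 60.0 * num / den


def monitor(pid, duration_s=1800, interval_s=30):
    """Sample RSS of `pid` for `duration_s` seconds and return the growth rate."""
    samples = []
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        samples.append((elapsed, rss_mib(pid)))
        print(f"{elapsed:7.0f}s  {samples[-1][1]:10.1f} MiB")
        if elapsed >= duration_s:
            break
        time.sleep(interval_s)
    return growth_mib_per_min(samples)
```

Calling `monitor(pid)` with the PID of the training process prints one sample every 30 seconds for half an hour. Some growth during warm-up is normal, so a slope that stays clearly positive after the first few minutes is the signal worth reporting, along with the driver and CUDA versions in use.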