leixm closed this issue 1 year ago.
When I look at /dev/shm, there is no drop in occupancy:
Filesystem Size Used Avail Use% Mounted on
overlay 745G 568G 177G 77% /
tmpfs 64M 0 64M 0% /dev
tmpfs 63G 0 63G 0% /sys/fs/cgroup
tmpfs 63G 17G 47G 27% /dev/shm
When I execute the pmap command on the SpillWorker process:
pmap -x 5037 | grep plasma
00007f0dfcca0000 17485184 12296128 12296128 rw-s- plasmawdebze (deleted)
cc @ericl Can you take a look? Thank you so much.
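A minimal sketch of how the same check could be scripted, assuming Linux /proc is readable; it lists plasma mappings that a process still holds even after the backing /dev/shm file was unlinked (the "(deleted)" entries above). The PID 5037 from the pmap example is used as a placeholder:

```python
# Sketch: find plasma shared-memory mappings a process still holds, including
# ones whose /dev/shm file was already unlinked ("(deleted)"). Such mappings
# keep the shm space reserved even though the file no longer shows up in ls.
def deleted_plasma_mappings(pid):
    mappings = []
    with open(f"/proc/{pid}/maps") as maps:
        for line in maps:
            if "plasma" in line and "(deleted)" in line:
                start, end = (int(x, 16) for x in line.split()[0].split("-"))
                mappings.append((line.strip(), (end - start) / 1e9))
    return mappings

# 5037 is the SpillWorker PID from the pmap output above; substitute your own.
for mapping, size_gb in deleted_plasma_mappings(5037):
    print(f"{size_gb:.2f} GB  {mapping}")
```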
One thing to watch out for is shared vs. total memory. The shared memory object store's memory is reported as memory usage for all worker processes, so you have to subtract this out (e.g., RSS - SHR) to get the actual memory overhead per worker.
If you do this subtraction, does the spill/restore worker memory still seem high?
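As a rough way to script that subtraction, a minimal psutil sketch (assuming Linux, and that spill/restore workers can be matched by their process title) might look like this:

```python
import psutil

# Approximate per-worker private memory as RSS - SHR; on Linux, psutil's
# memory_info() exposes the shared component directly. Matching workers by
# their "SpillWorker"/"RestoreWorker" title is an assumption.
for proc in psutil.process_iter(["pid", "name", "cmdline"]):
    title = " ".join(proc.info["cmdline"] or [proc.info["name"] or ""])
    if "SpillWorker" in title or "RestoreWorker" in title:
        mem = proc.memory_info()  # rss, vms, shared, ... on Linux
        private_gb = (mem.rss - mem.shared) / 1e9
        print(f"pid={proc.info['pid']} rss={mem.rss / 1e9:.2f} GB "
              f"shr={mem.shared / 1e9:.2f} GB approx. private={private_gb:.2f} GB")
```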
cc @rickyyx When running large data sets, it seems very easy to hit OOM. Do you have a better suggestion? The following is a task that uses about 2 GB of memory but cannot run successfully on an 80 GB worker node.
OutOfMemoryError: Task was killed due to the node running low on memory.
Memory on the node (IP: xx.xx.xx.xx, ID: 9a332a2af973dc171e677d3a480c9fa07bbad59a95791185de9f22d4) where the task (task ID: 56530841d51ccb005252a8242ec172e3002da16002000000, name=map, pid=30391, memory used=1.92GB) was running was 77.51GB / 83.82GB (0.924687), which exceeds the memory usage threshold of 0.9. Ray killed this worker (ID: a957fc481f2478c25078bedeebf7fc40d6a0bd5c25efc55b6f5d8598) because it was the most recently scheduled task; to see more information about memory usage on this node, use `ray logs raylet.out -ip xx.xx.xx.xx`. To see the logs of the worker, use `ray logs worker-a957fc481f2478c25078bedeebf7fc40d6a0bd5c25efc55b6f5d8598*out -ip xx.xx.xx.xx. Top 10 memory users:
PID MEM(GB) COMMAND
4286 9.09 ray::IDLE_SpillWorker
488 5.48 /home/ray/anaconda3/lib/python3.7/site-packages/ray/core/src/ray/raylet/raylet --raylet_socket_name=...
11009 4.46 ray::IDLE_SpillWorker
10771 2.92 ray::IDLE_SpillWorker
30391 1.92 ray::map
9661 1.21 ray::IDLE_RestoreWorker
9660 1.21 ray::IDLE_RestoreWorker
4172 0.98 ray::IDLE_SpillWorker
59 0.21 /home/ray/anaconda3/lib/python3.7/site-packages/ray/core/src/ray/gcs/gcs_server --log_dir=/tmp/ray/s...
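If the documented memory-monitor settings apply to this Ray version, the 0.9 kill threshold in the error above can be tuned via environment variables. A sketch for a locally started node follows; on a k8s cluster the variables would instead need to be set in each node container's environment before the Ray processes start:

```python
import os

# Assumption: these documented Ray memory-monitor environment variables are
# honored by the raylet started below; they must be set before the node
# processes start, so on an existing cluster this has to happen in the
# node's environment rather than in the driver.
os.environ["RAY_memory_usage_threshold"] = "0.95"    # raise the kill threshold
os.environ["RAY_memory_monitor_refresh_ms"] = "250"  # "0" would disable the monitor

import ray

ray.init()  # the locally started raylet inherits the environment set above
```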
I found that a job over the same data set usually succeeds the first time, but it often fails the second time, or many tasks get OOM-killed.
Thanks for your reply. I checked the spill workers and restore workers; when the job ends, RSS - SHR is not a large value.
I carefully observed the monitoring and the worker process status on the worker node, and found that the problem is mainly caused by high page cache usage; Ray's OOM killer counts cache memory as well (when cgroups are enabled). The raylet, SpillWorker, and RestoreWorker also occupy some memory, but the proportion is not large and is within the normal range. I don't think this is a bug, so this issue can be closed. Thank you for your patience and kindness @ericl @rickyyx.
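A rough sketch of how the cache share of the cgroup's reported usage could be inspected, assuming cgroup v1 with the memory controller mounted at the usual path inside the container (under cgroup v2 the file names and fields differ):

```python
# Sketch: compare the cgroup's reported memory usage with its page-cache
# portion. Paths assume cgroup v1; adjust for your container setup.
CGROUP = "/sys/fs/cgroup/memory"

def read_memory_stat(path):
    stats = {}
    with open(path) as f:
        for line in f:
            key, value = line.split()
            stats[key] = int(value)
    return stats

stats = read_memory_stat(f"{CGROUP}/memory.stat")
usage = int(open(f"{CGROUP}/memory.usage_in_bytes").read())
cache = stats.get("cache", 0)
print(f"usage={usage / 1e9:.2f} GB, page cache={cache / 1e9:.2f} GB, "
      f"usage minus cache={(usage - cache) / 1e9:.2f} GB")
```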
Ray's OOM killer counts cache memory as well (when cgroups are enabled).
Is this memory accounting confusing for you?
When running large data sets, it seems very easy to hit OOM. Do you have a better suggestion? The following is a task that uses about 2 GB of memory but cannot run successfully on an 80 GB worker node.
So the original OOM problem ^ was caused by the task's excessive memory usage?
What happened + What you expected to happen
I deployed a 5*4C20G Ray cluster on k8s and used the Ray Data API to do some map and groupby operations. When my job ends and the client closes the session, I find that the memory usage of SpillWorker and RestoreWorker is still high: a single SpillWorker process uses as much as 5 GB+, and the SpillWorker/RestoreWorker processes are not killed. Is this normal behavior? Does Ray support killing IO worker processes that remain IDLE for a long time? (screenshot: memory usage)
Versions / Dependencies
Ray Version: 2.2.0, 2.3.0
Python: 3.7.16
OS: Ubuntu 20.04.5 LTS
Reproduction script
Issue Severity
High: It blocks me from completing my task.