I'm seeing this on VMs as well, not just on a Kubernetes cluster.
@achordia20 we are using /proc/pid/smaps_rollup to determine the memory usage of a process during OOM killing; specifically, we only look at the Private_Clean/Private_Dirty/Private_Hugetlb fields, which do not appear to account for page cache usage (memory the kernel manages to speed up disk access).
Do you have a script for us to reproduce this behavior? Also, if you have a repro, it would be nice if you could show the contents of /proc/pid/smaps_rollup.
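If it helps with capturing that, here is a minimal sketch (just an illustration; the Private_* names are standard smaps_rollup keys, and the summing mirrors the description above):

```python
# Minimal sketch: dump /proc/<pid>/smaps_rollup and sum the Private_* fields,
# which is roughly the accounting described above. smaps_rollup values are in kB.
import sys

def private_memory_kb(pid: int) -> int:
    wanted = ("Private_Clean:", "Private_Dirty:", "Private_Hugetlb:")
    total = 0
    with open(f"/proc/{pid}/smaps_rollup") as f:
        for line in f:
            if line.startswith(wanted):
                total += int(line.split()[1])
    return total

if __name__ == "__main__":
    pid = int(sys.argv[1])
    with open(f"/proc/{pid}/smaps_rollup") as f:
        print(f.read())
    print(f"Private_Clean + Private_Dirty + Private_Hugetlb: {private_memory_kb(pid)} kB")
```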
Another option is to turn off the OOM-killing feature, which can be controlled via RAY_memory_usage_threshold:
https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html#how-do-i-configure-the-memory-monitor
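For example, something along these lines should work (a sketch, assuming the variables are set in the environment of the raylet, i.e. before `ray start` / `ray.init` runs on each node):

```python
import os

# Assumption: the memory monitor reads these from the raylet's environment,
# so they must be set before the raylet starts on each node.
os.environ["RAY_memory_monitor_refresh_ms"] = "0"  # 0 disables the memory monitor
# Or keep the monitor but raise the kill threshold instead:
# os.environ["RAY_memory_usage_threshold"] = "0.99"

import ray

ray.init()
```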
Unfortunately, I also encountered this issue with version 2.4.0, but my cache does not seem to be particularly large. Memory has been increasing ever since we started submitting jobs to the Ray cluster (KubeRay), until it reaches around 50%. Even when no jobs are running, memory usage stays high.
/sys/fs/cgroup/memory# cat memory.stat
cache 7702360064
rss 4729700352
rss_huge 2789212160
shmem 6167658496
mapped_file 6168662016
dirty 8192
writeback 0
swap 0
pgpgin 178125506
pgpgout 177528593
pgfault 195961384
pgmajfault 954
inactive_anon 5990920192
active_anon 4934561792
inactive_file 1026723840
active_file 507977728
unevictable 0
hierarchical_memory_limit 24000000000
hierarchical_memsw_limit 24000000000
total_cache 7702360064
total_rss 4729700352
total_rss_huge 2789212160
total_shmem 6167658496
total_mapped_file 6168662016
total_dirty 8192
total_writeback 0
total_swap 0
total_pgpgin 178125506
total_pgpgout 177528593
total_pgfault 195961384
total_pgmajfault 954
total_inactive_anon 5990920192
total_active_anon 4934561792
total_inactive_file 1026723840
total_active_file 507977728
total_unevictable 0
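Side note on the dump above: most of the `cache` figure is actually `shmem` (~6.2 GB, likely the Ray object store in /dev/shm), and only `total_inactive_file` (~1 GB) is readily reclaimable page cache. For reference, here is a sketch of a kubelet-style "working set" calculation on cgroup v1 (usage_in_bytes minus total_inactive_file); this only illustrates the accounting and is not necessarily what the linked PR implements:

```python
# Sketch: cgroup v1 "working set" memory the way kubelet computes it:
# memory.usage_in_bytes minus total_inactive_file (easily reclaimable cache).
CGROUP = "/sys/fs/cgroup/memory"

def read_int(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())

def memory_stat(path: str = f"{CGROUP}/memory.stat") -> dict:
    stats = {}
    with open(path) as f:
        for line in f:
            key, value = line.split()
            stats[key] = int(value)
    return stats

usage = read_int(f"{CGROUP}/memory.usage_in_bytes")
inactive_file = memory_stat()["total_inactive_file"]
print(f"usage={usage} inactive_file={inactive_file} working_set={max(0, usage - inactive_file)}")
```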
I have filed a PR for this: https://github.com/ray-project/ray/pull/42508 (but only for cgroup v1).
@achordia20 @yvmilir Would you help test my fix PR https://github.com/ray-project/ray/pull/42508?
You can install my custom built package:
pip install "ray[default] @ https://github.com/WeichenXu123/packages/raw/c5d6cedacec0ec2446a8c0803b14f35937b5fe0e/ray/spark-df-loader/ray-3.0.0.dev0-cp310-cp310-linux_x86_64.whl"
What happened + What you expected to happen
I started seeing an issue on our Ray cluster where nodes with no activity still displayed high memory usage. When I looked into the k8s pod, I didn't see any tasks running or internal Ray processes using significant memory. As I dug deeper, I found that most of the memory was disk (page) cache resulting from our jobs doing significant disk I/O.
I've attached screenshots of what I saw.
I was also able to check this through the Ray APIs.
Node memory stats did show a high amount of cache memory, but since I'm running in k8s it's harder to prove that it belongs to the Ray pod. I was able to confirm that node memory cleared up once I dropped the page cache.
This ultimately resulted in Ray killing our tasks, because the cache memory was being accounted for as process memory.
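For anyone trying to confirm this on a node, here is a rough sketch that compares the kernel's page-cache figure against the resident memory of Ray processes (the "ray" cmdline match is just a heuristic for this illustration):

```python
# Rough sketch: compare the kernel page cache against the summed RSS of
# processes whose cmdline mentions "ray", to show the cache is not process memory.
import os

def meminfo_kb(key: str) -> int:
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith(key + ":"):
                return int(line.split()[1])
    return 0

def ray_rss_kb() -> int:
    total = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                if b"ray" not in f.read():
                    continue
            with open(f"/proc/{pid}/status") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        total += int(line.split()[1])
        except (FileNotFoundError, PermissionError):
            continue
    return total

print(f"page cache: {meminfo_kb('Cached')} kB, ray process RSS: {ray_rss_kb()} kB")
```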
Versions / Dependencies
Ray 2.4.0
Reproduction script
I'm not sure yet how exactly to demonstrate this with a repro script.
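In case it helps, a possible repro sketch (untested): a task that does heavy disk I/O so the kernel page cache grows, after which the idle node's reported memory can be checked. The file path and sizes are arbitrary.

```python
# Possible repro sketch (untested): grow the kernel page cache with disk I/O
# from inside a Ray task, then observe the node's reported memory while idle.
import os
import ray

ray.init()

@ray.remote
def churn_disk(path="/tmp/page_cache_churn.bin", size_gb=8):
    chunk = os.urandom(64 * 1024 * 1024)   # 64 MiB of random bytes
    with open(path, "wb") as f:
        for _ in range(size_gb * 16):      # 16 chunks of 64 MiB per GiB
            f.write(chunk)
    read_bytes = 0
    with open(path, "rb") as f:            # re-read so the file stays in cache
        while True:
            data = f.read(64 * 1024 * 1024)
            if not data:
                break
            read_bytes += len(data)
    return read_bytes

print("bytes read back:", ray.get(churn_disk.remote()))
# After this returns, check the dashboard / memory monitor: node memory should
# stay high even though no task is running, if page cache is being counted.
```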
Issue Severity
Medium: It is a significant difficulty but I can work around it.