Closed DanielChern closed 1 week ago
Normally, when a new worker process is created, it is named ray::IDLE, and whenever an actor or task is scheduled onto it, it is renamed to "ray::<actor or task name>".
Did you observe 90% CPU live, or is it a trace captured over some duration? If it is the latter, I think it is probably possible the tool captures the name ray::IDLE and then something else is scheduled on that process afterward.
First of all, thanks for the explanation. This perf trace was captured from 10 seconds of CPU time, but we ran that test many times on different nodes and saw the same result every time.
Is it possible to capture a flamegraph while it is using 90% of the CPUs (maybe with py-spy)?
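A capture along those lines might look like this (assuming py-spy is installed on the node; `<PID>` is a placeholder for the worker's process id):

```shell
# Sample the worker for 10 seconds and write a flamegraph SVG
py-spy record --pid <PID> --duration 10 --format flamegraph --output profile.svg

# Also follow any child processes the worker spawns
py-spy record --pid <PID> --subprocesses --duration 10 -o profile.svg
```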
Sure. Here are 10 seconds of py-spy profiles of the ray::ServeReplica processes (you can see that our business logic uses only 28%). I am not sure this will tell us much, since this is just the parent process and I don't know whether the profiler counts child threads. I will try to take a profile of the OS threads later if that would help.
So this is not a ray::IDLE process, right?
I guess it may be because ray::IDLE processes are started as soon as the cluster starts, so they have a long CPU execution time when you aggregate thread run times? (So the result seems expected to me.)
Upon further investigation, it appears that the issue might simply be incorrect naming of the Linux threads.
I took a profile of another actor, which downloads images from the internet and then converts/resizes them with the PIL library.
Meanwhile, in the Linux profile, I noticed that ray::IDLE has an internal frame called ImagingResampleHorizontal_8bpc, which is part of the PIL library. While this is only 8% of the profile, it still does not make sense why ray::IDLE would call that function; it looks like part of our actor's business logic.
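One way to sanity-check the naming theory is to list the per-thread comm names of a suspect worker directly from /proc: Ray renames a worker's main thread when it schedules work on it, but threads spawned earlier (for example by native libraries) may keep the name they had at spawn time, which is what perf sched reports. A stdlib-only sketch (Linux-only; the helper name is mine):

```python
import os

def thread_names(pid: int) -> dict[int, str]:
    """Map each thread id of `pid` to its kernel thread name (comm).

    Note: the kernel truncates comm to 15 characters.
    """
    names = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/comm") as f:
                names[int(tid)] = f.read().strip()
        except OSError:
            continue  # thread exited while we were scanning
    return names
```

Running it against one of the ray::IDLE PIDs from the perf output would show whether all of that worker's threads are still named ray::IDLE or only some of them.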
Does it sound reasonable to you? If so, the issue can be closed.
Ok, so I managed to profile one of the IDLE processes, and it does run our business logic... but that is only 20% of the run time (green box); I cannot understand what the other 80% of the time is (blue box).
You can probably use the --native flag to see what's running there. But it does seem like just a naming issue to me.
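For reference, py-spy's --native flag samples native (C/C++ extension) frames in addition to Python ones, which should reveal what the unattributed time is spent in (`<PID>` is a placeholder):

```shell
py-spy record --pid <PID> --native --duration 10 -o profile_native.svg
```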
Ok, in that case, do you want to close the issue, or leave it open so you can take care of the naming part?
I think the issue here would be in the perf sched profiler itself, no? So the change would have to be in the Linux profiler itself?
closing - please reopen if conclusion is incorrect.
Hey Ray team, we noticed strange behavior with ray::IDLE threads: when Ray applications use ~80% of CPU time, 90% of that time is attributed to a task called ray::IDLE. We captured it using
perf sched record -- sleep 10 && perf sched latency -s runtime
Also, you can see that each RayServe process has several IDLE processes, some of them running for hours.
Is this something that is expected and just a misleading thread name, or did we do something wrong?
Versions / Dependencies
Node: GCP c3-standard-22 (22 vCPU, 88 GB RAM)
Ray: ray-ml:2.24.0-py310-gpu
Ray Operator: v1.1.0
OS: Linux 6.1.85

Reproduction script
Ray start command (3 instances per node):
Issue Severity
Medium: It is a significant difficulty but I can work around it.