ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[Core] Misleading documentation about `num_cpus` and physical resources #48867

Open paul-twelvelabs opened 1 week ago

paul-twelvelabs commented 1 week ago

Description

The Physical Resources and Logical Resources section of the Ray docs states very explicitly:

Resource requirements of tasks or actors do NOT impose limits on actual physical resource usage. For example, Ray doesn’t prevent a num_cpus=1 task from launching multiple threads and using multiple physical CPUs.
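As a concrete illustration of the quoted behavior, here's a minimal sketch (the task name and buffer size are illustrative): a task scheduled with num_cpus=1 that occupies four physical cores, because Ray does not enforce the limit.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

import ray

@ray.remote(num_cpus=1)
def oversubscribed_task():
    # Scheduled as a 1-CPU task, yet nothing prevents it from running
    # four CPU-bound threads at once (sha256 on large buffers releases
    # the GIL, so these threads really do occupy multiple physical cores).
    data = b"x" * (64 * 1024 * 1024)

    def digest(_):
        return hashlib.sha256(data).hexdigest()

    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(digest, range(4)))

if __name__ == "__main__":
    ray.init()
    ray.get(oversubscribed_task.remote())  # watch htop: ~4 cores busy
```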

While the quoted statement is technically true, this section reads as if num_cpus is strictly for scheduling and has no implication for job performance. That is untrue: this section contradicts the NOTE in Cluster Resources, which explicitly highlights the interaction between num_cpus and OMP_NUM_THREADS (and, by extension, torch.get_num_threads(), etc.).

Ray sets the environment variable OMP_NUM_THREADS=<num_cpus> if num_cpus is set on the task/actor

In practice, lowering OMP_NUM_THREADS can lead to a pretty meaningful degradation in job perf, especially for jobs that require torch and numpy.
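To make the interaction concrete, here's a minimal sketch (assuming torch is installed in the worker environment; torch sizes its intra-op thread pool from OMP_NUM_THREADS at import time):

```python
import os

import ray

@ray.remote(num_cpus=1)
def report_thread_config():
    # Ray exports OMP_NUM_THREADS into the worker before user code runs,
    # so torch picks it up when it initializes its thread pool.
    import torch
    return os.environ.get("OMP_NUM_THREADS"), torch.get_num_threads()

if __name__ == "__main__":
    ray.init()
    # Expected on a multi-core machine: ('1', 1). The task is capped to a
    # single BLAS/OMP thread even though plenty of physical CPUs exist.
    print(ray.get(report_thread_config.remote()))
```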

Of note, Physical Resources and Logical Resources sits very high in the docs tree: it's under Ray -> User Guides. Cluster Resources is much lower, under Developer Guides -> Configuring Ray. This adds to the confusion.

One suggestion would be to explicitly mention in the Physical Resources and Logical Resources section how num_cpus affects OMP_NUM_THREADS, or to simply link to Cluster Resources from there.

Link

Physical Resources and Logical Resources

Cluster Resources

Superskyyy commented 1 week ago

I guess adding a hyperlink to that as an exception could suffice for now, since this is the only exception we know of in the current implementation?

paul-twelvelabs commented 1 week ago

That sounds reasonable, but please make it prominent (e.g. a NOTE callout, akin to how it's flagged in the Cluster Resources section), as it's a very important exception.

FWIW, in practice, we'd set num_cpus=0.25 on the mistaken belief that doing so had no perf implications; this caused OMP_NUM_THREADS=1 and ultimately was the source of a 25-30% perf degradation. For use cases that require torch/numpy, which I'd imagine are numerous, not knowing about this can be fairly damning.
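For anyone who hits this in the meantime, a possible mitigation (a sketch, assuming Ray leaves an explicitly set OMP_NUM_THREADS alone, as the Cluster Resources NOTE describes; the value "8" and the task name are illustrative):

```python
import ray

@ray.remote(
    num_cpus=0.25,  # keep the fractional request for scheduling/packing
    runtime_env={"env_vars": {"OMP_NUM_THREADS": "8"}},  # illustrative value
)
def inference_task():
    import torch
    # The intra-op pool now follows the explicit override rather than
    # the 1 that num_cpus=0.25 would otherwise imply.
    return torch.get_num_threads()
```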

Superskyyy commented 1 week ago

Let me do that.