Open paul-twelvelabs opened 1 week ago
I guess adding a hyperlink to that as an exception could suffice for now as this is the only exception that we know of according to the current impl?
That sounds reasonable, but please make it pronounced! (e.g. a NOTE callout, or something, akin to how it's mentioned in the Cluster Resources section as it's a very important exception).
FWIW, in practice, we'd set `num_cpus=0.25` on the mistaken belief that doing so had no perf implications; this caused `OMP_NUM_THREADS=1` and ultimately was the source of a 25-30% perf degradation. For use cases that require `torch`/`numpy`, which I'd imagine are numerous, not knowing about this can be fairly damning.
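For anyone who wants to check this on their own cluster, here is a minimal sketch of how the effect could be observed and worked around. It assumes Ray is installed, and that an explicitly set `OMP_NUM_THREADS` (here passed through `runtime_env`) takes precedence over the value Ray would otherwise pick; the helper names `thread_settings` and `demo` and the value `"4"` are made up for illustration.

```python
import os


def thread_settings() -> dict:
    """Report the thread-related settings visible to the current process."""
    return {
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
        "cpu_count": os.cpu_count(),
    }


def demo() -> None:
    """Compare a fractional-CPU task with one that pins OMP_NUM_THREADS itself."""
    import ray  # imported lazily so thread_settings() works without Ray

    ray.init()

    # Fractional request, as in the comment above.
    @ray.remote(num_cpus=0.25)
    def default_task():
        return thread_settings()

    # Assumption: a user-provided OMP_NUM_THREADS overrides Ray's default.
    @ray.remote(
        num_cpus=0.25,
        runtime_env={"env_vars": {"OMP_NUM_THREADS": "4"}},
    )
    def pinned_task():
        return thread_settings()

    print("default:", ray.get(default_task.remote()))
    print("pinned: ", ray.get(pinned_task.remote()))
    ray.shutdown()
```

Calling `demo()` on a machine with Ray installed prints what each task sees; comparing the two dictionaries shows whether the scheduling hint also changed the thread count.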
Let me do that.
Description
The Physical Resources and Logical Resources section of the Ray docs very explicitly states

While technically true, this section reads as if `num_cpus` is strictly for scheduling and has no implication for job performance. However, this is untrue, and this section of the docs contradicts the NOTE in Cluster Resources, which explicitly highlights the interaction of `num_cpus` and `OMP_NUM_THREADS` (and, by extension, `torch.get_num_threads()`, etc).

In practice, lowering `OMP_NUM_THREADS` can lead to a pretty meaningful degradation in job perf, especially for jobs that require `torch` and `numpy`.

Of note, Physical Resources and Logical Resources is very high in the docs tree: it's under Ray -> User Guides. Cluster Resources is much lower, under Developer Guides -> Configuring Ray. This adds to the confusion.
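Until the docs call this out, one defensive option is to set the thread count explicitly at the start of a task or actor instead of relying on whatever the scheduler derived from `num_cpus`. The sketch below is only an illustration: `pin_thread_count` and `THREAD_ENV_VARS` are hypothetical names, and the environment variables only take effect for libraries that read them when they are first loaded, so the helper should run before `numpy` or `torch` are imported. If `torch` is available, it also calls `torch.set_num_threads`, which applies at runtime.

```python
import os

# Environment variables commonly read by OpenMP- and BLAS-backed libraries.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def pin_thread_count(num_threads: int) -> dict:
    """Set the thread-count variables for this process and return them."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")

    for name in THREAD_ENV_VARS:
        os.environ[name] = str(num_threads)

    # torch is optional here; if present, update its intra-op thread pool too.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None:
        torch.set_num_threads(num_threads)

    return {name: os.environ[name] for name in THREAD_ENV_VARS}
```

Calling `pin_thread_count(4)` as the first statement of a remote function keeps the thread count independent of the `num_cpus` value used for scheduling, at the cost of possible oversubscription if many such tasks share a node.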
One suggestion would be to explicitly mention how `num_cpus` affects `OMP_NUM_THREADS` in the Physical Resources and Logical Resources section. Or, just link to Cluster Resources from there.

Link
Physical Resources and Logical Resources
Cluster Resources