Closed: xiyu1984 closed this issue 6 months ago
I found that the cause may be related to competition for CPU resources between `rayon`'s work-stealing strategy and the system.
Again, note that everything worked well on versions of macOS before Sonoma 14.4.
When `num_threads(...)` is not set manually, the default `num_threads` is 14 (M3 Max 14 + 16). In this case, the system will "rob" the CPU resources back, and the "robbing" itself is costly.
Then I limited `num_threads` to 4 as follows:
rayon::ThreadPoolBuilder::new().num_threads(4).build_global().unwrap();
The system is still "robbing", but the user process can use these 4 threads most of the time. In my program's case, performance improves, although it is still much slower than before because the CPU cores are not fully used. This is just a temporary workaround.
That's worrisome, but I'm afraid I don't have any Apple hardware to test this myself. Hopefully others in the community can share their experience and help debug what's going on.
One small tip -- if you haven't set `num_threads` manually, the `RAYON_NUM_THREADS` environment variable will also override the default setting.
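For anyone who wants to combine the two suggestions above, here is a minimal sketch of choosing a thread count before building the pool. It assumes that an explicit override should win, and that otherwise a few cores are left free for the system. The helper names (`parse_thread_override`, `choose_num_threads`, `resolved_num_threads`) and the `reserve` parameter are made up for illustration and are not part of rayon; only the `RAYON_NUM_THREADS` variable name comes from this thread.

```rust
use std::env;
use std::thread;

/// Interpret a RAYON_NUM_THREADS-style string: only a positive integer counts.
/// (Hypothetical helper; written to resemble, not reproduce, rayon's own parsing.)
fn parse_thread_override(raw: Option<&str>) -> Option<usize> {
    raw.and_then(|s| s.parse::<usize>().ok())
        .filter(|&n| n > 0)
}

/// Prefer the override; otherwise keep `reserve` cores free for the OS,
/// but never go below one thread.
fn choose_num_threads(override_value: Option<usize>, logical_cores: usize, reserve: usize) -> usize {
    match override_value {
        Some(n) => n,
        None => logical_cores.saturating_sub(reserve).max(1),
    }
}

/// Read the environment and the machine's logical core count, then decide.
fn resolved_num_threads(reserve: usize) -> usize {
    let raw = env::var("RAYON_NUM_THREADS").ok();
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1); // assumption: fall back to 1 if the count is unavailable
    choose_num_threads(parse_thread_override(raw.as_deref()), cores, reserve)
}
```

The result can then be passed to `rayon::ThreadPoolBuilder::new().num_threads(n).build_global()`. How many cores to reserve is a guess that would need tuning per machine.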
I think the problem may not be entirely related to `rayon`.
From my experience so far, the number of threads needs to be kept below the number of cores. The details are as follows:
- With `num_threads` set to 4, the parallel work runs stably.
- With `num_threads` set to 8, the parallel work sometimes runs stably, but there is a chance of being "robbed".
- With `num_threads` set to 12, the parallel work sometimes runs stably, but there is a higher chance of being "robbed".

Maybe the larger the `num_threads` used, the higher the probability of being "robbed".
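To check whether the slowdown at higher thread counts also appears without rayon, a rough timing sweep like the one below might help. This is only a sketch: it uses plain `std::thread` workers instead of rayon's work-stealing pool, `busy_work` is a placeholder workload rather than the real program, and all function names are hypothetical.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// CPU-bound placeholder workload (hypothetical; substitute the real job).
fn busy_work(iterations: u64) -> u64 {
    let mut acc: u64 = 0;
    for i in 0..iterations {
        acc = acc.wrapping_mul(6364136223846793005).wrapping_add(i);
    }
    acc
}

/// Split `total_iterations` evenly over `num_threads` OS threads and return
/// the wall-clock time together with a checksum of the workers' results.
fn time_with_threads(num_threads: usize, total_iterations: u64) -> (Duration, u64) {
    assert!(num_threads > 0, "need at least one thread");
    let per_thread = total_iterations / num_threads as u64;
    let start = Instant::now();
    let checksum = thread::scope(|s| {
        // Spawn every worker first, then join them all.
        let handles: Vec<_> = (0..num_threads)
            .map(|_| s.spawn(move || busy_work(per_thread)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .fold(0u64, |a, b| a.wrapping_add(b))
    });
    (start.elapsed(), checksum)
}

/// Time the same total workload at each thread count in `thread_counts`.
fn sweep(thread_counts: &[usize], total_iterations: u64) -> Vec<(usize, Duration)> {
    thread_counts
        .iter()
        .map(|&n| (n, time_with_threads(n, total_iterations).0))
        .collect()
}
```

Calling `sweep(&[4, 8, 12], total)` with a large `total` and comparing the durations across runs would show whether the wall-clock time stops improving, or gets worse, as the thread count approaches the core count.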
And this is how the resources were "robbed" by the `system`:
[screenshot]
and maybe this is why a `num_threads` of 4 can work.
I think you'll need to figure out what that System time is actually doing, because that looks pathological. Does Xcode profiling or anything like that reveal System details?
> I think you'll need to figure out what that System time is actually doing, because that looks pathological. Does Xcode profiling or anything like that reveal System details?
I just checked the information in Activity Monitor, and as you said, the numbers look pathological and contradictory.
The picture shows that the `system` takes the CPU resources away, yet the per-process breakdown shows that my process uses the most CPU. But I'm sure my process has slowed down, so those CPU resources are not being spent on its computation.
Anyway, I will look into this problem more deeply soon, following your suggestion.
Things are clearer.
I did some deeper profiling and found that with higher parallelism my process needs more memory, which in turn raises the amount of security checking in the kernel, and that checking is costly.
This might be confirmed by https://appleinsider.com/articles/24/03/21/apple-silicon-vulnerability-leaks-encryption-keys-and-cant-be-patched-easily
Luckily, `rayon` still works well.
My program needs a high degree of parallelism, and I use `rayon` v1.9.0 for data-parallel processing. It worked nicely on the previous version of macOS Sonoma, but after I updated to Sonoma 14.4, everything slowed down. The underlying scheduling of the parallel mechanism seems to have changed in Sonoma 14.4. This may not be a problem with `rayon`. Has anyone else run into this problem?