Open johamster opened 2 years ago
Thanks for the report!
Based on what you describe my first thought is Gatekeeper or some other sort of macOS process creation bottleneck. It might be worth looking at Activity Monitor or top to see if there's some system process that is consuming CPU.
Ahhh I just saw the note about parentalcontrolsd. Yes, sadly, I've seen it be responsible for high CPU usage in other contexts though I've never been able to reproduce it.
You may try following the commands mentioned at: https://github.com/MohammedFadin/ParentalControlSD_Fixer/blob/master/app.js
Would you be ok making a spindump of parentalcontrolsd and sending it to my email rain@sunshowers.io? There's an Apple employee that's willing to look: https://twitter.com/Catfish_Man/status/1534577710172778497
Here's how to create a spindump (create one for parentalcontrolsd in this case): https://support.zoom.us/hc/en-us/articles/206435255-Creating-a-Spindump-file-on-macOS
Hi @sunshowers,
I work on the same team as Johannes and am also affected. I executed the gatekeeper commands you mentioned above, but it unfortunately does not help.
I also created a spindump of parentalcontrolsd
and might be able to share it with an Apple employee directly. Since it is related to an internal project though, there would be some bureaucracy involved..
Instead I created this repository which reproduces the issue in isolation and was also run on x86-64 Linux. The main summary:
cargo test | cargo nextest run | |
---|---|---|
Darwin | real 1.056s | real 6.790s |
Linux | real 1.207s | real 0.449s |
So if one has cargo
and cargo-nextest
installed this allows for a deeper investigation anyway. An additional disclaimer: we are experiencing this on a corporate image of MacOS, so perhaps a custom IT policy is interfering. We would appreciate it very much if someone could tell us if they were able to reproduce this on their M1 machine.
Kind regards, Jonas
cc: @Catfish-Man
Thanks for the minimized test case! I do think this is generally an issue with macOS process execution performance. It could definitely be a custom IT policy that e.g. deployed an endpoint security extension: https://developer.apple.com/documentation/endpointsecurity
Most such extensions will intercept each process execution. If they don't respond quickly enough, nextest will slow down.
I don't have a suitable Mac computer to test with but perhaps you or someone else does?
I work on the same team as Johannes and Jonas :)
I did some experimenting, and I am fairly confident that the issue is caused by parentalcontrolsd
, or some kind of extension called by this process (but I think I can rule out endpoint security extensions because I deinstalled them - see below).
This is what I tried:
I started our test suite in a terminal and ran sysdiagnose -p <PID-of-parentalcontrolsd
in another terminal while the tests were running. This created a .tar.gz
file with 500Mb of log data. I grep
ed for parentalcontrolsd
and found many hits in a file named fs_usage.txt
. From this file I extracted all lines with string parentalcontrolsd
into a separate file and inspected the result.
I found 48.6k calls to getdirentries64
within a time span of 11 seconds. I cannot tell whether this really is "a lot", but it made me suspicious. The calls to getdirentries64
where mostly related to the same folder which happens to be the folder where the compiled test programs are located: <rust-project>/target/debug/deps
.
When I checked, this folder contained more than 5.6k files, mostly .o
and .d
files. After I deleted these files (rm *.o *.d
) and re-ran the test suite, it was much faster. For the test repository provided by Jonas, folder target/debug/deps
only contains 382 files, but the performance gain from deleting all .o
and .d
files is still significant: The run time of cargo nextest run
goes down from 4.7 seconds to 1.4 seconds on my machine (M1 Pro 2021).
Hypothesis:
I suspect that the performance regression is caused by parentalcontrolsd
. It seems that parentalcontrolsd
scans the containing folder of any program which is started. For folders containing many files, this has a significant performance impact.
When I execute systemextensionsctl list
on my corporate laptop, I see one entry with bundleID "com.jamf.protect.security-extension". I can temporarily deinstall Jamf Protect, which removes the security-extension, and even without the security-extension, the observations I described above are the same: reducing the number of files in the test programs' foler significantly reduces the runtime of the test suite. I do not know if parentalcontrolsd
may trigger other kinds of extensions, but if this is not the case, parentalcontrolsd
itself seems to be the culprit.
Thank you so much for the investigation @quadruple-output! It does look like an issue with Apple's software at this point. This sort of linear scan through a directory each time a program is started is bound to cause performance issues.
It's possible that thanks to your investigation this can be fixed upstream, with Apple, so thanks again.
I've forwarded the report over, thanks!
One way nextest can work around this is to do a simplified version of reusing builds:
REUSE BUILD OPTIONS
called e.g. --hardlink-binaries
(make it incompatible with the other options in this section if it aids in simplicity)target/nextest
(virtually always guaranteed to be on the same filesystem).We'd welcome contributions if you'd like to implement this -- it is probably 1 day worth of work including integration tests.
We have news that may be relevant for the report you sent upstream…
We learned from our IT department that they recently enabled a new mobile device management (MDM) configuration profile (I hope I am using the correct terms here). The profile contains a specific list of application paths that are not allowed to be launched on our managed endpoints. As far as I understand, this is achieved by leveraging a standard MDM feature, though this feature is marked as "deprecated".
Our IT colleagues would like to open an Apple ticket on their own. It would be great if you could share the ID of yours, so we can link the issues.
@quadruple-output thanks for waiting! sorry, just got the details: it's rdar://95625035
I'm wondering, could you give it a shot again with cargo-nextest 0.9.43 and macOS Ventura? Nextest now uses posix_spawn
rather than fork/exec on Mac, which could potentially have made this better. (If building locally, make sure it's with Rust 1.65)
I don't think that I have the possibility to re-test this, I'm afraid. As stated above, the issue was triggered by a configuration profile that had been rolled-out by our central IT team. After our complaints, they reverted this profile, such that the issue does not occur, any more. After all, I don't see regard this as a bug in nextest. Maybe we can close this issue (@johamster ?).
Thanks for getting back! I'll also ask my Apple contact about the status.
Hi,
first of all thanks for building
nextest
it is great (except for what follows here).We observed a performance regression in our test execution times from approx. 20-30s to >200s. So far, we don't have an explanation. We observed it yesterday but it could have gone unnoticed for a few days.
It only happens on Mac with M1 (pro) chips. Linux on x86 machines (e.g., our CI) is not affected.
So far we tried the following without success:
cargo-nextest
in different revisions (from 0.9.0 - 0.9.16).cargo clear
.Further, we observed that
nextest
processes are spawned as expected. However we observed the following:nextest
the following service shows up in activity monitor: parentalcontrolsd with significant CPU utilization. Not sure if that is expected.Any ideas? We are happy to help.
Kind regards, Johannes