Closed MahdiBM closed 1 month ago
@hassila I'm writing some real benchmarks now, and the benchmarks target needs to depend on a big monolithic target. The problem is that when I do `benchmark run` and then `benchmark check`, it builds that monolithic target twice, each build taking 11 minutes on a bare-metal ARM machine with 64 CPUs and 128 GB of RAM.
While I see this mostly as a Swift/SwiftPM problem (builds take far too long), I also notice that the benchmark package doesn't need to build the benchmarking target the second time, since we're checking static thresholds against already-calculated baselines.
So hopefully there can be a command or flag so that `check` can skip building or running anything at all, and only perform the checks against what is saved on disk.
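For reference, the flow that triggers the double build looks roughly like this (a sketch; the subcommand names follow the ones mentioned in this thread, and timings are from the machine described above):

```bash
# Step 1: builds the monolithic target (~11 min), then runs the benchmarks
swift package benchmark run

# Step 2: builds the SAME target again (~11 min more), then checks
# the recorded results against what is already on disk
swift package benchmark check
```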
Yes, it is a known problem unfortunately:
https://github.com/swiftlang/swift-package-manager/issues/7210
Would be great if you could ping in that issue and explain the problem above - maybe we can add a `--skip-build` flag to work around it (but it really should be fixed in SwiftPM; we just use the official API and it always rebuilds...).
Sure, but I'm skeptical about when the SwiftPM maintainers will get to any issues. It would be very, very nice to have this feature while we wait a possible few years for SwiftPM.
As always I'm open to submitting a PR if needed.
It is a bit tricky, as we also use the build to get the paths to the resulting built executables - which are needed.
I tried a quick-and-dirty `--skip-builds` flag, but it won't fly.
I am open to suggestions/PRs (still worth pinging in that issue though - I do think the more noise there is about something, the more likely it is to be fixed... The squeaky wheel gets the grease.)
This issue should be fixed by https://github.com/ordo-one/package-benchmark/pull/284 now - if you can, please have a go at testing against that (it will be merged fairly soon).
@hassila

```bash
swift package --disable-sandbox \
  benchmark thresholds check \
  "${{ github.head_ref || github.ref_name }}" \
  --check-absolute-path $PWD/Benchmarks/Thresholds/ \
  --no-progress \
  --format markdown
```

did still build the benchmarks, unlike `benchmark read`.
`--check-absolute-path` is not used for that; you want to use `--path`.
It needs to build the benchmarks for the `check` operation to get the latest threshold tolerances from the code itself.
For what it's worth this issue is resolved now.
Also, for some reason, using:

```bash
swift package --disable-sandbox \
  benchmark baseline update \
  "${{ github.head_ref || github.ref_name }}"

swift package --disable-sandbox \
  benchmark thresholds check \
  "${{ github.head_ref || github.ref_name }}" \
  --path $PWD/Benchmarks/Thresholds/ \
  --no-progress \
  --format markdown
```
no longer builds the whole benchmarks target twice. I notice that the second command does still try to build the benchmark target - 1) I've confirmed the threshold tolerances do work, and 2) I can see some warnings about a dependency of the benchmark target (a Swift 6 regression) - but it pretty much just reuses the previous build and does not waste time building the target again. No complaints, as this solves my whole problem, but it would be nice to know what's going on?!
Glad to hear it is working better now - the build is a bit of a mystery of SwiftPM, I think - it should only rebuild what's needed between runs. Are there any differences in OS/runtime environments between your tests when it failed vs. now?
We should probably try to nail down the related SwiftPM bug - maybe this can help track down some more info.
Not really, no environment differences.
Maybe the fact that I'm using `--disable-sandbox` helped? But I think I enabled that earlier and it didn't work?
I can try that later to see if it helps anything 🤔.
Perhaps - I haven't nailed down exactly when it rebuilds, just that it's way too often. Happy to hear results if you try it.
But maybe close this issue then and we can discuss the builds separately?
I mean we can even close this issue and keep talking 😛
@hassila I couldn't figure out what makes SwiftPM decide whether or not to build the benchmark target from the ground up.
Right now, the majority of the time it won't rebuild the whole target, but sometimes it does - even with no code changes (e.g. rerunning the same GitHub Action can result in different behavior; perhaps the .build cache was updated and caused it).
I did try removing `--disable-sandbox`. No difference.
Though I feel like I only started getting this "most of the time uses the cache, but sometimes rebuilds the whole target" behavior after I removed `--disable-sandbox`?! Not sure.
Yeah... thanks for trying - hopefully we'll eventually figure it out and fix the root cause...
@hassila This "caching sometimes works and sometimes not" situation is flaky enough that I decided to not rely on it and just only do 1 check instead of running comparison, then reading result, then doing the check. Just reporting back about what we've (so far) end up doing.
This is also why I've filed this issue: https://github.com/ordo-one/package-benchmark/issues/288
@hassila Interesting observation: it's been a few days since I added `-c release` to all `swift package` calls, and there haven't been any double builds 🤔.
I noticed the benchmarking package builds some things in release mode anyway, to be able to reliably benchmark performance, so I thought maybe having both debug and release builds together is too much for SwiftPM.
I'll monitor for the next few days as well and see whether the double build happens again.
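If this holds up, the workaround amounts to passing `-c release` to every invocation (a sketch; all flags other than `-c release` are taken from the commands earlier in this thread):

```bash
# Force release configuration everywhere so debug and release
# artifacts never mix across plugin invocations.
swift package -c release --disable-sandbox \
  benchmark baseline update \
  "${{ github.head_ref || github.ref_name }}"

swift package -c release --disable-sandbox \
  benchmark thresholds check \
  "${{ github.head_ref || github.ref_name }}" \
  --path "$PWD/Benchmarks/Thresholds/" \
  --no-progress \
  --format markdown
```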
Also, for the record, using `-c release` did change the result of one of our two benchmarks from 310 ms -> 290 ms. Not sure why that should happen; it might be a sign that we should be using `-c release` anyway.
Hmm, that is weird - if that is correct, it would imply that building in debug somehow invalidates the release build - we're just calling into the SwiftPM API and asking for a release build (and all the normal caching should happen...).
@hassila Right... this is definitely a bug somewhere...

> it would imply that building in debug somehow invalidates the release build

It would have been nice if it were that simple. The problem is that it isn't consistent in invalidating the release build either - or at least I haven't noticed the pattern.
Just to confirm, `-c release` is still working like I mentioned above, around 24h ago. I'll keep monitoring the situation.
`benchmark baseline check` doesn't respect threshold tolerances when checking an already-acquired baseline on disk against static `thresholds.json` files. Apparently this is because the benchmarks are not loaded, so the tool falls back to the strict default threshold tolerances. This makes our benchmarks fail even though we have less than 1% and 100 ms of fluctuation.