praetorian-inc / noseyparker

Nosey Parker is a command-line program that finds secrets and sensitive information in textual data and Git history.
Apache License 2.0
1.66k stars 79 forks source link

Investigate and fix spurious CI rebuilds #164

Closed bradlarsen closed 6 months ago

bradlarsen commented 6 months ago

See #163.

bradlarsen commented 6 months ago

One problem is that the Actions cache keys seem to not be granular enough. For example, from this run on this PR, at the end in the Post Cache step, I see this:

Failed to save: Unable to reserve cache with key v0-rust-tests-d4f00fb9-f15fd4a2, another job may be creating this cache. More details: Cache already exists. Scope: refs/pull/164/merge, Key: v0-rust-tests-d4f00fb9-f15fd4a2, Version: 42df905b237291e2f2fd1f9794a67c2fbd905dfc929c4f0592069bf128a78777
image
bradlarsen commented 6 months ago

Another issue: it appears that even when a cache is successfully extracted in GitHub Actions, that it does not contain the fingerprint files used by cargo. This may be causing nearly everything to be assumed to be needing to be rebuilt.

For example, see Run 2 of the CI (macos-13.arm64.stable.test) job for this PR:

image
bradlarsen commented 6 months ago

Okay, in Run 2 of the CI (macos-13.arm64.stable.test) job for this PR, it looks like caching was actually effective.

The cargo fingerprint files were missing for the crates defined directly in Nosey Parker, but not missing for those from downloaded dependencies. The end result in this case is that 450/475 crates did not need to be rebuilt, as they were retrieved from the cache:

image

See the build-timings.macos-13.arm64.stable.test artifact for full details of the build profiling.

Now, the net improvement from caching in this case was a reduction in wall clock for the CI (macos-13.arm64.stable.test) job going from 173s down to 132s, or a 31% reduction. Of the total time for the faster run of that job, 76s were spent building the project. Of that, 68s were spent building the internal vectorscan crate.

If vectorscan were moved into its own project, published to crates.io, and then used in Nosey Parker that way, it would eliminate 52% of the total CI job wall clock time for CI (macos-13.arm64.stable.test).

bradlarsen commented 6 months ago

Okay, after some configuration changes on the Swatinem/rust-cache action, the caching behavior is improved.

Failed to save: Unable to reserve cache with key v0-rust-tests-d4f00fb9-f15fd4a2, another job may be creating this cache. More details: Cache already exists. Scope: refs/pull/164/merge, Key: v0-rust-tests-d4f00fb9-f15fd4a2, Version: 42df905b237291e2f2fd1f9794a67c2fbd905dfc929c4f0592069bf128a78777

Regarding this error seen previously, it was due to two different jobs (ubuntu-20.04.x86_64.stable.release and ubuntu-22.04.x86_64.stable.test) sharing the same cache entry, yet being incompatible with each other. The result is that one would always win, populating the cache, and the other would fail to get usable cargo fingerprint data from the cache, causing it to rebuild just about everything.

The fix was to add an explicit cache key based on the CI job name, which causes each of the (currently) 4 build jobs to use its own cache entry.

The net result is that when a cache hit occurs, now, 30-50% of the total build time for the job is eliminated.