I experimented with this more on the downstream issue (see https://github.com/opensearch-project/dashboards-observability/issues/2188#issuecomment-2389541496) and verified that the current cache system yields no build-time savings at all.
Here's a proof of concept that works locally: I replaced the `getMtimes` method in-place to compute hashes instead of the actual mtimes: https://github.com/Swiddis/OpenSearch-Dashboards/commit/1e2529c8e26fc220a38cde2b3697e57a1be0016d.
The only issue is that keeping the old naming for everything would never get through PR review. I also need to figure out how to test it in a real build.
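For reference, here's a minimal sketch of the hashing approach, assuming a Node/TypeScript context; the `getHashes` name and signature are illustrative stand-ins for the real `getMtimes`, not the actual PoC code:

```ts
import { createHash } from 'crypto';
import { promises as fs } from 'fs';

// Map each source path to a sha256 hex digest of its contents.
// Missing files map to undefined, mirroring how a missing mtime would behave.
export async function getHashes(
  paths: string[]
): Promise<Map<string, string | undefined>> {
  const entries = await Promise.all(
    paths.map(async (path): Promise<[string, string | undefined]> => {
      try {
        const content = await fs.readFile(path);
        return [path, createHash('sha256').update(content).digest('hex')];
      } catch (error: any) {
        if (error?.code === 'ENOENT') {
          return [path, undefined];
        }
        throw error;
      }
    })
  );
  return new Map(entries);
}
```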
Set up some job runs connected to the POC:
It works as expected. (That's not a vertical line -- it takes 0.4 seconds to hash all of the source files)
To avoid all the renaming work, we can just say MTIME stands for "Modern Tracking of Incremental Modification Events"
**Is your feature request related to a problem? Please describe.**
Coming from https://github.com/opensearch-project/dashboards-observability/issues/2188, I'm hoping to implement caching for the compilation step of OSD, because this takes more than 10 minutes per CI run. Across all of our integ test workflows, that's multiple hours of compute time per code change.
I found that `osd-optimizer` determines whether to build files based on the last-modified time (`mtime`), instead of being based on the actual content of the file. This means that the optimizer can never work in CI, because pulling code will always update the `mtime`.
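As a minimal illustration of the failure mode (the file path here is just an example): a fresh CI checkout assigns every file an mtime of roughly the checkout time, so any mtime recorded in a restored cache always looks stale.

```ts
import { statSync } from 'fs';

// In CI, this mtime reflects when the runner checked out the code, not when
// the file's content last changed, so comparing it against a cached mtime
// from an earlier run always reports the file as modified.
const { mtimeMs } = statSync('package.json');
console.log(new Date(mtimeMs).toISOString());
```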
**Describe the solution you'd like**
In `.osd-optimizer-cache`, instead of the `cacheKey` depending on `mtimes`:

- Base it on a hash of these files' content (e.g. sha256, blake3, xxhash, siphash, ...)
- This makes it possible to cache build artifacts even when moving files between machines. Instead of `actual_mtime > cache_mtime`, check `actual_hash != cache_hash` (a sketch of the check follows this list).
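A rough sketch of what the hash-based validity check could look like; the `CacheKey` shape and `isCacheValid` name are hypothetical illustrations, not the existing `.osd-optimizer-cache` format:

```ts
// Hypothetical shape: the cache records a content hash per tracked file
// instead of an mtime. Field names are illustrative.
interface CacheKey {
  hashes: Record<string, string>; // path -> content hash recorded at build time
}

// A cached bundle stays valid only if every tracked file still hashes to
// the recorded value; any mismatch (or missing file) forces a rebuild.
function isCacheValid(
  cached: CacheKey,
  actualHashes: Map<string, string | undefined>
): boolean {
  return Object.entries(cached.hashes).every(
    ([path, cachedHash]) => actualHashes.get(path) === cachedHash
  );
}
```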
**Describe alternatives you've considered**
N/A
**Additional context**
N/A