microsoft / backfill

A JavaScript caching library for reducing build time
MIT License
157 stars 31 forks source link

Incorrect hash used if file is modified and hashed twice within 100ns on Windows #427

Open ecraig12345 opened 2 years ago

ecraig12345 commented 2 years ago

This is an extreme edge case which is very unlikely to occur outside of tests and is probably not worth the effort to fix (if it's even possible), so this issue is mainly to serve as documentation.

getFileHash uses a cache based on the file modification time, and does not re-calculate the hash if the modification time is unchanged. This is a good approach for perf and usually doesn't cause any problems. https://github.com/microsoft/backfill/blob/03a95d0b65d9a7586204267ca921b3bf1eac4ceb/packages/cache/src/hashFile.ts#L37-L44

However, the cache storage test was failing on Windows in CI (I couldn't repro locally) because a file that's changed in the code below was not being registered as changed. This was because the hash hadn't changed.

It turns out the reason the hash hadn't changed was because the CI build runs quickly enough that the file creation and modification happened within the same 100ns interval. This was fine on Mac and Linux which use filesystems that record mtimes with 1ns precision, but NTFS only records mtimes with 100ns precision. So on NTFS, the mtime wouldn't change and the cached hash would be used.

I worked around this in the test by adding a 1ms wait. Based on how backfill is actually used, such as in lage, it seems extremely unlikely that this would happen in a real scenario.

https://github.com/microsoft/backfill/blob/03a95d0b65d9a7586204267ca921b3bf1eac4ceb/packages/cache/src/__tests__/cacheStorage.test.ts#L39-L53