Closed: laurelmay closed this 3 years ago.
Pushed a quick update to fix a type hint and the commit message, which dismissed the review. I also realized I mistakenly created a branch with the same name on this repo while working on this yesterday; that branch should probably be deleted.
I'm good with this, but I want to wait to merge it until we can successfully hit all 6 items on the first try. Eclipse seems to be having an outage today, so let's plan to merge after https://www.eclipsestatus.io goes green.
Caching these headers (ETag and Last-Modified) gives two significant benefits. First, it improves performance a bit, since we cache these fields along with the file hash. Second, it reduces the need for the upstream servers to send the full files in response to the GET request. The cache is preserved using the actions/cache@v2 Action. This works fine because the goal of this check is to find files whose hash has changed or that have disappeared.
Cache Preservation
The cache is preserved between executions using actions/cache@v2. GitHub evicts cache entries that haven't been accessed in 7 days, so we can rely on the cache as long as the lint runs at least about that often. We also need a unique cache key for each execution, because when the Action gets an exact cache hit, it doesn't save the cache again at the end of the job. Using a unique key each time, with a common restore-key prefix, lets us restore the most recent cache and still write it back on every run. If we do lose the cache, it's not a big deal: we just run with an empty cache and write it back at the end.
Cache Contents
The cache stores the following attributes for each URL:
- The ETag header returned in the response
- The Last-Modified header returned in the response
- The hash of the response body
We preserve both the ETag and the Last-Modified headers because some servers (like Finch's) don't respond with an ETag. For those servers, Last-Modified gives us a fallback so we can still try to reuse the cached hash. The hash itself is preserved because a 304 response has no body. That means we need to cache either the full body (which takes far more storage) or just its hash (which is far simpler). We always store the hash of the response we actually received; we don't try to store the expected hash. So if you receive a file with an invalid hash, the check should keep failing on every run (as long as the ETag doesn't change), because the stored hash is the one you received.
Command Output
This adds a line at the beginning and at the end of the script's execution that reports the data read from and written to the cache. This output should be fairly static between executions unless there's a cache miss. It's always printed because it should help with debugging if we run into a cache issue.
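To make the start and end lines concrete, here is a minimal sketch of what such a summary might look like. The function name describe_cache, the cache shape (URL mapped to etag, last_modified, and hash fields), and the output format are all hypothetical; the PR doesn't specify them.

```python
from typing import Dict, Optional

# Hypothetical cache shape: URL -> {"etag", "last_modified", "hash"}.
CacheEntry = Dict[str, Optional[str]]


def describe_cache(action: str, cache: Dict[str, CacheEntry]) -> str:
    """One-line cache summary for the start ("read") or end ("wrote") of a run."""
    with_etag = sum(1 for entry in cache.values() if entry.get("etag"))
    with_last_modified = sum(
        1 for entry in cache.values() if entry.get("last_modified")
    )
    return (
        f"hashlint cache {action}: entries={len(cache)} "
        f"etag={with_etag} last_modified={with_last_modified}"
    )


if __name__ == "__main__":
    cache: Dict[str, CacheEntry] = {
        "https://example.com/tool.tar.gz": {
            "etag": '"abc123"',
            "last_modified": None,
            "hash": "0" * 64,
        },
    }
    print(describe_cache("read", cache))
    # ... run the lint, updating `cache` as responses come in ...
    print(describe_cache("wrote", cache))
```

Because the counts only move when an entry is added, dropped, or gains or loses a validator, a change between runs is a quick hint that something happened to the cache.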
Cache Location
The cache is stored at ~/.cache/hashlint/cache.json. This keeps it in a directory that should be accessible, or that can be created, when running the script locally. For simplicity, we don't try to honor the XDG base directory configuration (e.g. XDG_CACHE_HOME).
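Here is a minimal Python sketch of the cache handling described above. It assumes the cache file is a JSON object keyed by URL with etag, last_modified, and hash fields, and that SHA-256 is the hash. The function names (load_cache, save_cache, conditional_headers, record_response) are hypothetical and not taken from the hashlint script. The PR also doesn't say exactly which request headers are sent; the sketch uses the standard HTTP conditional-request headers If-None-Match and If-Modified-Since.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, Optional

# Fixed location; XDG_CACHE_HOME is intentionally not consulted.
CACHE_PATH = Path.home() / ".cache" / "hashlint" / "cache.json"

CacheEntry = Dict[str, Optional[str]]


def load_cache(path: Path = CACHE_PATH) -> Dict[str, CacheEntry]:
    """Read the cache, falling back to an empty one if it is missing or unreadable."""
    try:
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def save_cache(cache: Dict[str, CacheEntry], path: Path = CACHE_PATH) -> None:
    """Write the cache back, creating the hashlint cache directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(cache, f, indent=2, sort_keys=True)


def conditional_headers(entry: Optional[CacheEntry]) -> Dict[str, str]:
    """Validators for a conditional GET built from a cached entry."""
    headers: Dict[str, str] = {}
    if not entry or not entry.get("hash"):
        # Without a stored hash a 304 would be useless, so ask for the full body.
        return headers
    etag = entry.get("etag")
    last_modified = entry.get("last_modified")
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        # Fallback for servers that don't send an ETag; when both are present,
        # servers evaluate If-None-Match and ignore If-Modified-Since.
        headers["If-Modified-Since"] = last_modified
    return headers


def record_response(
    cache: Dict[str, CacheEntry],
    url: str,
    status: int,
    headers: Dict[str, str],
    body: Optional[bytes],
) -> str:
    """Return the hash to check for `url`, updating the cache on a full response."""
    if status == 304:
        # 304 Not Modified has no body: reuse the hash stored on an earlier run.
        cached = cache[url]["hash"]
        assert cached is not None
        return cached
    if status != 200:
        # e.g. 404 when the file has disappeared; let the caller report it.
        raise ValueError(f"{url}: unexpected HTTP status {status}")
    digest = hashlib.sha256(body or b"").hexdigest()
    # Store the hash of what we actually received, never the expected hash.
    # `headers` is assumed to use canonical capitalization here; real response
    # headers should be looked up case-insensitively.
    cache[url] = {
        "etag": headers.get("ETag"),
        "last_modified": headers.get("Last-Modified"),
        "hash": digest,
    }
    return digest
```

Validators are only sent when a hash is already cached, so a 304 always has a stored hash to fall back on. Storing the received hash rather than the expected one is what makes a bad download keep failing until the server's ETag changes.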