popsim-consortium / stdpopsim

A library of standard population genetic models
GNU General Public License v3.0

Cache invalidation #619

Open grahamgower opened 3 years ago

grahamgower commented 3 years ago

We currently cache genetic map files, and will soon be caching annotations too. But we have no mechanism for expiring the old cached data when updates occur. For example, consider the PonAbe genetic maps which need updating (#595). Once new files get uploaded to AWS, users will still have the old files in their cache (so new files won't get downloaded unless they have a different name). This is an important problem that will need to be resolved soon.

We should add checksums for files as in #561, and remove stale cached files by comparing the checksum. This fixes the essential problem, but will still leave old files in the cache that are no longer used.
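A minimal sketch of the checksum comparison, not stdpopsim's actual cache code: `sha256_of` and `evict_if_stale` are hypothetical helper names, and the choice of SHA-256 and the idea that the expected digest is stored alongside the map or annotation definition are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evict_if_stale(cached_path, expected_sha256):
    """Delete the cached file if its checksum differs from the expected one.

    Returns True if a stale file was removed, False otherwise
    (including when nothing is cached at `cached_path`).
    """
    cached_path = Path(cached_path)
    if not cached_path.exists():
        return False
    if sha256_of(cached_path) == expected_sha256:
        return False
    cached_path.unlink()  # stale: the caller should download the file again
    return True
```

Calling something like `evict_if_stale` before each cache lookup would force a fresh download whenever the uploaded file changes, even if its name stays the same. It does nothing about cached files that are no longer referenced at all.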

jeromekelleher commented 3 years ago

Good call. Perhaps we should include the Ensembl version in the filename as a straightforward check? I'm less concerned about the cache containing lots of old data than I am about simulations giving different results on different machines, so perhaps we should break these into two issues?
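As a rough illustration of the filename idea (the naming scheme and the helper `versioned_cache_path` are hypothetical, not the current stdpopsim layout), putting the release in the cache path means a new release can never be satisfied by an old cached file:

```python
from pathlib import Path


def versioned_cache_path(cache_dir, species_id, ensembl_release, suffix=".tar.gz"):
    """Build a cache path that embeds the Ensembl release number.

    Hypothetical naming scheme: <species_id>_ensembl<release><suffix>.
    """
    return Path(cache_dir) / f"{species_id}_ensembl{ensembl_release}{suffix}"


# Two releases map to two distinct files, so a lookup for the newer
# release misses the cache and triggers a download.
old = versioned_cache_path("cache", "PonAbe", 102)
new = versioned_cache_path("cache", "PonAbe", 103)
assert old != new
```

The release numbers above are placeholders chosen for the example.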

grahamgower commented 3 years ago

That seems very sensible for annotations. But that doesn't make so much sense for the genetic maps, which are tied to an assembly rather than an Ensembl release.

jeromekelleher commented 3 years ago

> But that doesn't make so much sense for the genetic maps, which are tied to an assembly rather than an Ensembl release.

These already have the assembly build in the filename, so that should be safe enough.