Open grahamgower opened 3 years ago
Good call. Perhaps we should include the Ensembl version in the filename as a straightforward check? I'm less concerned about the cache containing lots of old data than I am about simulations giving different results on different machines, so perhaps we should break these into two issues?
That seems very sensible for annotations. But it makes less sense for the genetic maps, which are tied to an assembly rather than an Ensembl release.
> But that doesn't make so much sense for the genetic maps, which are tied to an assembly rather than an Ensembl release.
These already have the assembly build in the filename, so that should be safe enough.
We currently cache genetic map files, and will soon be caching annotations too. But we have no mechanism for expiring the old cached data when updates occur. For example, consider the PonAbe genetic maps which need updating (#595). Once new files get uploaded to AWS, users will still have the old files in their cache (so new files won't get downloaded unless they have a different name). This is an important problem that will need to be resolved soon.
We should add checksums for files as in #561, and remove stale cached files by comparing the checksum. This fixes the essential problem, but will still leave old files in the cache that are no longer used.
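A minimal sketch of the checksum comparison, to make the proposal concrete. The function names, the `download` callable, and the choice of SHA-256 are all assumptions for illustration, not existing code in the project; the expected checksum is assumed to ship with the package's metadata for each file.

```python
import hashlib
import pathlib
import tempfile


def sha256sum(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_fresh(cache_path, expected_sha256, download):
    """
    Make sure the cached file matches `expected_sha256`.

    `download` is a hypothetical callable that writes the current upstream
    file to the path it is given. Returns True if a download was needed.
    """
    cache_path = pathlib.Path(cache_path)
    if cache_path.exists():
        if sha256sum(cache_path) == expected_sha256:
            return False  # cached copy is current, nothing to do
        cache_path.unlink()  # stale copy: remove it before re-downloading
    download(cache_path)
    if sha256sum(cache_path) != expected_sha256:
        # Do not leave a file of unknown provenance in the cache.
        cache_path.unlink()
        raise ValueError(f"checksum mismatch for downloaded file {cache_path}")
    return True
```

With this approach a stale file is replaced even when the upstream filename is unchanged. Files that are no longer referenced at all are never visited, so they would still need a separate cleanup step.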