loculus-project / loculus

An open-source software package to power microbial genomic databases
https://loculus.org
GNU Affero General Public License v3.0

Delete processed data of no-longer-current pipeline versions after upgrade to save on storage #3240

Open corneliusroemer opened 2 days ago

corneliusroemer commented 2 days ago

After we bump the current processing pipeline version, processed data from previous pipeline versions no longer serves a purpose. We should just delete it so it doesn't pile up as waste.

Obsolete processed data is currently the main consumer of db storage in production Pathoplexus (it is already noticeably slowing down cloning from prod to staging).

Proposed feature:

See this comment for an analysis of db storage: https://github.com/loculus-project/loculus/pull/3232#issuecomment-2485779144

The alternative is to delete old versions manually, but I don't see a reason to do that by hand when we can do it automatically.

chaoran-chen commented 2 days ago

I wonder whether we should keep version current - 1 around, just in case one would like to roll back, and delete only the older ones.
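
To make the "keep current - 1, delete the rest" idea concrete, here is a minimal sketch in Python against an in-memory SQLite database. The table name `processed_data`, its columns, and the function `delete_stale_processed_data` are hypothetical and chosen only for illustration; they are not the actual Loculus schema or backend code. It also assumes pipeline versions are integers that increase with each bump.

```python
import sqlite3


def delete_stale_processed_data(conn, current_version, keep_previous=1):
    """Delete processed data older than (current_version - keep_previous).

    With keep_previous=1 this keeps the current version and the one
    before it (as a rollback target) and deletes everything older.
    Returns the number of rows deleted.
    """
    if keep_previous < 0:
        raise ValueError("keep_previous must be >= 0")
    cutoff = current_version - keep_previous
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "DELETE FROM processed_data WHERE pipeline_version < ?",
            (cutoff,),
        )
    return cur.rowcount


# Hypothetical schema and toy data: two entries, each processed by versions 1..4.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_data ("
    "accession TEXT, pipeline_version INTEGER, data TEXT)"
)
conn.executemany(
    "INSERT INTO processed_data VALUES (?, ?, ?)",
    [(f"ACC{i}", v, "{}") for i in range(2) for v in (1, 2, 3, 4)],
)

deleted = delete_stale_processed_data(conn, current_version=4, keep_previous=1)
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT pipeline_version FROM processed_data "
        "ORDER BY pipeline_version"
    )
]
print(deleted, remaining)
```

Setting `keep_previous=0` would give the behaviour from the original proposal (delete everything except the current version). In a real deployment this would presumably run right after the version bump, but that hook is not shown here.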