michellab / BioSimSpace

Code and resources for the EPSRC BioSimSpace project.
https://biosimspace.org
GNU General Public License v3.0

Strategy for migration to OpenBioSim GitHub organisation #393

Closed lohedges closed 1 year ago

lohedges commented 1 year ago

Opening this thread for a discussion about strategies for migrating Sire and BioSimSpace to the OpenBioSim GitHub organisation.

Timing

Do we want to do this ASAP, or time it to coincide with a milestone, e.g. the 2023.1.0 releases of Sire and BioSimSpace? At the moment there is quite a lot of development work going on in various branches (and forks), and much of it needs to be synchronised (particularly for BioSimSpace).

Git migration

The Sire repo is quite old and has grown fairly large. From discussions with @chryswoods, it seems like the migration to OpenBioSim might be a good opportunity to tidy things up. At present, I estimate the following sizes for Sire and BioSimSpace, using:

git bundle create tmp.bundle --all
du -sh tmp.bundle 

This gives 183M for Sire and 33M for BioSimSpace.

I don't want to simply re-upload a fresh repository, since the history is important for posterity and for tracking down bugs, e.g. by bisecting commits. I'm currently exploring the use of the BFG tool for pruning redundant large files from the repository. (This is an alternative to using git filter-branch directly, and is apparently much faster.) I'll have a play around on a local mirror clone to see how much we can trim things down.
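Before pruning, it helps to know which files in the history are actually large. The following is a minimal sketch, not part of the original plan, that ranks blobs by size using the common `git rev-list --objects --all` piped into `git cat-file --batch-check` idiom. The function names `largest_blobs` and `scan_repo` are hypothetical helpers introduced for illustration.

```python
import subprocess
from typing import Iterable, List, Tuple


def largest_blobs(batch_check_lines: Iterable[str], top: int = 10) -> List[Tuple[int, str]]:
    """Return the `top` largest blobs as (size_in_bytes, path) pairs.

    Each input line is expected in the form "<type> <sha> <size> <path>",
    as produced by the --batch-check format used in scan_repo() below.
    """
    blobs = []
    for line in batch_check_lines:
        # Split at most three times so that paths containing spaces stay intact.
        parts = line.rstrip("\n").split(" ", 3)
        # Skip commits, trees and tags; only file contents (blobs) matter here.
        if len(parts) < 4 or parts[0] != "blob":
            continue
        blobs.append((int(parts[2]), parts[3]))
    return sorted(blobs, reverse=True)[:top]


def scan_repo(repo_dir: str, top: int = 10) -> List[Tuple[int, str]]:
    """List the largest blobs reachable from any ref in the repository at repo_dir."""
    # Every object reachable from any ref, one "<sha> <path>" pair per line.
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for the type and size of each object; %(rest) echoes the path back.
    sizes = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        cwd=repo_dir, input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return largest_blobs(sizes.splitlines(), top)
```

Calling `scan_repo("path/to/mirror-clone")` on a local mirror clone would give a ranked list of candidates for removal. The sizes reported are uncompressed object sizes, so they will not add up to the bundle size quoted above.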

Conda packages

We currently upload packages to the michellab organisation as part of our CI. I've now created an openbiosim organisation (all lower case for ease of typing, happy to reformat as desired) and we can switch the CI upload over to using this once the migration is complete. I am currently the owner of the organisation, but I can add anyone else with an account at anaconda.org. (It would be good to have multiple members of the OpenBioSim organisation as owners so we can transfer if needed in future.) Note that there is a nominal 3G storage limit for organisations at anaconda.org. However, we've far exceeded this with michellab (currently at 10G) and have never had issues. (I'm not sure if it just automatically prunes the old files or not.) I doubt we'll ever care about the limit, since we can always manually prune old development packages and only keep those with the main label. I assume that we still plan to move to conda-forge at some point, so perhaps this doesn't matter. (Maybe we'll keep our own channel for development builds only.)

Tests

The size of the input files used for testing is becoming burdensome, so it might be useful to split these out into a separate repository. They could then be downloaded at run time using the new sire.load functionality. Perhaps these could simply be hosted in a separate GitHub repository, with permalinks used for the files. Alternatively, we could look at hosting these files elsewhere, although it might be a pain to update paths etc. if they ever need to move.
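To illustrate the download-at-run-time idea in general terms, here is a minimal sketch of a test helper that fetches a file from a permalink once and caches it locally. It uses only the Python standard library and does not reflect how sire.load itself works. The helper name `fetch_test_file`, the cache layout, and the optional checksum argument are all assumptions made for illustration.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Optional


def fetch_test_file(url: str, cache_dir: str, sha256: Optional[str] = None) -> Path:
    """Download `url` into `cache_dir` unless already cached, and return the local path.

    If `sha256` is given, the cached file is verified against it, which guards
    against a truncated download or a file that has changed behind the URL.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)

    # Name the cached copy after the last component of the URL.
    target = cache / url.rsplit("/", 1)[-1]

    # Only hit the network when the file is not already in the cache.
    if not target.exists():
        with urllib.request.urlopen(url) as response:
            target.write_bytes(response.read())

    if sha256 is not None:
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if digest != sha256:
            # Remove the bad copy so that the next call downloads it again.
            target.unlink()
            raise ValueError(f"Checksum mismatch for {url}")

    return target
```

Keeping the base URL in a single place in the test suite would limit the cost of moving the files later, since only that one value would need updating.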

It would also be good to discuss the use of a legacy test suite for Sire, based on the existing tests from the SireUnitTests repository. I have now made this fully 2023.0.0 compliant and have reinstated the only test of SOMD FEP setup functionality.

lohedges commented 1 year ago

Closing since we've migrated :-)