michellab / Sire

Sire Molecular Simulations Framework
http://siremol.org
GNU General Public License v3.0
95 stars 26 forks source link

2023.X code complete :-) #419

Closed chryswoods closed 1 year ago

chryswoods commented 1 year ago

Hi @lohedges - no rush to look at this now (it can wait until after the holidays). Here is the 2023.X pull request. I've got no more code to add before 2023.1, so everything now is just debugging and developing the website.

I have compiled and tested this on MacOS (M1) and Linux (X86) with the latest BioSimSpace devel. I had to make a small change to BioSimSpace, which is in the feature-2023_0 branch there (I had to update the property Y searches to X with property Y as sire didn't know what view level to search and gave the wrong result). With these very small changes, all the BioSimSpace tests that don't require namd installed pass.

Note that the first test in test_align fails on MacOS M1 due to rdkit giving a different match (rdkit was already strange in that it installed itself into python3.10/site-packages rather than python3.9/site-packages, despite installing a 3.9 package - I manually fixed this by copying the rdkit package into the right place). Also note that I removed amber and gromacs as requirements from MacOS builds as they don't exist for M1 Macs. I'll re-enable these once I find an easy way to specify "not M1" in a way that pip-requirements-parser can understand.

I will create a mini-feature branch off of feat_2023_0 to continue my website work. Otherwise, hopefully, we are nearly there :-)

chryswoods commented 1 year ago

The Linux builds were ok. The MacOS builds had one cancelled(?) and the other failed on an over-precise energy for a test. I'll reduce the test precision.

The windows builds failed after packaging? There looked like some weirdness during compile. I am building a Windows VM in the cloud and will check things manually there.

Also, I noticed after issuing that there were some minor problems in the Exscientia sandbox relating to the new code. The problems all came back to one of the major changes since the last merge - I've made the units system much more coherent and pervasive, with GeneralUnit now becoming the store for pretty much all numeric values, and with Vectors now auto-converting to default length units when they go up to the Python layer. This meant that the BioSimSpace code that did this (fixing the gap in sire functionality) was now not needed, and failed as values were already lengths. I fixed what I saw, so that all of the tests pass (pushed to the feature-2023_0 BioSimSpace branch). There may be other subtle places that this will impact, and that I'll look for in the new year).

chryswoods commented 1 year ago

Still failures... MacOS was because I missed another over-prescisioned energy test.

Both Windows builds failed because they ran out of disk space. The logs now say this, which at least is helpful as it confirms our theory.

C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\Common7\IDE\VC\VCTargets\Microsoft.CppCommon.targets(423,5): error MSB6003: The specified task executable "CL.exe" could not be run. There is not enough space on the disk. 

This time the failure happens during the build itself (but still, bizarrely, the CI continues, packages sire, and the actual error that stops everything is when import sire fails because the package is corrupted).

I've checked on the Windows VM I built (Python 3.9). Everything does build on Windows and all tests pass (except the somd test which is skipped). The size of the sire directory (containing source code) is 7.04 GB. Of this, 6.7 GB is the build directory.

My mambaforge directory on Windows is 13.5 GB. The default on Windows is now a Python 3.10 environment. Of this, the Python 3.9 environment I used to compile sire is 5.35 GB.

Once built, the sire libraries themselves are not too large (the Sire DLLs are ~28 MB total, with SireMol 5.4 MB and SireMM 6.3 MB). The wrappers are larger (~63 MB total, with _Mol 21 MB and _MM 13 MB).

GitHub runners provide a minimum of 14 GB of free space.

We consume some on Windows installing Visual Studio. I guess, during a CI build, we use ~6-8 GB for our Python environment, and then 7 GB for the build itself. I can see why we are hitting the limit...

Searches for "GitHub actions running out disk" show a lot of suggestions. Most involve removing un-used files from the runner before starting. We need, I guess, another 2-3 GB. Removing the build dir before packaging works well, as this is definitely larger than what would be needed for packaging and testing. I think we should do this on all OS builds.

I will explore the CI on Windows and will look to see if there are any directories that can be removed...

chryswoods commented 1 year ago

Interesting - on Windows they separate the C: drive (where choco installs VSCode) from the D: drive (where you initially git clone). The D: drive has 14 GB, while the C: drive can have a lot more disk free. The solution will be to move the build into the C: drive. This has recently been increased to 256 GB, of which ~195 GB are used (based on what I read here). This would be more than enough space... :-)

chryswoods commented 1 year ago

And that is exactly what the default does... GitHub clones into D:, but puts conda/mamba into C: and expects you to do most of the conda stuff there. However(!) I broke this a while back when I moved the conda environment to the croot of the mambabuild to github.workspace (i.e. the D: drive!).

conda mambabuild --croot ${{ github.workspace }}/build -c conda-forge -c michellab/label/dev ${{ github.workspace }}/Sire/recipes/sire

I did this so we could collect failed artefacts. I am going to remove this and (temporarily) remove failed artefact support. I will come up with another way to do this later (likely by finding the auto-generated name within the default mambabuild croot.

chryswoods commented 1 year ago

Everything worked, so I took some time to learn how to reduce the repo size. In doing this I found a few unnecessary large files, which I removed (hence the new GitHub actions run - just to ensure they weren't referred to anywhere).

I found that the largest current file in feat_2023_X was 4 MB, so I used BFG-repo-cleaner to remove all files that were larger than 5 MB.

(the command to find large files was du -am . | sort -nr | grep '\..*\.' | head, which I am going to use a lot in the future as it is super-useful!)

$ git clone --mirror https://github.com/michellab/sire.git
$ java -jar ./bfg.jar --strip-blobs-bigger-than 5M --no-blob-protection sire.git
$ cd sire.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ cd ..
$ git clone --single-branch --branch feat_2023_0 /path/to/sire.git
$ cd sire
$ git remote prune origin
$ git remote remove origin

This reduced the .git folder down from 194 MB to 64 MB. I don't think I can go any smaller, as the 64 MB is a single git "pack" file that contains everything in a packed/compressed format. I've pushed this to a test repo on OpenBioSim. I pushed a single branch, which corresponds to this feat_2023_0 branch.

I've checked and all the history is preserved. It may be worth taking a look too in case you see anything weird.

This now clones a little quicker, has very few files (so uses few inodes), and a checked-out sire is now 111 MB (rather than 240 MB as before).

lohedges commented 1 year ago

Many thanks for this :+1: I'll try to go through it soon. With regards to the Windows changes that you mentioned, i.e. the different paths used for installing Minconda, etc, I assume this mean that we no longer need to remove the intermediate build files so I can safely remove that part from our conda-build install scripts.

Cheers.

chryswoods commented 1 year ago

Yes - that's correct :-)

lohedges commented 1 year ago

Great. I'm just building it locally. I'll then checkout the feature-2023_0 BioSioSimSpace branch and make sure that there aren't any further issues there. (I'm thinking that some Vector updates might be required in the funnel metadynamics code.)

chryswoods commented 1 year ago

I've pushed in all of the changes from feat_2023_website. Most of the changes relate to the website. The only code changes are;

I also updated setup.py so that it auto-installs pip-requirements-parser and the compilers. This doesn't totally work on MacOS, as installing these changes the Python interpreter, which causes a subprocess failure (so you still need to run setup.py 3 times). But it does work on Linux :-)

I also updated the LICENSE to confirm that Sire is GPL3. This is because our code is GPL2 or later, but our dependencies are now GPL3, thus forcing our change.

lohedges commented 1 year ago

Many thanks. I can now confirm that all of the NAMD tests in BioSimSpace pass. I'll check all of the places where sire.legacy.Maths.Vector is used to ensure that things still work as expected.

chryswoods commented 1 year ago

Inspired by our conversation earlier, I've run black against all the new Python code. I've also used flake8 to lint things and have fixed pretty much all of the warnings and errors.