Closed chryswoods closed 2 years ago
(forgot to say - this also compiles fully natively on Apple M1. It just doesn't run as there is something killing the process - I think related to a lack of code signing?)
Awesome !! This is incredibly useful and timely.
Thanks for this. Just a few comments on the changes for reference to those who may be affected:
The switch to using OpenMM from the conda-forge channel removes support for older CUDA drivers, along with our optimise_openmm
script that used the labels on the omnia channel to find an appropriate driver. @jmichel80: Is this change problematic for the clusters that your group use? Have you tested the most recent OpenMM and CUDA driver previously.
The failing BioSimSpace tests are a result of changes to RDKit, which now requires that a molecule is sanitized before being able to use the GetSubstructureMatches
function, which we use when scoring the MCS matches from RDKit. Although applying sanitization makes our code work, certain tests now fail since the resulting MCS matches are slightly different, so the existing edge cases being tested are no longer triggered. At present I have left the problem tests in place, marking them as xfail
along with an appropriate reason. We can adapt them as necessary once other troublesome ligand pairs are found. (The logic behind the tests is still sound.) In future it may be necessary that we fully sanitize all molecules prior to performing the MCS itself, completely remove sanitize
option from our matchAtoms
function. I am also not sure if the sanitization change has any knock-on affects for LOMAP or our wrappers around this. Perhaps @JenkeScheen can test this once things are merged in and BioSImSpace has been updated too.
The updates here have been tested on macOS and Linux but not Windows. I imagine the Windows build will also need to be adapted, particularly the versions for the Miniconda and dependencies that are used. (We've used the most recent compatible versions on macOS and Linux, but these might not all be available on Windows.) Just tagging in @nigel-cresset and @mark-cresset so that they are aware of the impending changes.
Cheers.
this is great! Yes @lohedges I'd be happy to test that when needed - RDKit can be quite finnicky when it comes to sanitization.
I've merged in all of the changes from feature_pickle at @lohedges request.
feature_pickle provide full native pickle support to all of the Sire objects. This uses the pickle support baked into boost::python, together with the Sire streaming operators attached to most objects. The wrapper-generation script now adds the necessary code to each wrapper if the object supports steaming via QDataStream.
The binary data is formatted in an identical manner to the Sire s3 files, except without a header. This is then compressed, then base64 encoded. This is returned in a dictionary from the Python __getstate__
function, together with a version number that we can use to track pickle versioning (in case we want to change things later).
I've tested by pickling across operating systems (Mac to Linux and back). What is nice is that pickle handles loading all of the dependent modules itself. For example, you can just;
== save script ==
import Sire.System
m = Sire.System.create_test_molecule()
import pickle
with open("dump.txt", "rb") as FILE:
FILE.write(pickle.dumps(m))
=== load script ===
import pickle
data = open("dump.txt", "rb").read()
s = pickle.loads(data)
and the load script will import the required libraries.
I wouldn't use pickling for long-term data archiving, but this should make things easier for workflow scripts, running jobs via dask / multiprocessing etc.
I've now switched to using GitHub actions for our CI. This performs a build using a conda-forge compliant conda-build on Linux and macOS and runs tests as part of the post-build check. If the build is successful, then the resulting package is uploaded to our channel on the Anaconda cloud. We can now support various Python versions (currently 3.7, 3.8, 3.9) on both operating systems.
I'll merge this across to devel and work on fixing any issues with the CI there.
Cheers!
Well, as soon as I say that, the CI inexplicably starts running on the pull request? (There must have been some bizarre delay, or it is only triggered on the commit following the addition of a workflow.) Guess I'll fix issues here then :-)
Just to say that the switch to Python 3.8 as default from compile_sire.sh works well on Linux and Mac. I've checked that all of the dependencies for BioSimSpace can install correctly from this default. Sire will compile and install for Python 3.9, but we then run into dependency issues for BioSimSpace.
For ARM64 (aarch64) I get the furthest using Python 3.8. Python 3.7 doesn't have the dependencies for Sire. Pythons 3.8 and 3.9 both compile and install Sire well, but don't have the dependencies for BioSimSpace. 3.8 is the in the best shape, having openmm and RDKit, and just missing MDAnalysis, MDtraj and pymbar. I can "pip install" MDAnalysis, but have to be careful or it breaks the conda packages...
I know I should raise this as a feature request in BioSimSpace, but would it be doable to have a way of moving MDAnalysis, MDtraj and pymbar to "optional", i.e. the rest of the package can still load without them?
Hmm, will have to think. It get tricky with conda packages since you need to start creating metapackages, rather than having optional dependencies witn pip. (Although they aren't really supported either.) pymbar
should eally be a Sire dependency, I think, since it's required for analyse_freenrg
, which is wrapped bu BioSimSpace.
Hopefully we can come up with a workable solution.
I think the easiest thing would be to detect the platform/machine in BioSimSpace's setup.py
and simply not install the problematic dependencies on Linux ARM64. We could then detect whether the modules can be loaded in BioSimSpace.Trajectory
and BioSimSpace.FreeEnergy.analyse
and raise an exception / warming. It's annoying that we won't be able to read trajectory files, since this pretty much wipes out the use of getSystem
in BioSimSpace.Process
for anything other than GROMACS. (Other than minimisation protocols.) Hopefully it's just a case of waiting for the packages to be updated, after which we can remove the workaround.
Bah, some tests fail on macOS under Python 3.7, which is ironically the Python version that we've currently supported! I'm not sure what the failure is, but it's in SireMove. It could be something intermittent, I suppose. Was hoping to merge this in to devel but I guess I should fire up the macBook and test there. (If only it didn't take multiple hours to debug.)
Once this is fixed then I'll merge in and work on updating the BioSimSpace CI, i.e. to perform a conda-build during the GitHub CI, building from these Sire packages. I'll also need to transfer the Sphinx documentation building to GitHub actions. While this should be okay, several of the plugins that we use require patching (since they are broken) so I'll need to make sure to do this too. (This is easy when we control the Python environment, as with a sire.app.)
Looks like ci skip
isn't working correctly. More great documentation from the GitHub team, including this recent blog post.
Unfortunately the Python 3.7 macOS tests are reliably failing in the CI so I will need to debug further before merging. Alas!
Actually, it looks like an intermittent problem since the build for the subsequent (supposedly ci skipped) commit passed. I'd still like to figure out why it's failing (particularly since it's tied to a specific OS and Python version). If it's a numerical precision issue we could try lowering the accuracy of any comparison that is taking place. Given that it's Sire Move it could also be an OpenMM issue with some starting config/velocity. If so we could try re-running the test a number of times and only reporting failure if it always fails.
And bizarrely [ci skip]
decided to work on my latest commit. I'll merge this across to devel, tag the release, then deal with any fall out there :-)
Bumper pull request containing all of the work needed to get Sire to compile against up-to-date versions of its dependencies.
This enables Sire to compile on all Python versions from 3.7 upwards, using the latest versions of dependencies for each version. This compile script installs to Python 3.7, and pins to the latest version of dependencies that are common to Linux and Mac.
There are substantial mini-changes across the entire codebase due to changes in APIs and deprecations. This includes moving away from qAlgorithm, using the new Qt container constructors, moving to OneAPI, switching to the conda-forge OpenMM and switching to the new C++ ABI.
There is also a new route for generating Sire Python wrappers, which uses the latest version of Py++. This is in the docker/sire-generate-wrappers directory, where the process is documented.
All of the Sire unit tests pass. The vast majority of the BioSimSpace unit tests pass too.
There is a feature_m1_mac branch in the BioSimSpace repo that contains an additional test plus the rdkit align prematch fix.