michellab / Sire

Sire Molecular Simulations Framework
http://siremol.org
GNU General Public License v3.0
95 stars 26 forks source link

Feature SDF #377

Closed chryswoods closed 2 years ago

chryswoods commented 2 years ago

I've added in a complete SDF (MOL) file parser. This can read and write the entirety of a SDF file. Note that this does not infer any missing info, e.g. stereochemistries etc. This just preserves the data that comes from the original SDF file.

To add this, I have

chryswoods commented 2 years ago

I am working on some unit tests for this in SireUnitTests. I found that I need to add the new atom properties to the kludge so that they are correctly loaded. This means I am adding a couple more commits (as I need to use these to generate the wrappers). I will let you know when this is ready.

chryswoods commented 2 years ago

Ok - that's done. I had to add in the PackedArray2D wrappers, and then add the new properties to the kludge list in Mol/init.py. I have added a test to SireUnitTests, which I will expand.

jmichel80 commented 2 years ago

Awesome, didn’t know you were working on this. @annamherz and @JenkeScheen may find this feature immediately useful for their projects

lohedges commented 2 years ago

Thanks, @chryswoods. This will really help with OpenForceField integration within BioSimSpace, as well as improving the robustness of our MCS mapping routines. (Both matching atoms using RDKit directly, and generating perturbation networks using LOMAP2.) The additional updates will simplify a lot of the BioSimSpace code too, e.g. selecting molecules by index, for which workarounds already exist.

I'll hop onto your branch locally, rebuild, and test at my end. I'll also make sure that BioSimSpace can be built on top of it.

Cheers.

JenkeScheen commented 2 years ago

Awesome work! An additional thing I can think of that we could use the SDF parser to create the interphase between BSS and RDKit as discussed previously in https://github.com/michellab/BioSimSpace/issues/190

lohedges commented 2 years ago

Yes, at some point I'm hoping to implement some convenience functions in the Molecule class, e.g. toRDKit(), toOpenFF(), toOpenMM(), etc. (Just need to figure out exactly what properties are needed to have a reliable outcome in each case, and what intermediates to use.)

chryswoods commented 2 years ago

Yes, that was my thought too. Although I think it would need to be add to the System class rather than Molecule (or, we could additionally add it to Molecule that automatically creates a one-molecule System, and then calls the function on System).

I am now planning on a new branch to create a new website for Sire using Sphinx. I will be using tutorial-driven development to write a number of Sire tutorials, and will be cleaning up some of the higher level API to make those tutorials easier to write. This will all be backwards-compatible. It will make Sire a lot easier to learn and use.

lohedges commented 2 years ago

Everything is working just fine at my end. The only thing that I've noticed is that, although you can now index molecules in a system by index, e.g. mol = system[-1], and loop over them, for mol in system:, Python slicing doesn't work, e.g. mols = system[3:10].

This is something that I have already implemented in BioSimSpace, so isn't really an issue. I just wondered if it was easy to support slicing in Sire too.

chryswoods commented 2 years ago

Yes - I can add in slicing. I will do that in feat_web as it can be part of my API modernisation as I am writing the new tutorials. This will include some further improvements, e.g. having a string index automatically converting to the appropriate XName class, and adopting "Name:Number", plus "Name[2]" to do more controlled matching.

lohedges commented 2 years ago

That sounds great. I'll merge this across now and work on adding the SDF support into BioSImSpace when I get a chance. (Essentially detecting when a molecule was originally loaded from SDF and bypassing some of the clunky intermeditate conversions when that is the case, e.g. PDB --> RDKit --> SDF, which is currently the most robust for OpenFF.)

Cheers.