openkinome / kinoml

Structure-informed machine learning for kinase modeling
https://openkinome.org/kinoml/
MIT License

Multiprocessing ligand attributes lost #97

Status: closed (schallerdavid closed this issue 2 years ago)

schallerdavid commented 2 years ago

Currently, I am experimenting with passing a separate docking template to the OEPositDockingFeaturizer in the posit_template branch. This way, you could dock into the structure from PDB entry 4aoj but bias the Posit docking algorithm with the ligand co-crystallized in another PDB entry, e.g. 4yne.

I pass the required information as attributes of the corresponding ligand and protein instances of the protein-ligand complex (see the code example below), so that the featurizer can read them. In other featurizers, where I only attached such attributes to the protein, this always worked fine. However, adding attributes to the ligand instance gives surprising results when multiprocessing is enabled: all attributes except smiles and name (given during initialization) are lost.

from kinoml.core.components import BaseProtein
from kinoml.core.ligands import Ligand
from kinoml.core.systems import ProteinLigandComplex
from kinoml.features.complexes import OEPositDockingFeaturizer

compounds = {
    "larotrectinib": "C1CC(N(C1)C2=NC3=C(C=NN3C=C2)NC(=O)N4CCC(C4)O)C5=C(C=CC(=C5)F)F",
    "selitrectinib": "CC1CCC2=C(C=C(C=N2)F)C3CCCN3C4=NC5=C(C=NN5C=C4)C(=O)N1"
}

systems = []
for name, smiles in compounds.items():
    protein = BaseProtein(name="NTRK1")
    protein.pdb_id = "4aoj"
    protein.expo_id = "V4Z"
    protein.chain_id = "A"
    ligand = Ligand.from_smiles(smiles=smiles, name=name)
    ligand.docking_template_pdb_id = "4yne"  # lost in multiprocessing
    ligand.docking_template_expo_id = "4EK"  # lost in multiprocessing
    ligand.docking_template_chain_id = "A"  # lost in multiprocessing
    systems.append(ProteinLigandComplex(components=[protein, ligand]))

featurizer = OEPositDockingFeaturizer(output_dir="posit", use_multiprocessing=True)

systems = featurizer.featurize(systems)

A quick search for this behavior gave me a few hints: it looks like there may be a serialization problem, since Python's multiprocessing pickles objects to pass them to the worker processes.
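A minimal sketch of how this kind of loss can happen, assuming (not verified here against the OpenFF or KinoML source) that the molecule class customizes pickling by serializing only a fixed dictionary of molecular data. `DictBackedMolecule` is a hypothetical stand-in, not an actual library class. Any attribute that the custom state leaves out is silently dropped on the round trip that multiprocessing performs.

```python
import pickle


class DictBackedMolecule:
    """Hypothetical stand-in for a molecule class with custom pickling."""

    def __init__(self, smiles, name=""):
        self.smiles = smiles
        self.name = name

    def to_dict(self):
        # Only the "known" molecular fields end up in the serialized state.
        return {"smiles": self.smiles, "name": self.name}

    def __getstate__(self):
        # pickle (and therefore multiprocessing) calls this to obtain the state.
        return self.to_dict()

    def __setstate__(self, state):
        # Rebuild from the dict; attributes not in it are never restored.
        self.__init__(state["smiles"], name=state["name"])


ligand = DictBackedMolecule("c1ccccc1", name="benzene")
ligand.docking_template_pdb_id = "4yne"  # ad hoc attribute, as in the issue

copy = pickle.loads(pickle.dumps(ligand))
print(copy.name)                                 # benzene
print(hasattr(copy, "docking_template_pdb_id"))  # False
```

The same round trip happens implicitly when a `multiprocessing.Pool` sends objects to its workers, so the attribute is already gone by the time the featurizer looks for it.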

Interestingly, this is not a problem when the attributes are stored on an RDKitLigand instead of a Ligand. Since the Ligand class is based on the _OpenForceFieldMolecule class, the problem may originate in the OpenFF Toolkit.
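A class that relies on Python's default pickling copies its whole instance `__dict__`, which would fit the RDKitLigand observation; that is an inference, not something checked here. Under the same assumption as above, one possible workaround is a subclass that adds any extra instance attributes to the pickled state. `AttributePreservingLigand` is hypothetical and only illustrates the standard `__getstate__`/`__setstate__` protocol; it is not the fix adopted in KinoML.

```python
import pickle


class DictBackedMolecule:
    """Hypothetical base class whose pickled state is a fixed dict."""

    def __init__(self, smiles, name=""):
        self.smiles = smiles
        self.name = name

    def __getstate__(self):
        return {"smiles": self.smiles, "name": self.name}

    def __setstate__(self, state):
        self.__init__(state["smiles"], name=state["name"])


class AttributePreservingLigand(DictBackedMolecule):
    """Hypothetical subclass that also pickles ad hoc attributes."""

    def __getstate__(self):
        base_state = super().__getstate__()
        # Keep every instance attribute that the base state does not cover.
        extras = {k: v for k, v in self.__dict__.items() if k not in base_state}
        return {"base": base_state, "extras": extras}

    def __setstate__(self, state):
        # Restore the base object first, then re-attach the extra attributes.
        super().__setstate__(state["base"])
        self.__dict__.update(state["extras"])


ligand = AttributePreservingLigand("c1ccccc1", name="benzene")
ligand.docking_template_pdb_id = "4yne"
ligand.docking_template_chain_id = "A"

copy = pickle.loads(pickle.dumps(ligand))
print(copy.docking_template_pdb_id)    # 4yne
print(copy.docking_template_chain_id)  # A
```

A simpler alternative, consistent with what the issue reports, is to keep the docking template information on the protein component, where extra attributes already survive multiprocessing in other featurizers.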

schallerdavid commented 2 years ago