ihmwg / python-modelcif

Python package for handling ModelCIF mmCIF and BinaryCIF files
MIT License

Allow compounds not available in the wwPDB components dictionary #21

Closed bienchen closed 2 years ago

bienchen commented 2 years ago

Hello,

In ModelArchive (MA) we may see novel compounds/ligands in the future. Those compounds are not necessarily stored in the wwPDB chemical components dictionary (CCD). Some of them may be so artificial that they cannot be considered for inclusion in the wwPDB CCD. To still make novel compounds available in MA, two approaches should be established: let MA have its own CCD, and let ModelCIF files define their own compounds locally if needed.

That means in the future we should have three kinds of sources for chemical components in ModelCIF files: wwPDB CCD, MA CCD, and locally defined. This is facilitated by a new item in the _chem_comp category: _chem_comp.ma_provenance.

If ma_provenance is "CCD local", a new data category, ma_chem_comp_descriptor, must be populated with data. It is linked back to _chem_comp via _ma_chem_comp_descriptor.chem_comp_id.
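
The linkage rule above can be checked mechanically. Here is a minimal sketch, assuming both categories have already been parsed into lists of plain dicts; the function name `local_comps_missing_descriptors`, the row layout, and the component ids are hypothetical and not part of python-modelcif:

```python
def local_comps_missing_descriptors(chem_comp_rows, descriptor_rows):
    """Return ids of "CCD local" components that have no descriptor row.

    Both arguments are lists of dicts keyed by item name. This layout is
    an assumption for illustration, not a python-modelcif data structure.
    """
    # Ids referenced from _ma_chem_comp_descriptor.chem_comp_id
    described = {row["chem_comp_id"] for row in descriptor_rows}
    return [row["id"] for row in chem_comp_rows
            if row.get("ma_provenance") == "CCD local"
            and row["id"] not in described]


# Hypothetical component ids, made up for this example
chem_comp = [
    {"id": "LIG1", "ma_provenance": "CCD local"},
    {"id": "LIG2", "ma_provenance": "CCD local"},
    {"id": "LIG3", "ma_provenance": "CCD MA"},
]
descriptors = [
    {"chem_comp_id": "LIG1", "type": "InChI Key",
     "value": "XDAOLTSRNUSPPH-XMMPIXPASA-N"},
]

missing = local_comps_missing_descriptors(chem_comp, descriptors)
```

Here `missing` contains only `LIG2`: it is marked "CCD local" but nothing in the descriptor rows points back to it.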

This scheme for introducing our own compounds into ModelCIF files is a feature we need for ModelArchive.

After looking into the code a bit, I think one option would be to have classes inheriting from ihm.ChemComp for ma_provenance "CCD local" and "CCD MA". Then _chem_comp.ma_provenance in the ModelCIF file could be set depending on the class of the compound or the presence of a certain attribute. Adding _ma_chem_comp_descriptor does not seem complicated, but is there a way to make a "CCD local" compound enforce having _ma_chem_comp_descriptor?

Thanks,

B

benmwebb commented 2 years ago

We do something similar in python-ihm for cross-linkers, which are often not in CCD either - the ihm.ChemDescriptor class. I think inheriting from ihm.ChemComp is potentially problematic as the ChemComp class hierarchy is already used to distinguish DNA/RNA/L-peptide/D-peptide/water/non-polymer. So you'd have to add a mixin class or assume that such custom components are always a single type, e.g. non-polymers.

A solution that avoids that problem would be to just have ChemComp take an extra descriptors argument, a list of Descriptor objects. python-ihm wouldn't define any (since the dictionary doesn't support that) but python-modelcif could. Then you'd just write out chem_comp.ma_provenance "CCD local" if descriptors is non-empty. (This would also allow multiple descriptors, e.g. InChI plus SMILES.) Something like:

class ChemComp(object):
    def __init__(self, id, code, code_canonical, name=None, formula=None,
                 descriptors=None):
        ...

class Descriptor(object):
    pass

class InChIKeyDescriptor(Descriptor):
    type = "InChI Key"
    def __init__(self, value, details=None, software=None):
        ...

delamanid = ihm.NonPolymerChemComp(
    id=..., name="Delamanid",
    descriptors=[modelcif.InChIKeyDescriptor(value="XDAOLTSRNUSPPH-XMMPIXPASA-N")])
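
A self-contained sketch of how this could fit together, using stand-in classes rather than the real ihm or modelcif ones. The helpers `ma_provenance` and `descriptor_rows`, the component id "DLM", and the "CCD Core" fallback value are all assumptions made for illustration (the thread itself only names "CCD local" and "CCD MA"):

```python
class Descriptor:
    """Base class for chemical descriptors (stand-in, not the real API)."""
    type = None

    def __init__(self, value, details=None, software=None):
        self.value = value
        self.details = details
        self.software = software


class InChIKeyDescriptor(Descriptor):
    type = "InChI Key"


class ChemComp:
    """Stand-in for ihm.ChemComp with the proposed descriptors argument."""

    def __init__(self, id, code, code_canonical, name=None, formula=None,
                 descriptors=None):
        self.id = id
        self.code = code
        self.code_canonical = code_canonical
        self.name = name
        self.formula = formula
        # Copy into a fresh list so instances never share a mutable default
        self.descriptors = list(descriptors or [])


def ma_provenance(comp):
    # Provenance is derived from the descriptors, so a component can never
    # be written as "CCD local" without them. "CCD Core" is an assumed
    # fallback value; it does not appear in the thread.
    return "CCD local" if comp.descriptors else "CCD Core"


def descriptor_rows(comp):
    """Rows for ma_chem_comp_descriptor, linked back via chem_comp_id."""
    return [{"chem_comp_id": comp.id, "type": d.type, "value": d.value}
            for d in comp.descriptors]


delamanid = ChemComp(
    id="DLM",  # hypothetical id; the original example leaves it elided
    code="DLM", code_canonical="X", name="Delamanid",
    descriptors=[InChIKeyDescriptor(value="XDAOLTSRNUSPPH-XMMPIXPASA-N")])

print(ma_provenance(delamanid))
print(descriptor_rows(delamanid))
```

Deriving the provenance from the descriptor list, rather than storing it separately, is one way to make the two categories impossible to get out of sync.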

bienchen commented 2 years ago

Looks like a good idea to me. It simply keeps all of an unknown ligand's info nicely together. If we manage to find out what to do about PDB format CONECT records, I guess they could be handled in a similar way.

bienchen commented 2 years ago

I tested the new feature and it works as expected. Compounds get marked as "local" and are annotated with their list of descriptors. Thanks a lot.