ihmwg / python-ihm

Python package for handling IHM mmCIF and BinaryCIF files
MIT License

entity_poly.pdbx_seq_one_letter_code_can/ entity_poly.nstd_monomer of non-standard polymer #69

Closed bienchen closed 2 years ago

bienchen commented 2 years ago

Hello,

I came across the "inhibitor UAW241" of PDB entry 6XA4 / SMTL entry 6xa4.1. It's a tiny peptide with an acetyl group and UXS added to it. In the entity_poly category, the non-canonical sequence renders nicely, but the canonical one does not: it just concatenates the three-letter codes of the extra compounds with the one-letter codes of the leucines. In the mmCIF file of the PDB entry, the extra compounds are marked as 'X', and entity_poly.nstd_monomer is "yes".

I made a little script to show what I mean:

import tempfile
import ihm
import ihm.dumper

system = ihm.System()

ace = ihm.NonPolymerChemComp("ACE", name="ACETYL GROUP", formula="C2 H4 O")
uxs = ihm.NonPolymerChemComp(
    "UXS",
    name="(2S)-2-amino-4-(methylsulfanyl)butan-1-ol",
    formula="C5 H13 N O S",
)

pal = ihm.LPeptideAlphabet()

entity_modified_peptide = ihm.Entity(
    (ace, pal["L"], pal["L"], uxs), description="inhibitor UAW241"
)

system.entities.extend((entity_modified_peptide,))

with tempfile.TemporaryFile(mode="w+", encoding="utf8") as fp:
    ihm.dumper.write(fp, [system])
    fp.seek(0)
    for line in fp:
        if line.startswith("_entity_poly."):
            print(line.strip())
        elif line.startswith("1 polypeptide(L)"):
            print(line.strip())
            print("                    ↑                  ↑    ↑")
            print("             should be 'yes'    should be 'X' (XLLX)")

benmwebb commented 2 years ago

Ah, you're trying to add a non-standard or modified residue to an entity containing amino acids. I don't think anyone's ever done that before with python-ihm (our models only ever contain standard residues). This should be easy to do though - I'll add a fix later today.

I'm a little surprised that it's allowed by PDBx to have something with chem_comp.type = non-polymer show up in the entity_poly table, but you're right, this does seem to be the way it's done for regular PDB files. @brindakv, is our interpretation correct here?

brindakv commented 2 years ago

If a non-standard residue in a polymer has the standard peptide linkage group, it would be classified as L-peptide linking in _chem_comp.type. And if a parent standard residue can be identified, it would have the one-letter code of the parent residue in _entity_poly.pdbx_seq_one_letter_code_can.

But in the case of component UXS in entry 6XA4, it does not have the standard peptide linkage with an OH leaving group. The leaving group is an aldehyde. The component is therefore classified as a non-polymer.
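
To make the two rules above concrete, here is a minimal sketch in plain Python. It is not python-ihm's or the PDB's actual implementation: the `Residue` class and the functions `canonical_code`, `canonical_sequence` and `nstd_monomer_flag` are hypothetical names made up for illustration, and the logic only assumes what is described in this thread (standard residues keep their one-letter code, peptide-linking residues with a known standard parent take the parent's code, everything else becomes 'X').

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# One-letter codes of the 20 standard L-amino acids.
STANDARD_L_PEPTIDES = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


@dataclass(frozen=True)
class Residue:
    """Hypothetical stand-in for a chemical component in a sequence."""
    comp_id: str
    chem_comp_type: str                   # e.g. "L-peptide linking", "non-polymer"
    parent_comp_id: Optional[str] = None  # standard parent residue, if known


def canonical_code(res: Residue) -> str:
    """Return the assumed canonical one-letter code for a single residue."""
    if res.comp_id in STANDARD_L_PEPTIDES:
        return STANDARD_L_PEPTIDES[res.comp_id]
    # Modified residue with a peptide linkage and an identifiable parent.
    if (res.chem_comp_type == "L-peptide linking"
            and res.parent_comp_id in STANDARD_L_PEPTIDES):
        return STANDARD_L_PEPTIDES[res.parent_comp_id]
    # Non-polymer components (or no known parent) are written as 'X'.
    return "X"


def canonical_sequence(residues: Sequence[Residue]) -> str:
    """Join the canonical codes of all residues into one string."""
    return "".join(canonical_code(r) for r in residues)


def nstd_monomer_flag(residues: Sequence[Residue]) -> str:
    """Return 'yes' if any residue is not a standard amino acid, else 'no'."""
    nonstandard = any(r.comp_id not in STANDARD_L_PEPTIDES for r in residues)
    return "yes" if nonstandard else "no"


# The inhibitor from this issue: ACE and UXS are both non-polymer components.
inhibitor = [
    Residue("ACE", "non-polymer"),
    Residue("LEU", "L-peptide linking"),
    Residue("LEU", "L-peptide linking"),
    Residue("UXS", "non-polymer"),
]
print(canonical_sequence(inhibitor))  # XLLX
print(nstd_monomer_flag(inhibitor))   # yes
```

For contrast, a residue such as selenomethionine (MSE), which is L-peptide linking with MET as its parent, would come out as 'M' under the same sketch: `canonical_code(Residue("MSE", "L-peptide linking", "MET"))`.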

bienchen commented 2 years ago

In homology modelling you may sometimes copy a ligand from the template, so everything that can be found in the PDB is relevant. I tested the fix and the behaviour now matches the PDB: I see X's in the canonical sequence. Thanks!