Allows for non-polymer atoms in Entity declaration

mtrellet commented 6 years ago

HADDOCK allows having ligands (non-polymer) atoms in the docking system. Those ligands constitute sometimes a complete subunit to be docked (protein-ligand docking). When trying to automate the creation of an IHM mmCIF file from a PDB file, we encountered the following error:

Traceback (most recent call last):
  File "/Users/mtrellet/Dev/haddock-webserver-flask/app/main/views.py", line 2105, in download_cif
    create_ihm_mmcif(pdb_path, params, interface_ranges)
  File "/Users/mtrellet/Dev/haddock-webserver-flask/utils/PDBHandler.py", line 405, in create_ihm_mmcif
    entities.append(ihm.Entity(seqs[ch_id]['seq'], alphabet=alphabet, description=f'Partner {i+1}'))
  File "/Users/mtrellet/Dev/miniconda3/envs/haddock3/lib/python3.6/site-packages/ihm/__init__.py", line 754, in __init__
    self.sequence = tuple(get_chem_comp(s) for s in sequence)
  File "/Users/mtrellet/Dev/miniconda3/envs/haddock3/lib/python3.6/site-packages/ihm/__init__.py", line 754, in <genexpr>
    self.sequence = tuple(get_chem_comp(s) for s in sequence)
  File "/Users/mtrellet/Dev/miniconda3/envs/haddock3/lib/python3.6/site-packages/ihm/__init__.py", line 753, in get_chem_comp
    return alphabet._comps[s]
KeyError: None

The code that returns the error (I voluntarily skipped some parts, tell me if you need more information):

# We store sequences and match between author resids and mmCIF consecutive resids (1->n)
    for r in s.get_residues():
        if r.get_parent().id in partners and r.resname != "HOH":
            segid = r.get_parent().id
            if segid not in seqs_id:
                seqs_id[segid] = 1
            else:
                seqs_id[segid] += 1
            if segid not in seqs:
                seqs[segid] = {'seq': [], 'id_from_pdb': {}, 'id_from_seq': {}}

            seqs[segid]['id_from_pdb'][seqs_id[segid]] = r.id[1]
            seqs[segid]['id_from_seq'][r.id[1]] = seqs_id[segid]
            # We only store one-letter code for standard aa
            try:
                seqs[segid]['seq'].append(three_to_one.get(r.resname.strip()))
            except KeyError as e:
                seqs[segid]['seq'].append(r.resname.strip())
            except Exception as e:
                logging.exception(f"An issue occured when trying to add residue {r}")
                raise Exception
            # seqs[segid].append(r.resname.strip())

    # Create different entities with specific dictionary depending on the unit type (Protein, RNA, DNA)
    entities = []
    for i, ch_id in enumerate(partners):
        if partners[ch_id]['moleculetype'] == 'DNA':
            alphabet = ihm.DNAAlphabet
        elif partners[ch_id]['moleculetype'] == 'RNA':
            alphabet = ihm.RNAAlphabet
        else:
            alphabet = ihm.LPeptideAlphabet
        entities.append(ihm.Entity(seqs[ch_id]['seq'], alphabet=alphabet, description=f'Partner {i+1}'))
    system.entities.extend(tuple(entities))

One workaround could be to create a non-polymer alphabet but I have the feeling that it would just create errors further down the pipeline. So a solution would be, I guess, to support non-polymeric entities. However I'm well aware that this requires some significant efforts..!

benmwebb commented 6 years ago

If you like, you could certainly make your own Alphabet subclass for ligands. But you're supposed to pass ChemComp objects for each ligand directly to the Entity constructor. That being said, you'll run into problems later on (since the dumpers assume all entities are polymeric) but I'll work on that and update this issue accordingly.

benmwebb commented 6 years ago

The library should now have basic support for writing out structures containing nonpolymers, although there are still a few loose ends. See the ligands_water.py example, and open issues for any further problems you run into.

ihmwg / python-ihm

Allows for non-polymer atoms in Entity declaration #21