salilab / pmi

Python Modeling Interface
https://integrativemodeling.org/nightly/doc/ref/namespaceIMP_1_1pmi.html
11 stars 11 forks source link

Unable to read PDB 7KOE #261

Closed ajtritt closed 8 months ago

ajtritt commented 8 months ago

I'm trying to load the EtfA chain from this PDB structure.

Here is the code I'm using.

import IMP
import IMP.pmi as pmi
import IMP.pmi.topology as topo
import IMP.pmi.dof as dof
import IMP.pmi.restraints.stereochemistry as sc

fasta_path = "rcsb_pdb_7KOE.fasta"
pdb_path = "7koe.pdb"

mdl = IMP.Model()
sys = topo.System(mdl)
cryoEM_st = sys.create_state()

# load once so we can make a map to avoid having to use a huge key when we index below
sequences = topo.Sequences(fasta_path)
name_map = {k:v for k, v in zip(sequences.sequences.keys(), ['A', 'B', 'C', 'D'])}
# and then reload so we can use simpler keys
sequences = topo.Sequences(fasta_path, name_map=name_map)

mol_etfA = cryoEM_st.create_molecule("EtfA", chain_id="A", sequence=sequences['A'][10:292])
mol_etfA.add_structure(pdb_path, chain_id='A')

rcsb_pdb_7KOE.fasta can be downloaded by following this link. 7koe.pdb can be downloaded by following this link.

I was able to get the above code to run free of errors by changing this line: https://github.com/salilab/pmi/blob/75e97da738e7e11056daf345f057f8ba70482b62/pyext/src/topology/__init__.py#L601 to the following:

        raw_idx = pdb_idx

I would gladly submit a pull request to fix this, but I suspect that is not the right fix, since that line of code has been around for 5 years. I suspect I'm doing something wrong, but I can't tell what. Any feedback is much appreciated.

benmwebb commented 8 months ago

Chain A in 7koe starts from 0, not from 1, so you need to apply an offset to make it match PMI's numbering (which always starts from 1, following the FASTA sequence). See https://integrativemodeling.org/2.19.0/doc/ref/classIMP_1_1pmi_1_1topology_1_1Molecule.html#a14b2630b80cd3813acc2c172c55fe08a. In your case, you'll want to change the last line to

mol_etfA.add_structure(pdb_path, chain_id='A', offset=1)