schrodinger / pymol-open-source

Open-source foundation of the user-sponsored PyMOL molecular visualization system.
https://pymol.org/

Unobserved residues incorrectly handled when cif_use_auth is on (PyMOL 2.5.0 Open-Source) #303

Open Stefano-Trapani opened 1 year ago

Stefano-Trapani commented 1 year ago

Some residues labelled as "unobserved" in certain PDB entries (cif files) are handled incorrectly: they disappear from the sequence and from the (pseudo)atom count when cif_use_auth is on. This was observed using PyMOL 2.5.0 Open-Source. An example with PDB entry 1a2c is shown below. The problem does not affect all PDB entries: of the 2581 entries I checked, only 19 were problematic (see attached lists).

PyMOL>set cif_use_auth, off
PyMOL>fetch 1a2c, 1a2c_label, type=cif, async=0
PyMOL>select 1a2c_label
Selector: selection "sele" defined with 2638 atoms.

PyMOL>set cif_use_auth, on
PyMOL>fetch 1a2c, 1a2c_auth, type=cif, async=0
PyMOL>select 1a2c_auth
Selector: selection "sele" defined with 2633 atoms.

PyMOL>align 1a2c_label and polymer.protein and n. CA, 1a2c_auth, object=ali
Match: read scoring matrix.
Match: assigning 309 x 483 pairwise scores.
MatchAlign: aligning residues (309 vs 483)...
MatchAlign: score 1630.000
ExecutiveAlign: 302 atoms aligned.
Executive: RMSD = 0.000 (298 to 298 atoms)
Executive: object "ali" created.
PyMOL>set seq_view, on
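For anyone who wants to repeat the check above on more entries, here is a minimal Python sketch of the same comparison. The function name `atom_counts_by_mode` is made up for illustration, and the `cmd` argument is assumed to be PyMOL's `pymol.cmd` module (it is passed in so the function can also be exercised without PyMOL installed). It only uses `cmd.set`, `cmd.fetch`, `cmd.count_atoms` and `cmd.delete`.

```python
def atom_counts_by_mode(cmd, pdb_id):
    """Load one entry with cif_use_auth off and on; return both atom counts.

    `cmd` is assumed to be PyMOL's `pymol.cmd` module (hypothetical helper,
    not part of PyMOL itself).
    """
    counts = {}
    for mode, value in (("label", 0), ("auth", 1)):
        name = f"{pdb_id}_{mode}"
        # The setting must be changed before the file is loaded.
        cmd.set("cif_use_auth", value)
        # async_=0 makes fetch block until the object is loaded.
        cmd.fetch(pdb_id, name, type="cif", async_=0)
        counts[mode] = cmd.count_atoms(name)
        cmd.delete(name)
    return counts["label"], counts["auth"]


# Possible usage inside a PyMOL Python session:
#   from pymol import cmd
#   label_count, auth_count = atom_counts_by_mode(cmd, "1a2c")
#   if label_count != auth_count:
#       print("1a2c differs:", label_count, auth_count)
```

An entry would be flagged as problematic when the two counts differ, as they do for 1a2c in the session above (2638 vs 2633).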

Screenshot 2023-08-09 at 10 18 49

#
# Excerpt from 1a2c.cif:
#
loop_
_pdbx_unobs_or_zero_occ_residues.id
_pdbx_unobs_or_zero_occ_residues.PDB_model_num
_pdbx_unobs_or_zero_occ_residues.polymer_flag
_pdbx_unobs_or_zero_occ_residues.occupancy_flag
_pdbx_unobs_or_zero_occ_residues.auth_asym_id
_pdbx_unobs_or_zero_occ_residues.auth_comp_id
_pdbx_unobs_or_zero_occ_residues.auth_seq_id
_pdbx_unobs_or_zero_occ_residues.PDB_ins_code
_pdbx_unobs_or_zero_occ_residues.label_asym_id
_pdbx_unobs_or_zero_occ_residues.label_comp_id
_pdbx_unobs_or_zero_occ_residues.label_seq_id
1 1 Y 1 H TRP 147 A B TRP 148
2 1 Y 1 H THR 147 B B THR 149
3 1 Y 1 H ALA 147 C B ALA 150
4 1 Y 1 H ASN 147 D B ASN 151
5 1 Y 1 H VAL 147 E B VAL 152
6 1 Y 1 H GLY 147 F B GLY 153
7 1 Y 1 H LYS 147 G B LYS 154
8 1 Y 1 I ASN 353 ? C ASN 1
9 1 Y 1 I GLY 354 ? C GLY 2
#

checked_PDB_entries.txt problematic_PDB_entries.txt

Stefano-Trapani commented 1 year ago

It seems that the problem arises when the IDs of one or more unobserved residues contain insertion codes.
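To make this observation easier to check against the 1a2c excerpt quoted earlier in the thread, here is a small stdlib-only Python sketch that lists the unobserved residues carrying an insertion code. The helper names (`parse_rows`, `with_insertion_code`) are made up for illustration, and the whitespace splitting is a simplification that only works for plain rows like these; it is not a general mmCIF parser.

```python
# Data rows copied from the _pdbx_unobs_or_zero_occ_residues loop of 1a2c.cif.
EXCERPT_ROWS = """\
1 1 Y 1 H TRP 147 A B TRP 148
2 1 Y 1 H THR 147 B B THR 149
3 1 Y 1 H ALA 147 C B ALA 150
4 1 Y 1 H ASN 147 D B ASN 151
5 1 Y 1 H VAL 147 E B VAL 152
6 1 Y 1 H GLY 147 F B GLY 153
7 1 Y 1 H LYS 147 G B LYS 154
8 1 Y 1 I ASN 353 ? C ASN 1
9 1 Y 1 I GLY 354 ? C GLY 2
"""

# Column names in the order given by the loop_ header.
COLUMNS = (
    "id", "PDB_model_num", "polymer_flag", "occupancy_flag",
    "auth_asym_id", "auth_comp_id", "auth_seq_id", "PDB_ins_code",
    "label_asym_id", "label_comp_id", "label_seq_id",
)


def parse_rows(text):
    """Split each non-empty line on whitespace and pair values with COLUMNS."""
    return [dict(zip(COLUMNS, line.split()))
            for line in text.splitlines() if line.strip()]


def with_insertion_code(rows):
    """Keep rows whose PDB_ins_code is set ('?' and '.' mean no value in CIF)."""
    return [row for row in rows if row["PDB_ins_code"] not in ("?", ".")]


rows = parse_rows(EXCERPT_ROWS)
flagged = with_insertion_code(rows)
for row in flagged:
    auth_id = row["auth_seq_id"] + row["PDB_ins_code"]
    print(row["auth_asym_id"], row["auth_comp_id"], auth_id,
          "-> label", row["label_asym_id"], row["label_seq_id"])
```

In this excerpt, seven of the nine unobserved residues (auth chain H, 147A to 147G) have an insertion code; the two residues in auth chain I do not.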

speleo3 commented 11 months ago

Good find. Would be great to find a solution for this.