pdb/cif entity subchain comparisons

project-gemmi / gemmi

macromolecular crystallography library and utilities

Mozilla Public License 2.0

Hi @wojdyr ,

I've been running 8r4q.pdb and 8r4q.cif through alignment to find sequence gaps. In the cif file the entity names are the _entity.id values, while in the pdb file they are chain IDs. So I started using Entity::subchains to find the chain IDs.

The cif file gives subchain letters like:

entity 1
  subchain A,C,E,G,I,K
entity 2
  subchain B,D,F,H,J,L

while the pdb file yields:

entity A
  subchain Axp
entity B
  subchain Bxp
entity C
  subchain Cxp
  ...

Can I trust that the first character is always the single capital chain letter and ignore the xp, or other suffixes such as x1 and x2? Or is there a better way to parse these subchain strings?

>>> st = gemmi.read_structure('/data/structures/divided/pdb/r4/pdb8r4q.ent.gz')
>>> st.setup_entities()
>>> st.entities[0]
<gemmi.Entity 'A' polymer polypeptide(L) object at 0x55a73eacce80>
>>> _.subchains
['Axp', 'Cxp', 'Exp', 'Gxp', 'Ixp', 'Kxp']
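
For comparison, here is a rough sketch of an alternative to parsing the subchain strings: look up which chain each subchain actually sits in, then translate an entity's subchains into chain names. It assumes that every residue carries a `subchain` attribute once `setup_entities()` has run. The helper names `subchain_to_chain`, `chains_for_entity` and `report` are made up for illustration and are not part of gemmi.

```python
def subchain_to_chain(model):
    """Map each subchain ID to the name of the chain that contains it.

    `model` is expected to behave like a gemmi.Model: iterating it yields
    chains with a `.name`, and iterating a chain yields residues with a
    `.subchain` string (assumed to be set after setup_entities()).
    """
    mapping = {}
    for chain in model:
        for residue in chain:
            if residue.subchain:  # skip residues with no subchain assigned
                mapping.setdefault(residue.subchain, chain.name)
    return mapping


def chains_for_entity(subchains, mapping):
    """Return chain names for an entity's subchains, in order, no repeats."""
    names = []
    for sub in subchains:
        name = mapping.get(sub)
        if name is not None and name not in names:
            names.append(name)
    return names


def report(path):
    """Print entity name -> chain names for the first model of a file."""
    import gemmi  # imported here so the helpers above work without gemmi

    st = gemmi.read_structure(path)
    st.setup_entities()
    mapping = subchain_to_chain(st[0])
    for ent in st.entities:
        print(ent.name, chains_for_entity(ent.subchains, mapping))
```

Calling `report()` on the pdb and the cif file should then give chain names that can be compared directly, whatever the subchain strings look like in either format.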

pdb/cif entity subchain comparisons #309