ihmwg / python-ihm

Python package for handling IHM mmCIF and BinaryCIF files
MIT License

Handle duplicate save frames #133

Closed: benmwebb closed this issue 5 months ago

benmwebb commented 6 months ago

The newly merged IHMCIF dictionary (https://github.com/ihmwg/IHM-dictionary/blob/IHMCIF/dist/mmcif_ihm.dic) contains at least one duplicate save frame (see the duplicated save__entity_poly_seq.entity_id, around line 23957 in that file). This is considered valid, and PDB tools merge the information from both frames rather than, for example, keeping only the first or the last one. Make sure that our Dictionary class behaves the same way, and add a unit test.
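To make "merge the information from both frames" concrete, here is a minimal, self-contained sketch. It does not use python-ihm: `parse_save_frames` and `merge_save_frames` are hypothetical helpers, and the parser assumes a heavily simplified dictionary syntax (single-line `_key value` pairs only, no `loop_` or multi-line text). The frame contents, including the `example_subcategory` value, are illustrative rather than copied from mmcif_ihm.dic. Raising an error when two frames give different values for the same key is likewise an assumed policy.

```python
def parse_save_frames(text):
    """Parse a *much simplified* DDL2-style dictionary into save frames.

    Assumption: each frame starts with 'save_<name>', ends with a bare
    'save_' line, and holds only single-line '_key value' pairs.
    Returns (frame_name, {key: value}) tuples in file order, keeping
    duplicate frame names as separate entries.
    """
    frames = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        if line == 'save_':
            current = None  # end of the current frame
        elif line.startswith('save_'):
            current = {}
            frames.append((line[len('save_'):], current))
        elif current is not None and line.startswith('_'):
            key, _, value = line.partition(' ')
            current[key] = value.strip().strip("'\"")
    return frames


def merge_save_frames(frames):
    """Combine all frames that share a name instead of keeping only the
    first or last one. Hypothetical policy: the same key with different
    values in two frames is treated as an error."""
    merged = {}
    for name, items in frames:
        target = merged.setdefault(name, {})
        for key, value in items.items():
            if key in target and target[key] != value:
                raise ValueError('conflicting %s in save frame %s'
                                 % (key, name))
            target[key] = value
    return merged


# Illustrative input with a duplicated data item frame.
DIC = """
save__entity_poly_seq.entity_id
    _item.name           '_entity_poly_seq.entity_id'
    _item.category_id    entity_poly_seq
save_

save__entity_poly_seq.entity_id
    _item_sub_category.id  example_subcategory
save_
"""

frames = parse_save_frames(DIC)    # two frames with the same name
merged = merge_save_frames(frames)  # one entry holding all three keys
```

A unit test along these lines would check that both frames are seen and that the merged entry carries keys from each of them.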

benmwebb commented 5 months ago

python-ihm currently ignores the names of save frames and uses only the information inside the frames themselves (e.g. item.name, item.category_id). It will therefore ignore the duplicate frame in this case, since that frame only includes item_sub_category.id, which we don't use. So this is probably not worth worrying about for now. In the future, if this becomes an issue, we could parse the save frame name to get the category/data item, or maintain a mapping from existing data items to save frame names.
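The difference between the two strategies can be sketched in a few lines. This is not python-ihm's code: `items_by_content` only mimics the behavior described above (keying frames on the `_item.name` they contain), and `items_by_frame_name` and `split_frame_name` are hypothetical helpers illustrating the "parse the save frame name" option. The sketch assumes DDL2-style frame names of the form `_category.attribute`, ignores category frames and looped `_item.name` values, and lets later frames overwrite earlier values for simplicity.

```python
def split_frame_name(frame_name):
    """Split a save frame name into (category, attribute).

    Assumption: data item frames are named '_category.attribute';
    anything else is treated as a category frame (attribute None).
    """
    if frame_name.startswith('_') and '.' in frame_name:
        category, _, attribute = frame_name[1:].partition('.')
        return category, attribute
    return frame_name, None


def items_by_content(frames):
    """Key each frame on the _item.name it contains; frames without
    one (like a frame holding only _item_sub_category.id) are dropped."""
    result = {}
    for _name, items in frames:
        item_name = items.get('_item.name')
        if item_name is not None:
            result.setdefault(item_name, {}).update(items)
    return result


def items_by_frame_name(frames):
    """Fall back to the save frame name when _item.name is missing,
    so information from duplicate frames is kept."""
    result = {}
    for name, items in frames:
        category, attribute = split_frame_name(name)
        if attribute is None:
            continue  # category frames are not handled in this sketch
        entry = result.setdefault(items.get('_item.name', name), {})
        entry.update(items)
        entry.setdefault('_item.category_id', category)
    return result


# Illustrative frames; 'example_subcategory' is a made-up value.
frames = [
    ('_entity_poly_seq.entity_id',
     {'_item.name': '_entity_poly_seq.entity_id',
      '_item.category_id': 'entity_poly_seq'}),
    ('_entity_poly_seq.entity_id',
     {'_item_sub_category.id': 'example_subcategory'}),
]

by_content = items_by_content(frames)   # second frame silently ignored
by_name = items_by_frame_name(frames)   # both frames contribute
```

Keying on frame names only matters if python-ihm ever starts using item_sub_category.id (or similar information that can appear in a frame without _item.name); until then the two approaches give the same data for the items it does use.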