ihmwg / python-ihm

Python package for handling IHM mmCIF and BinaryCIF files
MIT License

Reading files with incorrect model numbers #81

Closed: brindakv closed this issue 2 years ago

brindakv commented 2 years ago

We recently received a deposition in which the uploaded mmCIF file had ten models in the atom_site table. However, the _atom_site.pdbx_PDB_model_num column had the same model number for all ten models, i.e., there were multiple sets of coordinates for the same atom in the same model. python-ihm did not raise an error while reading this mmCIF file.

benmwebb commented 2 years ago

python-ihm tries to be pretty tolerant with its input. Are you still using util/make-mmcif.py in your pipeline? If so, it would be easy to add a check there for duplicate atoms.
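
For illustration, a duplicate-atom check of this kind could look roughly like the sketch below. It is not code from python-ihm or util/make-mmcif.py. The function name `find_duplicate_atoms` and the row layout (model number, asym ID, seq ID, atom name) are assumptions chosen for the example, and it presumes the atom_site rows have already been extracted into plain tuples.

```python
from collections import Counter


def find_duplicate_atoms(rows):
    """Return every atom key that occurs more than once, in first-seen order.

    Hypothetical helper: each row is assumed to be a
    (model_num, asym_id, seq_id, atom_id) tuple taken from atom_site.
    """
    counts = Counter(rows)
    return [key for key, n in counts.items() if n > 1]


# Two sets of coordinates for the same atoms, both labeled as model 1.
rows = [
    (1, "A", 1, "CA"),
    (1, "A", 2, "CA"),
    (1, "A", 1, "CA"),  # should have been model 2
    (1, "A", 2, "CA"),  # should have been model 2
]
duplicates = find_duplicate_atoms(rows)
if duplicates:
    print("Duplicate atoms found:", duplicates)
```

With correctly numbered models the same atoms would carry different model numbers, so the returned list would be empty.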

brindakv commented 2 years ago

Right. But the dumper also writes out coordinates for multiple models with the same model number.

benmwebb commented 2 years ago

Sure, we could check in the dumper instead; I guess that makes sense. We already check for other wonkiness (e.g. trying to emit atoms for a seq_id that isn't in the representation), so that would be a natural place to do it.
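
As a rough sketch of what a fail-fast check at write time might look like (this is not the actual python-ihm dumper): the generator below passes atoms through one at a time and stops at the first repeat within a model. The name `checked_atoms`, the dict keys, and the use of `ValueError` are all placeholders assumed for the example.

```python
def checked_atoms(atoms):
    """Yield atoms unchanged, raising on the first duplicate within a model.

    Hypothetical wrapper: each atom is assumed to be a dict with
    'model_num', 'asym_id', 'seq_id' and 'atom_id' keys.
    """
    seen = set()
    for atom in atoms:
        key = (atom["model_num"], atom["asym_id"],
               atom["seq_id"], atom["atom_id"])
        if key in seen:
            # ValueError is a stand-in for whatever error the writer uses.
            raise ValueError(
                "Duplicate atom %s at seq_id %s of asym %s in model %s"
                % (key[3], key[2], key[1], key[0]))
        seen.add(key)
        yield atom


atoms = [
    {"model_num": 1, "asym_id": "A", "seq_id": 1, "atom_id": "CA"},
    {"model_num": 1, "asym_id": "A", "seq_id": 1, "atom_id": "CA"},
]
try:
    for atom in checked_atoms(atoms):
        pass  # a real writer would emit the atom_site row here
except ValueError as err:
    print("Refusing to write file:", err)
```

Because the check runs while atoms are being emitted, a file with repeated coordinates in one model is rejected before it is fully written, rather than being detected later by the reader.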