ihmwg / python-ihm

Python package for handling IHM mmCIF and BinaryCIF files
MIT License

Missing auth_seq_id from _atom_site #60

Closed rvhonorato closed 2 years ago

rvhonorato commented 2 years ago

I am using python-ihm to generate mmCIF files from PDB files produced by docking.

However, I noticed that _atom_site.auth_seq_id is not written by the dump atoms function: https://github.com/ihmwg/python-ihm/blob/fb9199d655bef19b66674dc6334ea24ecd68259b/ihm/dumper.py#L1461

As a result, the generated structures are displayed with their seq_id residue numbering instead of the desired user-supplied numbering.

I patched it by adding auth_seq_id to the loop at ihm/dumper.py#L1464 and to the writing section at ihm/dumper.py#L1481, and now I get the correct numbering in the .cif file.
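To illustrate the kind of change described above, here is a minimal, self-contained sketch of an _atom_site loop writer that emits auth_seq_id alongside label_seq_id. It is not python-ihm's actual dumper code: the function name `write_atom_site`, the input dict layout, and the constant `seq_id_offset` used to derive the author numbering are all assumptions made for illustration.

```python
import io

# Hypothetical, simplified stand-in for an atom dumper; this is NOT
# python-ihm's implementation. Note auth_seq_id listed next to label_seq_id.
ATOM_SITE_KEYS = [
    "group_PDB", "id", "type_symbol", "label_atom_id", "label_comp_id",
    "label_seq_id", "auth_seq_id", "label_asym_id",
    "Cartn_x", "Cartn_y", "Cartn_z",
]


def write_atom_site(atoms, seq_id_offset, fh):
    """Write a minimal _atom_site loop to the file-like object fh.

    atoms is a list of dicts holding the label_* fields and coordinates.
    Assumption: author numbering differs from seq_id by a constant offset.
    """
    # Loop header: one line per data item name
    fh.write("loop_\n")
    for key in ATOM_SITE_KEYS:
        fh.write("_atom_site.%s\n" % key)
    # One row per atom, with values in the same order as the header
    for serial, atom in enumerate(atoms, start=1):
        row = dict(atom)
        row["group_PDB"] = "ATOM"
        row["id"] = serial
        # The extra per-atom field that some viewers rely on
        row["auth_seq_id"] = atom["label_seq_id"] + seq_id_offset
        fh.write(" ".join(str(row[key]) for key in ATOM_SITE_KEYS) + "\n")


# Usage: one CA atom whose author residue number should be 100
example_atoms = [{
    "type_symbol": "C", "label_atom_id": "CA", "label_comp_id": "ALA",
    "label_seq_id": 1, "label_asym_id": "A",
    "Cartn_x": 1.0, "Cartn_y": 2.0, "Cartn_z": 3.0,
}]
buffer = io.StringIO()
write_atom_site(example_atoms, seq_id_offset=99, fh=buffer)
print(buffer.getvalue())
```

The header and the row writer must list the keys in the same order, which is why the sketch drives both from a single `ATOM_SITE_KEYS` list.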

Is this actually missing? If so I can do a pull request with my changes, else please let me know if I am not using it correctly.

benmwebb commented 2 years ago

python-ihm does not output auth_seq_id in _atom_site because it is redundant: it is a per-residue property, and writing it for every atom is a holdover from ancient PDB. It is also not required, as per https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/atom_site.html. The mapping between different sequence IDs is in the _pdbx_poly_seq_scheme table.
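As a rough sketch of how a reader could recover author numbering from that table instead of from _atom_site, the snippet below builds a lookup keyed on (asym_id, seq_id). The row values are made up, the helper names are hypothetical, and the choice of the pdb_seq_num column as the author number is an assumption about the consuming software's convention.

```python
# Rows as a reader might hold them after parsing _pdbx_poly_seq_scheme.
# The values here are illustrative only.
poly_seq_scheme = [
    {"asym_id": "A", "seq_id": 1, "mon_id": "MET", "pdb_seq_num": 100},
    {"asym_id": "A", "seq_id": 2, "mon_id": "ALA", "pdb_seq_num": 101},
    {"asym_id": "B", "seq_id": 1, "mon_id": "GLY", "pdb_seq_num": 5},
]


def build_author_numbering(rows):
    """Map (asym_id, seq_id) to the author residue number."""
    return {(row["asym_id"], row["seq_id"]): row["pdb_seq_num"]
            for row in rows}


def author_seq_id(numbering, asym_id, seq_id):
    """Look up the author number; fall back to seq_id if unmapped."""
    return numbering.get((asym_id, seq_id), seq_id)


numbering = build_author_numbering(poly_seq_scheme)
print(author_seq_id(numbering, "A", 2))
print(author_seq_id(numbering, "B", 1))
```

Because the mapping is stored once per residue, every atom of a residue resolves to the same author number without repeating it on each _atom_site row.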

That being said, it sounds like whatever software you're using to look at mmCIF files does not read the _pdbx_poly_seq_scheme table. Ideally that software would be fixed, but I have no objection to having python-ihm emit auth_seq_id to help it out in the meantime. So I'll merge your PR once it passes the test suite.

rvhonorato commented 2 years ago

Thanks for the explanation! Indeed, the _pdbx_poly_seq_scheme table is correct, but PyMOL 2.4 still would not show that numbering.