Closed davidhoksza closed 4 months ago
Prediction on mmCIF files saved by PyMOL fails with a BioJava parsing error: org.rcsb.cif.EmptyColumnException: column auth_seq_id is undefined.
The problem is that PyMOL does not include the column _atom_site.auth_seq_id in its output, even if the original file contained it. auth_seq_id is optional according to the mmCIF spec but is present in all PDB entries: https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_atom_site.auth_seq_id.html
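To check quickly whether a given mmCIF file is affected, one can list the _atom_site columns it declares. The following is a minimal sketch in plain Python, not part of P2Rank or BioJava. The function names and the two tiny mmCIF fragments are made up for illustration and are not real PyMOL output. The sketch assumes the usual layout in which each column name sits on its own line in the loop header.

```python
def atom_site_columns(cif_text):
    """Return the _atom_site column names declared in an mmCIF text, in file order."""
    prefix = "_atom_site."
    columns = []
    for line in cif_text.splitlines():
        stripped = line.strip()
        # Column headers look like "_atom_site.auth_seq_id"; the trailing dot in
        # the prefix keeps other categories such as _atom_site_anisotrop out.
        if stripped.startswith(prefix):
            name = stripped.split()[0]
            columns.append(name[len(prefix):])
    return columns


def has_auth_seq_id(cif_text):
    """True if the _atom_site loop declares the auth_seq_id column."""
    return "auth_seq_id" in atom_site_columns(cif_text)


# Synthetic fragments for illustration only (hypothetical, heavily shortened).
WITH_AUTH = """data_example
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_seq_id
_atom_site.auth_seq_id
ATOM 1 1 28
"""

WITHOUT_AUTH = """data_example
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_seq_id
ATOM 1 1
"""

if __name__ == "__main__":
    for label, text in (("with", WITH_AUTH), ("without", WITHOUT_AUTH)):
        print(label, has_auth_seq_id(text))
```

A file for which has_auth_seq_id returns False would be expected to trigger the EmptyColumnException described above with the affected BioJava versions.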
The fact that BioJava fails to parse such files is a known, long-standing issue: https://github.com/biojava/biojava/issues/775 (currently assigned to milestone version 6.1.0, but it has been moved to the next version many times over the last few years).
Details:
Current BioJava version used by P2Rank: 6.0.5 (latest)
Tested with PyMOL 2.3.4 (problem also reported with PyMOL 1.8.x and 2.5.x)
I have asked about the current status on https://github.com/biojava/biojava/issues/775
In the meantime, this has been solved on the develop branch by using a custom fork of BioJava (biojava-structure-6.1.0-rdk1).
Fixed in the 2.4.2 release. It has also since been fixed in BioJava 7.1.
If a structure (for example 1tqn) is fetched in PyMOL, exported to mmCIF, and then imported into P2Rank, the import seems to fail.