rdk / p2rank

P2Rank: Protein-ligand binding site prediction from protein structure based on machine learning.
MIT License
251 stars, 34 forks

Issue with loading mmCIF files created in PyMOL #54

Closed: davidhoksza closed this issue 4 months ago

davidhoksza commented 2 years ago

If a structure (for example 1tqn) is fetched in PyMOL, exported to mmCIF, and then loaded into P2Rank, P2Rank appears to fail.

rdk commented 2 years ago

Prediction on mmCIF files saved by PyMOL fails with the BioJava parsing error org.rcsb.cif.EmptyColumnException: column auth_seq_id is undefined. The problem is that PyMOL does not write the column _atom_site.auth_seq_id, even if the original file contained it.

auth_seq_id is optional according to the mmCIF spec, but it is present in all PDB entries: https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_atom_site.auth_seq_id.html
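To check quickly whether a given mmCIF file is affected, it may help to list the _atom_site columns it declares. The following is only a minimal sketch, not part of P2Rank or BioJava: the helper names atom_site_columns and has_auth_seq_id are made up for illustration, the sample text is a hypothetical excerpt rather than real PyMOL output, and the scan simply looks for lines starting with `_atom_site.` instead of doing a full mmCIF parse.

```python
def atom_site_columns(cif_text):
    """Return the lowercased _atom_site item names in order of appearance.

    Simplified scan (assumption): any line whose first token starts with
    '_atom_site.' is treated as a column declaration. This is not a full
    mmCIF parser.
    """
    prefix = "_atom_site."
    columns = []
    for line in cif_text.splitlines():
        tokens = line.split()
        # mmCIF item names are case-insensitive, so compare in lowercase.
        if tokens and tokens[0].lower().startswith(prefix):
            columns.append(tokens[0].lower()[len(prefix):])
    return columns


def has_auth_seq_id(cif_text):
    """True if the file declares the _atom_site.auth_seq_id column."""
    return "auth_seq_id" in atom_site_columns(cif_text)


# Hypothetical excerpt without auth_seq_id (illustrative values only).
WITHOUT_AUTH = """\
data_example
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_seq_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N N HIS A 1 -30.0 8.0 -14.0
ATOM 2 C CA HIS A 1 -29.0 8.5 -13.5
"""

# Hypothetical header-only excerpt that does declare the column.
WITH_AUTH = "loop_\n_atom_site.id\n_atom_site.auth_seq_id\n"

print(has_auth_seq_id(WITHOUT_AUTH))  # False
print(has_auth_seq_id(WITH_AUTH))     # True
```

For a real file, the same check would be has_auth_seq_id(open(path).read()). A result of False suggests the file would trigger the EmptyColumnException described above on the affected BioJava versions.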

BioJava's failure to parse such files is a known, long-standing issue: https://github.com/biojava/biojava/issues/775 (currently assigned to milestone 6.1.0, but it has been moved to the next version many times over the last few years).

Details:

Current BioJava version used by P2Rank: 6.0.5 (latest)

Tested with PyMOL 2.3.4 (problem also reported with PyMOL 1.8.x and 2.5.x)

rdk commented 2 years ago

I have asked about the current status on https://github.com/biojava/biojava/issues/775

rdk commented 1 year ago

In the meantime, this has been solved on the develop branch by using a custom fork of BioJava (biojava-structure-6.1.0-rdk1).

rdk commented 4 months ago

Fixed in the 2.4.2 release. Also fixed in BioJava 7.1 in the meantime.