project-gemmi / gemmi

macromolecular crystallography library and utilities
https://project-gemmi.github.io/
Mozilla Public License 2.0

OBSLTE from pdb and cif input #284

Open rimmartin opened 9 months ago

rimmartin commented 9 months ago

Hi @wojdyr and everyone,

Should OBSLTE be added to Structure::info under the mmCIF tag _pdbx_database_PDB_obs_spr, or do you think the record (which can span multiple lines) should be parsed in more detail at the read step?

If it goes into Structure::info, would the PDB and mmCIF inputs end up giving the app the same information to interpret?

Example: 1qon.pdb & 1qon.cif

Thank you

wojdyr commented 9 months ago

Note that your example has SPRSDE instead of OBSLTE.

I looked at _pdbx_database_PDB_obs_spr in the current PDB database, and unfortunately in a few cases (namely: 3EZB 5HUZ 3J7O 3J7P 3J7Q 3J7R 1NEW 9PAP 2R2X 1SAF) it contains more than one row, for example:

loop_
_pdbx_database_PDB_obs_spr.id 
_pdbx_database_PDB_obs_spr.date 
_pdbx_database_PDB_obs_spr.pdb_id 
_pdbx_database_PDB_obs_spr.replace_pdb_id 
_pdbx_database_PDB_obs_spr.details 
SPRSDE 1987-01-15 9PAP 8PAP        ? 
SPRSDE 1986-10-24 9PAP '3PAD 8PAP' ?

although in the corresponding PDB file it's only one line:

SPRSDE     24-OCT-86 9PAP      3PAD 8PAP

I didn't check how it looks in the obsolete PDB entries.

Out of curiosity, what would you use it for?
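
One way to reconcile the two mmCIF rows for 9PAP with the single PDB line is to merge rows that share the same record type and pdb_id. The following is a minimal sketch in plain Python, assuming the loop has already been read into tuples of strings; merge_obs_spr_rows is a hypothetical helper, not part of gemmi.

```python
def merge_obs_spr_rows(rows):
    """Merge (id, date, pdb_id, replace_pdb_id) rows that share id and pdb_id."""
    merged = {}
    for kind, date, pdb_id, replace_pdb_id in rows:
        entry = merged.setdefault((kind, pdb_id), {"dates": [], "replaced": []})
        if date not in entry["dates"]:
            entry["dates"].append(date)
        # A quoted value such as '3PAD 8PAP' holds several IDs in one field.
        for old_id in replace_pdb_id.strip("'\"").split():
            if old_id not in entry["replaced"]:
                entry["replaced"].append(old_id)
    return merged


# The two rows from the 9PAP loop quoted above.
rows = [
    ("SPRSDE", "1987-01-15", "9PAP", "8PAP"),
    ("SPRSDE", "1986-10-24", "9PAP", "'3PAD 8PAP'"),
]
merged = merge_obs_spr_rows(rows)
print(merged[("SPRSDE", "9PAP")]["replaced"])  # ['8PAP', '3PAD']
```

After merging, the set of replaced IDs matches the PDB line (3PAD and 8PAP), but the mmCIF rows carry two dates while the PDB line shows only 24-OCT-86, so the two formats would still not give identical information.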

rimmartin commented 9 months ago

Ah, the https://www.rcsb.org web interface by default redirects 1qon to its replacement, 6xyu.

I'm being asked to give users this information and a choice before they start a large calculation when we pull entries from the web service programmatically (it doesn't switch to the more recent entry automatically). For example, a user sets our software to model missing loops and refine a number of loops to see if they now fit the density, when there could be a better model in the RCSB that already has the loops, or has better density or resolution.

pdb 1qon

OBSLTE     15-FEB-23 1QON      6XYU                                             

mmcif 1qon

_pdbx_database_PDB_obs_spr.id               OBSLTE 
_pdbx_database_PDB_obs_spr.date             2023-02-15 
_pdbx_database_PDB_obs_spr.pdb_id           6XYU 
_pdbx_database_PDB_obs_spr.replace_pdb_id   1QON 
_pdbx_database_PDB_obs_spr.details          ? 
# 
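
Note that in the two 1qon snippets the IDs appear in opposite roles: the PDB record lists 1QON first and 6XYU second, while the mmCIF item pdb_id holds 6XYU and replace_pdb_id holds 1QON. A minimal sketch of mapping both to a common (old, new) form follows. The names Replacement, from_pdb_record and from_cif_row are hypothetical, and the direction rules are inferred only from the examples in this thread (the first ID in the PDB record is the entry the file belongs to; mmCIF pdb_id is the newer entry).

```python
from typing import NamedTuple


class Replacement(NamedTuple):
    old_ids: tuple  # entries that were replaced
    new_ids: tuple  # entries that replace them


def from_pdb_record(record, id_code, other_ids):
    """Interpret the ID fields of a PDB OBSLTE or SPRSDE record."""
    if record == "OBSLTE":
        # The file's own entry is obsolete; the other IDs replace it.
        return Replacement((id_code,), tuple(other_ids))
    if record == "SPRSDE":
        # The file's own entry supersedes the other IDs.
        return Replacement(tuple(other_ids), (id_code,))
    raise ValueError(f"unexpected record type: {record!r}")


def from_cif_row(pdb_id, replace_pdb_id):
    """Interpret one _pdbx_database_PDB_obs_spr row (assumed direction)."""
    old_ids = tuple(replace_pdb_id.strip("'\"").split())
    return Replacement(old_ids, (pdb_id,))


pdb_view = from_pdb_record("OBSLTE", "1QON", ["6XYU"])
cif_view = from_cif_row("6XYU", "1QON")
print(pdb_view == cif_view)  # True
```

With a normalization like this, an app would see the same old/new pair whichever format the file came from.
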

wojdyr commented 9 months ago

Perhaps the web service that you use can return status of the entry without downloading a file? Or maybe it'd be more convenient to use the list from: https://files.wwpdb.org/pub/pdb/data/status/obsolete.dat
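
A lookup against that list could be sketched as below. The line layout of obsolete.dat is an assumption here (whitespace-separated fields: OBSLTE, date, obsolete ID, then zero or more successor IDs); the sample line is built from the 1QON data in this thread, and parse_obsolete_list is a hypothetical helper.

```python
def parse_obsolete_list(text):
    """Map each obsolete PDB ID to (date, [successor IDs])."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip title lines, blank lines and anything that is not a record.
        if len(fields) < 3 or fields[0] != "OBSLTE":
            continue
        date, obsolete_id = fields[1], fields[2].upper()
        successors = [f.upper() for f in fields[3:]]
        table[obsolete_id] = (date, successors)
    return table


# Illustrative input in the assumed layout, not a copy of the real file.
sample = "OBSLTE    15-FEB-23 1QON     6XYU\n"
obsolete = parse_obsolete_list(sample)
print(obsolete.get("1QON"))
print(obsolete.get("6XYU"))  # None: not listed as obsolete
```

In practice the text could be downloaded once (for example with urllib.request.urlopen) and cached, so that each entry is checked before a long calculation starts without downloading the coordinate file.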

rimmartin commented 9 months ago

Thanks for pointing out obsolete.dat. I'm using http://files.rcsb.org/download/, which I think mirrors the wwPDB.

They want both ways; it happens when you give them options :-)

If a user already has PDB files, what we're trying to do is be kind to their budget and not run unless they know about an updated structure.

wojdyr commented 9 months ago

OK, I'm not against it. Are there any examples of multi-line OBSLTE? Or examples when this record has multiple new PDB IDs (in either PDB or mmCIF format)?
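
Until such an example turns up, a reader can be written so that it does not care whether the record is one line or several. The sketch below is plain Python, not gemmi code, and parse_obs_spr_records is a hypothetical name. The column positions are an assumption based on the wwPDB format description as I understand it (date in columns 12-20, the entry's own ID in 22-25, further IDs from column 32 up to 75, continuation number in 9-10); they do match the SPRSDE line quoted earlier.

```python
def parse_obs_spr_records(lines):
    """Collect OBSLTE/SPRSDE records, joining IDs from continuation lines."""
    records = {}
    for line in lines:
        tag = line[:6]
        if tag not in ("OBSLTE", "SPRSDE"):
            continue
        rec = records.setdefault(tag, {"date": "", "id_code": "", "other_ids": []})
        # Keep the date and ID from the first line that provides them.
        rec["date"] = rec["date"] or line[11:20].strip()
        rec["id_code"] = rec["id_code"] or line[21:25].strip()
        # Columns 32-75 hold space-separated IDs; split() ignores padding.
        rec["other_ids"].extend(line[31:75].split())
    return records


# The 9PAP line from above, rebuilt with explicit padding.
line = "SPRSDE" + " " * 5 + "24-OCT-86 9PAP" + " " * 6 + "3PAD 8PAP"
records = parse_obs_spr_records([line])
print(records["SPRSDE"])
```

Because the IDs are accumulated in a list, a record with several replacement IDs, on one line or spread over continuation lines, ends up in the same structure.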

rimmartin commented 9 months ago

I'll keep an eye out. I will ask those who run large sets for an example.