informatics-isi-edu / protein-database

Deriva Protein Database Project

incorporate pdb accession code #208

Closed hongsudt closed 3 hours ago

hongsudt commented 2 months ago

Support of multiple Accession_Code in deriva

PDB:Accession_Code

Annotations

Backend Processing

ACL

Support of PDB and PDB-Dev accession codes

Pipeline changes

Update mmCIF generation process after accession codes are issued (Submission_Complete --> HOLD and Release_Ready --> REL)

# Generate the _database_2 strings
get_database_2_string(primary_accession_code_mode) -- if a missing_accession_code exception is caught, throw an error
If alternate_accession_code_mode:
  get_database_2_string(alternate_accession_code_mode) -- if a missing_accession_code exception is caught, don't write to mmCIF.
# Deprecate the following method
If PDBDEV_Accession_Code is NOT NULL, then generate
  "PDB-Dev <PDBDEV_Accession_Code> <PDBDEV_Accession_Code> ?"
If PDB_Accession_Code is NOT NULL, then generate
  "PDB <PDB_Code> <PDB_Extended_Code> 10.2210/pdb<PDB_Code>/pdb"
brindakv commented 2 months ago

Make entry.Workflow_Status visible on Accession_Code table.

brindakv commented 2 months ago

Based on the discussion on April-19-2024:

Add database_2 table to PDB with the following attributes (data will be populated manually during curation by looking up the information in Accession_Code)

brindakv commented 1 month ago

In the database_2 table of the system-generated mmCIF file, the lowercase form of PDB_Code should be used in the DOI string.
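
A minimal sketch of how the row generation from the issue description could look once the lowercase DOI rule is applied. The exception class name, the dict-based record, and the helper `database_2_rows` are hypothetical names used only for illustration; only `get_database_2_string`, the column names, and the row layout come from this thread.

```python
class MissingAccessionCode(Exception):
    """Hypothetical exception: the requested accession code was not issued."""


def get_database_2_string(record, mode):
    """Return one _database_2 row for mode "PDB" or "PDB-Dev".

    `record` is assumed to be a dict keyed by Accession_Code column names.
    """
    if mode == "PDB":
        code = record.get("PDB_Code")
        extended = record.get("PDB_Extended_Code")
        if not code or not extended:
            raise MissingAccessionCode(mode)
        # The DOI uses the lowercase form of PDB_Code.
        return f"PDB {code} {extended} 10.2210/pdb{code.lower()}/pdb"
    if mode == "PDB-Dev":
        code = record.get("PDBDEV_Accession_Code")
        if not code:
            raise MissingAccessionCode(mode)
        # PDB-Dev rows repeat the code and carry no DOI.
        return f"PDB-Dev {code} {code} ?"
    raise ValueError(f"unknown accession code mode: {mode}")


def database_2_rows(record, primary_mode, alternate_mode=None):
    """Build the rows: a missing primary code is an error, a missing
    alternate code is silently skipped (nothing is written for it)."""
    rows = [get_database_2_string(record, primary_mode)]
    if alternate_mode is not None:
        try:
            rows.append(get_database_2_string(record, alternate_mode))
        except MissingAccessionCode:
            pass
    return rows
```

Letting the primary call raise while the alternate call is wrapped in `try`/`except` mirrors the two cases in the description: an entry without its primary accession code should not produce an mmCIF file, whereas a missing alternate code only means one row fewer.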

aozalevsky commented 1 month ago

@brindakv The error "'System' object has no attribute 'databases'" indicates a python-ihm version without database_2 support. I was able to reproduce it locally. @svoinea can you please check that python-ihm on the host is 1.1 and that it's mounted as we discussed here: https://github.com/informatics-isi-edu/protein-database/issues/163#issuecomment-2008132719

svoinea commented 1 month ago

@aozalevsky @brindakv Installed version 1.1:

root@docker-pdbdev-validation-o3558551:/etc/cron.daily# pip3 show ihm
Name: ihm
Version: 1.1
Summary: Package for handling IHM mmCIF and BinaryCIF files
Home-page: https://github.com/ihmwg/python-ihm
Author: Ben Webb
Author-email: ben@salilab.org
License: UNKNOWN
Location: /usr/local/lib/python3.8/dist-packages
Requires: msgpack
Required-by: 

Rerun for Accession_Code = TEST-9A8L. Got the error:

ERROR IN REPORT VALIDATION.
stdoutdata: b''
stderrdata: b'INFO:root:Current operational mode is: PRODUCTION\nINFO:root:Clean up and create output 
directories\nINFO:root:Directory /ihmv/output/TEST-9A8L created \nINFO:root:Directory /ihmv/output/TEST-9A8L/
TEST-9A8L created \nINFO:root:Directory /ihmv/output/TEST-9A8L/TEST-9A8L/htmls created \nINFO:root:Directory /ihmv/
output/TEST-9A8L/TEST-9A8L/images created \nINFO:root:Directory /ihmv/output/TEST-9A8L/TEST-9A8L/csv created 
\nINFO:root:Directory /ihmv/output/TEST-9A8L/TEST-9A8L/pdf created 
\nWARNING:selenium.webdriver.common.selenium_manager:The geckodriver version (0.33.0) detected in PATH at /opt/
conda/bin/geckodriver might not be compatible with the detected firefox version (121.0); currently, geckodriver 0.34.0 is 
recommended for firefox 121.*, so it is advised to delete the driver in PATH and retry\nINFO:root:Entry 
composition\nTraceback (most recent call last):\n  File "/opt/IHMValidation/ihm_validation/ihm_validator.py", line 282, in 
<module>\n    template_dict = report.run_entry_composition(Template_Dict)\n  File "/opt/IHMValidation/ihm_validation/
report.py", line 85, in run_entry_composition\n    Template_Dict[\'ranked_id_list\'] = self.input.get_ranked_id_list()\n  
File "/opt/IHMValidation/ihm_validation/mmcif_io.py", line 153, in get_ranked_id_list\n    if pdbdev_id is not None:
\nNameError: name \'pdbdev_id\' is not defined. Did you mean: \'pdb_dev_id\'?\n'
aozalevsky commented 1 month ago

@svoinea thanks. that's better. Does this instance (or rather test mode) use the same IHMValidation (/mnt/vdb1/dev_pdbihm/IHMValidation) version? If yes, can you try running it once more?

I pushed an update (https://github.com/salilab/IHMValidation/releases/tag/20240528) yesterday morning. The server picked it up overnight, so it should work now.

svoinea commented 1 month ago

@aozalevsky The dev instance is using origin/dev_2.0. I am not sure if that is the same as 20240528:

$ git log -1
commit 07d9af171014a86f982090a30597ecdc30f99315 (HEAD, tag: 20240528, origin/dev_2.0)
Author: Arthur Zalevsky <aozalevsky@gmail.com>
Date:   Tue May 28 10:02:52 2024 -0700

    eliminate redundant duplicating IDs

The result is the same.

aozalevsky commented 1 month ago

@svoinea got it.

It was a typo in the variable name that only showed up in a specific scenario (database_2 is present, but none of the IDs match the entry_id; I didn't have a test case for this branch). Anyway, I've pushed the fix: https://github.com/salilab/IHMValidation/commit/7d4c10a5fa1fb9dd76aed97f53352445283388e0. I've pulled it on the dev instance and was able to generate the report manually. Can you please test it again?

svoinea commented 1 month ago

@aozalevsky I have tested and it works now. Thanks.

aozalevsky commented 1 month ago

@svoinea @brindakv Brinda asked me to add a note that entry_id should match at least one of the IDs in database_2.

Here is the content of the test file:

data_TEST-9A8L

#
_entry.id  TEST-9A8L
<...>
#
_pdbx_database_status.status_code                     HOLD
_pdbx_database_status.entry_id                        TEST-9A8L
_pdbx_database_status.deposit_site                    ?
_pdbx_database_status.process_site                    RCSB
_pdbx_database_status.recvd_initial_deposition_date   2024-05-25 
#
loop_
_database_2.database_id 
_database_2.database_code 
_database_2.pdbx_database_accession 
_database_2.pdbx_DOI 
PDB 9A8L pdb_00009a8l 10.2210/pdb9a8l/pdb
PDB-Dev PDBDEV_00000385 PDBDEV_00000385 ?
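
The rule in the note above can be sketched as a small check. This is only an illustration: the function name is hypothetical, rows are assumed to be dicts keyed by the _database_2 item names, "the IDs" are assumed to mean both database_code and pdbx_database_accession, and the comparison is assumed to be an exact, case-sensitive match. The actual logic in IHMValidation may differ.

```python
def entry_id_matches_database_2(entry_id, rows):
    """Return True if entry_id equals any code or accession in database_2.

    Assumption: exact string comparison against both ID columns.
    """
    known_ids = set()
    for row in rows:
        known_ids.add(row["database_code"])
        known_ids.add(row["pdbx_database_accession"])
    return entry_id in known_ids


# Rows copied from the test file above.
test_rows = [
    {"database_id": "PDB", "database_code": "9A8L",
     "pdbx_database_accession": "pdb_00009a8l", "pdbx_DOI": "10.2210/pdb9a8l/pdb"},
    {"database_id": "PDB-Dev", "database_code": "PDBDEV_00000385",
     "pdbx_database_accession": "PDBDEV_00000385", "pdbx_DOI": "?"},
]
```

Under these assumptions, `entry_id_matches_database_2("TEST-9A8L", test_rows)` is false, because the entry ID carries the TEST- prefix while the database_2 codes do not. That is the "none of the IDs match" scenario that triggered the NameError.
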
brindakv commented 4 weeks ago

@svoinea In pdb mode, the value of _database_2.database_code for the row where _database_2.database_id = PDB-Dev is incorrect in the system-generated mmCIF file.

This is what it is currently:

loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB TEST-XXXX test-pdb_0000xxxx 10.2210/pdbtest-xxxx/pdb
PDB-Dev TEST-XXXX TEST-PDBDEV_00000NNN ?

This is incorrect. It should be:

loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB TEST-XXXX test-pdb_0000xxxx 10.2210/pdbtest-xxxx/pdb
PDB-Dev TEST-PDBDEV_00000NNN TEST-PDBDEV_00000NNN ?
brindakv commented 4 weeks ago

@svoinea Can _pdbx_database_status.deposit_site also be set to RCSB instead of "?"?

svoinea commented 4 weeks ago

I have done the required updates and redeployed the backend.

brindakv commented 3 weeks ago

@svoinea The generated mmCIF file is still incorrect.

_pdbx_database_status.status_code                     HOLD
_pdbx_database_status.entry_id                        TEST-9A8N
_pdbx_database_status.deposit_site                    ?
_pdbx_database_status.process_site                    RCSB
_pdbx_database_status.recvd_initial_deposition_date   2024-06-05
#
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB TEST-9A8N test-pdb_00009a8n 10.2210/pdbtest-9a8n/pdb
PDB-Dev TEST-PDBDEV_00000387 TEST-PDBDEV_00000387 RCSB

_pdbx_database_status.deposit_site should be set to RCSB and _database_2.pdbx_DOI should be ? when _database_2.database_id == PDB-Dev.

The correct data should be:

_pdbx_database_status.status_code                     HOLD
_pdbx_database_status.entry_id                        TEST-9A8N
_pdbx_database_status.deposit_site                    RCSB
_pdbx_database_status.process_site                    RCSB
_pdbx_database_status.recvd_initial_deposition_date   2024-06-05
#
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB TEST-9A8N test-pdb_00009a8n 10.2210/pdbtest-9a8n/pdb
PDB-Dev TEST-PDBDEV_00000387 TEST-PDBDEV_00000387 ?
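
The two corrections requested above can be expressed as a quick sanity check on the generated values. This is a hedged sketch, not part of the backend: the function name and the dict-based inputs are hypothetical, and only the two rules stated in this comment are checked.

```python
def check_generated_mmcif(status, rows):
    """Return a list of problems; an empty list means both rules hold."""
    problems = []
    # Rule 1: deposit_site should be RCSB rather than "?".
    if status.get("deposit_site") != "RCSB":
        problems.append("_pdbx_database_status.deposit_site should be RCSB")
    # Rule 2: PDB-Dev rows carry no DOI, so pdbx_DOI should be "?".
    for row in rows:
        if row["database_id"] == "PDB-Dev" and row["pdbx_DOI"] != "?":
            problems.append("_database_2.pdbx_DOI should be ? for PDB-Dev")
    return problems


# Values taken from the incorrect file quoted above.
bad_status = {"deposit_site": "?", "process_site": "RCSB"}
bad_rows = [
    {"database_id": "PDB", "database_code": "TEST-9A8N",
     "pdbx_database_accession": "test-pdb_00009a8n",
     "pdbx_DOI": "10.2210/pdbtest-9a8n/pdb"},
    {"database_id": "PDB-Dev", "database_code": "TEST-PDBDEV_00000387",
     "pdbx_database_accession": "TEST-PDBDEV_00000387", "pdbx_DOI": "RCSB"},
]
```

Calling `check_generated_mmcif(bad_status, bad_rows)` reports both problems. With deposit_site set to RCSB and the PDB-Dev DOI set to ?, the list comes back empty.
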
svoinea commented 3 weeks ago

I did the updates and redeployed.

brindakv commented 3 weeks ago
hongsudt commented 3 weeks ago

@svoinea I updated the description in the main issue to include the following:

svoinea commented 2 weeks ago

The model for the Accession_Code table was defined as NOT NULL for the PDBDEV_Accession_Code, PDB_Extended_Code, PDB_Code and PDB_Accession_Code columns.

brindakv commented 1 week ago

pdbdev mode works as expected on dev.