Closed hongsudt closed 3 hours ago
Make entry.Workflow_Status visible on the Accession_Code table.
Based on the discussion on April-19-2024:
Add a database_2 table to PDB with the following attributes (data will be populated manually during curation by looking up the information in Accession_Code)
database_2 (config file?)
In the database_2 table in the system-generated mmCIF file, the lowercase form of PDB_Code should be used in the DOI string.
@brindakv 'System' object has no attribute 'databases' is an indication of a python-ihm version without database_2 support. I was able to reproduce it locally. @svoinea can you please check that python-ihm on the host is 1.1 and that it's mounted as we discussed here: https://github.com/informatics-isi-edu/protein-database/issues/163#issuecomment-2008132719
@aozalevsky @brindakv Installed version is 1.1:
root@docker-pdbdev-validation-o3558551:/etc/cron.daily# pip3 show ihm
Name: ihm
Version: 1.1
Summary: Package for handling IHM mmCIF and BinaryCIF files
Home-page: https://github.com/ihmwg/python-ihm
Author: Ben Webb
Author-email: ben@salilab.org
License: UNKNOWN
Location: /usr/local/lib/python3.8/dist-packages
Requires: msgpack
Required-by:
Rerun for Accession_Code = TEST-9A8L.
Got the error:
ERROR IN REPORT VALIDATION.
stdoutdata: b''
stderrdata: b'INFO:root:Current operational mode is: PRODUCTION\nINFO:root:Clean up and create output
directories\nINFO:root:Directory /ihmv/output/TEST-9A8L created \nINFO:root:Directory /ihmv/output/TEST-9A8L/
TEST-9A8L created \nINFO:root:Directory /ihmv/output/TEST-9A8L/TEST-9A8L/htmls created \nINFO:root:Directory /ihmv/
output/TEST-9A8L/TEST-9A8L/images created \nINFO:root:Directory /ihmv/output/TEST-9A8L/TEST-9A8L/csv created
\nINFO:root:Directory /ihmv/output/TEST-9A8L/TEST-9A8L/pdf created
\nWARNING:selenium.webdriver.common.selenium_manager:The geckodriver version (0.33.0) detected in PATH at /opt/
conda/bin/geckodriver might not be compatible with the detected firefox version (121.0); currently, geckodriver 0.34.0 is
recommended for firefox 121.*, so it is advised to delete the driver in PATH and retry\nINFO:root:Entry
composition\nTraceback (most recent call last):\n File "/opt/IHMValidation/ihm_validation/ihm_validator.py", line 282, in
<module>\n template_dict = report.run_entry_composition(Template_Dict)\n File "/opt/IHMValidation/ihm_validation/
report.py", line 85, in run_entry_composition\n Template_Dict[\'ranked_id_list\'] = self.input.get_ranked_id_list()\n
File "/opt/IHMValidation/ihm_validation/mmcif_io.py", line 153, in get_ranked_id_list\n if pdbdev_id is not None:
\nNameError: name \'pdbdev_id\' is not defined. Did you mean: \'pdb_dev_id\'?\n'
@svoinea thanks, that's better. Does this instance (or rather test mode) use the same IHMValidation (/mnt/vdb1/dev_pdbihm/IHMValidation) version? If yes, can you try running it once more?
I pushed an update (https://github.com/salilab/IHMValidation/releases/tag/20240528) yesterday morning. The server picked it up overnight, so it should work now.
@aozalevsky The dev instance is using origin/dev_2.0. I am not sure if that is the same as 20240528:
$ git log -1
commit 07d9af171014a86f982090a30597ecdc30f99315 (HEAD, tag: 20240528, origin/dev_2.0)
Author: Arthur Zalevsky <aozalevsky@gmail.com>
Date: Tue May 28 10:02:52 2024 -0700
eliminate redundant duplicating IDs
The result is the same.
@svoinea got it.
It was a typo in the var name in a specific scenario (database_2 is present, but none of the IDs match the entry_id; I didn't have a test case for this if). Anyway, I've pushed the fix https://github.com/salilab/IHMValidation/commit/7d4c10a5fa1fb9dd76aed97f53352445283388e0. I've pulled it on dev instance and was able to generate the report manually. Can you please test it again?
@aozalevsky I have tested and it works now. Thanks.
@svoinea @brindakv Brinda asked me to add a note that entry_id should match at least one of the IDs in database_2.
Here is the content of the test file:
data_TEST-9A8L
#
_entry.id TEST-9A8L
<...>
#
_pdbx_database_status.status_code HOLD
_pdbx_database_status.entry_id TEST-9A8L
_pdbx_database_status.deposit_site ?
_pdbx_database_status.process_site RCSB
_pdbx_database_status.recvd_initial_deposition_date 2024-05-25
#
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB 9A8L pdb_00009a8l 10.2210/pdb9a8l/pdb
PDB-Dev PDBDEV_00000385 PDBDEV_00000385 ?
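For reference, the note about entry_id can be sketched as a small check. This is only an illustration and not the IHMValidation code: the function name, the row dictionaries, and the choice to compare entry_id against both database_code and pdbx_database_accession (exact, case-sensitive match) are assumptions.

```python
def entry_id_matches_database_2(entry_id, database_2_rows):
    """Return True if entry_id equals an ID listed in any database_2 row.

    Assumption: an "ID" is either _database_2.database_code or
    _database_2.pdbx_database_accession, compared as exact strings.
    """
    known_ids = set()
    for row in database_2_rows:
        known_ids.add(row["database_code"])
        known_ids.add(row["pdbx_database_accession"])
    return entry_id in known_ids


# Rows copied from the test file above.
TEST_ROWS = [
    {"database_id": "PDB", "database_code": "9A8L",
     "pdbx_database_accession": "pdb_00009a8l"},
    {"database_id": "PDB-Dev", "database_code": "PDBDEV_00000385",
     "pdbx_database_accession": "PDBDEV_00000385"},
]
```

With these rows, entry_id_matches_database_2("TEST-9A8L", TEST_ROWS) is False, which is the "database_2 is present, but none of the IDs match the entry_id" scenario, while "9A8L" or "PDBDEV_00000385" would match.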
@svoinea In pdb mode, the value of _database_2.database_code when _database_2.database_id = PDB-Dev is incorrect in the system-generated mmCIF file.
This is what it is currently:
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB TEST-XXXX test-pdb_0000xxxx 10.2210/pdbtest-xxxx/pdb
PDB-Dev TEST-XXXX TEST-PDBDEV_00000NNN ?
This is incorrect. It should be:
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB TEST-XXXX test-pdb_0000xxxx 10.2210/pdbtest-xxxx/pdb
PDB-Dev TEST-PDBDEV_00000NNN TEST-PDBDEV_00000NNN ?
@svoinea can _pdbx_database_status.deposit_site also be set to RCSB instead of "?"?
I have done the required updates and redeployed the backend.
@svoinea The generated mmCIF file is still incorrect.
_pdbx_database_status.status_code HOLD
_pdbx_database_status.entry_id TEST-9A8N
_pdbx_database_status.deposit_site ?
_pdbx_database_status.process_site RCSB
_pdbx_database_status.recvd_initial_deposition_date 2024-06-05
#
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB TEST-9A8N test-pdb_00009a8n 10.2210/pdbtest-9a8n/pdb
PDB-Dev TEST-PDBDEV_00000387 TEST-PDBDEV_00000387 RCSB
_pdbx_database_status.deposit_site should be set to RCSB, and _database_2.pdbx_DOI should be ? when _database_2.database_id == PDB-Dev.
The correct data should be:
_pdbx_database_status.status_code HOLD
_pdbx_database_status.entry_id TEST-9A8N
_pdbx_database_status.deposit_site RCSB
_pdbx_database_status.process_site RCSB
_pdbx_database_status.recvd_initial_deposition_date 2024-06-05
#
loop_
_database_2.database_id
_database_2.database_code
_database_2.pdbx_database_accession
_database_2.pdbx_DOI
PDB TEST-9A8N test-pdb_00009a8n 10.2210/pdbtest-9a8n/pdb
PDB-Dev TEST-PDBDEV_00000387 TEST-PDBDEV_00000387 ?
I did the updates and redeployed.
@svoinea I updated the description in the main issue to include the following:
get_database_2_string(mode) function, check the corresponding missing_accession_code exception. Please note that I changed the logic for the PDB mode to use <pdb accession code> instead of <pdb code>. This way, it will follow whatever the PDB policy for accession_code is.
missing_accession_code exception. For the primary mode, throw an error. For the alternate mode, ignore.
get_database_2_string function. The model for the Accession_Code table was defined as NOT NULL for the PDBDEV_Accession_Code, PDB_Extended_Code, PDB_Code and PDB_Accession_Code columns.
pdbdev mode works as expected on dev.
Support of multiple Accession_Code in deriva
PDB:Accession_Code
Model change
Entry to nullok=True
PDBDEV_Accession_Code (type = text, nullok=False, comment="Accession code issued by PDB-Dev, if available") and copy the current Accession_Code column over
PDB_Accession_Code (type = text, nullok=False, comment="Accession code issued by PDB, if available"). This column dictates what the PDB Accession Code is supposed to be. For now, based on PDB policy, it is the PDB_Code, but it will be PDB_Extended_Code in the future.
PDB_Code (type = text, nullok=False, comment="...4 digit code.")
PDB_Extended_Code (type = text, nullok=False, comment="Extended the accession code issued by PDB, if available"). This will get filled in later by an ingest code.
Notes (type = text, nullok=True, comment="Additional details about the accession codes"). This is for curator notes.
Ingest script
Triggers:
old.Entry has to be NULL. If not, throw an error (RAISE stmt). Something is wrong since the same number is reassigned.
Annotations
PDB_Accession_Code
Backend Processing
Entry = Null and smallest RID, e.g. select * from PDB.Accession_Code where Entry is null order by RID
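As a rough illustration of the two rules above (pick the unassigned row with the smallest RID, and refuse to reassign a row whose Entry is already set), here is a minimal in-memory sketch. It is not the backend or trigger code: the dictionaries stand in for PDB.Accession_Code rows, the function and exception names are hypothetical, and RIDs are compared as plain strings on the assumption that this mirrors "order by RID".

```python
class AccessionCodeAlreadyAssigned(Exception):
    """Hypothetical error, standing in for the trigger's RAISE statement."""


def next_available_row(rows):
    """Return the row with Entry unset and the smallest RID, or None."""
    free = [row for row in rows if row["Entry"] is None]
    if not free:
        return None
    # Assumption: plain string comparison approximates "order by RID".
    return min(free, key=lambda row: row["RID"])


def assign_entry(row, entry_rid):
    """Attach an entry to an accession-code row; the old Entry must be unset."""
    if row["Entry"] is not None:
        raise AccessionCodeAlreadyAssigned(
            f"row {row['RID']} is already assigned to {row['Entry']}"
        )
    row["Entry"] = entry_rid
    return row
```

A caller would first take next_available_row(rows) and then call assign_entry on it; a second assignment to the same row raises instead of silently reusing the number.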
ACL
PDB:database_2 should be visible to depositor but not editable (at all stages of the workflow). It should be editable by curator.
Vocab:database_2_database_id should have the same ACL as all other vocab tables
Support of PDB and PDB-Dev accession codes
Pipeline changes
Update mmCIF generation process after accession codes are issued (Submission_Complete --> HOLD and Release_Ready --> REL)
pdbx_database_status, pdbx_audit_revision_history, and pdbx_audit_revision_details are added to the system generated mmCIF file, add a fourth table, database_2.
_database_2.database_id = PDB-Dev
_database_2.database_code and _database_2.pdbx_database_accession = entry.Accession_Code
_database_2.pdbx_DOI = ?
database_2 table during Submission_Complete --> HOLD
database_2 table as is during Release_Ready --> REL (it does not need to be updated in this step; it needs to be populated with the same information as the previous step)
Note: when we have access to the changes made to the Accession_Code table, apply these rules when generating the _database_2 rows above:
get_primary_accession_code(mode=PDB|PDBDEV).
(get_database_2_string(mode)) that generates a database_2 string based on two modes (PDB and PDBDEV)
PDB: "PDB <PDB_Accession_Code> <PDB_Extended_Code> 10.2210/pdb<PDB_Code>/pdb" <- lower
PDBDEV: "PDB-Dev <PDBDEV_Accession_Code> <PDBDEV_Accession_Code> ?"
missing_accession_code exception
primary_accession_code_mode (default PDBDEV) and alternative_accession_code_mode (default PDB):
get_database_2_string with the primary_accession_code_mode, then alternative_accession_code_mode.
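The rules above could be sketched in Python roughly as follows. This is only an illustration, not the actual backend code. The row dictionary (keyed by the Accession_Code column names), the extra row argument, the MissingAccessionCode class and the get_database_2_rows helper are hypothetical. The handling of a missing code (error for the primary mode, ignore for the alternative mode) follows the discussion in this issue, and only the PDB_Code inside the DOI is lowercased.

```python
class MissingAccessionCode(Exception):
    """Hypothetical stand-in for the missing_accession_code exception."""


def get_database_2_string(row, mode):
    """Build one database_2 line for the given mode ("PDB" or "PDBDEV")."""
    if mode == "PDB":
        needed = ("PDB_Accession_Code", "PDB_Extended_Code", "PDB_Code")
        missing = [name for name in needed if not row.get(name)]
        if missing:
            raise MissingAccessionCode(", ".join(missing))
        # Only the PDB_Code embedded in the DOI is lowercased.
        doi = "10.2210/pdb" + row["PDB_Code"].lower() + "/pdb"
        return "PDB {} {} {}".format(
            row["PDB_Accession_Code"], row["PDB_Extended_Code"], doi
        )
    if mode == "PDBDEV":
        code = row.get("PDBDEV_Accession_Code")
        if not code:
            raise MissingAccessionCode("PDBDEV_Accession_Code")
        # PDB-Dev rows carry no DOI, so the last field is "?".
        return "PDB-Dev {} {} ?".format(code, code)
    raise ValueError("unknown mode: {!r}".format(mode))


def get_database_2_rows(row, primary_accession_code_mode="PDBDEV",
                        alternative_accession_code_mode="PDB"):
    """Primary mode first (errors propagate), then alternative (errors ignored)."""
    lines = [get_database_2_string(row, primary_accession_code_mode)]
    try:
        lines.append(get_database_2_string(row, alternative_accession_code_mode))
    except MissingAccessionCode:
        pass  # a missing alternative accession code is not an error
    return lines
```

For a row with PDB_Accession_Code and PDB_Code set to 9A8L, PDB_Extended_Code set to pdb_00009a8l and PDBDEV_Accession_Code set to PDBDEV_00000385, the PDB and PDBDEV strings match the two database_2 rows of the test file posted earlier in this thread.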