informatics-isi-edu / pdb-ihm

Deriva Protein Database Project

Review how mmCIF schema updates get propagated to the system-generated mmCIF file #182

Open brindakv opened 8 months ago

brindakv commented 8 months ago

This JSON file is used to identify the tables populated in the system-generated mmCIF file.

However, ihm_entry_collection is populated using the backend: https://github.com/informatics-isi-edu/protein-database/blob/master/scripts/pdb_processing/worker/pdb/clientlib/pdb_workflow_processing_lib/client.py#L2141

The process needs to be reviewed so that the configuration can be controlled in one place.

Update documentation accordingly: https://github.com/informatics-isi-edu/protein-database/wiki/Creating-mmCIF-files.

brindakv commented 7 months ago

@svoinea to identify all tables that are populated in the system-generated mmCIF file but are not in the JSON config file.

Also, identify tables that are not in ermrest but are taken as is from the initial mmCIF (e.g., atom_site): https://github.com/informatics-isi-edu/protein-database/blob/master/scripts/pdb_processing/config/mmcif_tables_input2output.json
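One way to approach the first task is to diff the categories found in a generated file against the configured table names. Below is a minimal sketch, assuming the configured names are already available as a set (the layout of the JSON config is not shown in this thread, so loading it is left out). `categories_in_mmcif` and `unconfigured_categories` are hypothetical helpers for illustration, not functions from the repository, and the parsing is deliberately simple: it only looks for `_category.attribute` data names and skips semicolon-delimited text values.

```python
def categories_in_mmcif(text):
    """Return the set of category names that appear in mmCIF text."""
    categories = set()
    in_text_field = False
    for line in text.splitlines():
        # A ';' in column one opens or closes a multi-line text value.
        if line.startswith(";"):
            in_text_field = not in_text_field
            continue
        if in_text_field:
            continue
        stripped = line.strip()
        if not stripped.startswith("_"):
            continue
        # Data names have the form _category.attribute
        name = stripped.split()[0]
        if "." in name:
            categories.add(name[1:].split(".", 1)[0])
    return categories


def unconfigured_categories(mmcif_text, configured_tables):
    """Categories present in the file but missing from the config (hypothetical helper)."""
    return sorted(categories_in_mmcif(mmcif_text) - set(configured_tables))


sample = """data_example
_entry.id example
loop_
_atom_site.id
_atom_site.type_symbol
1 C
2 N
_struct.title
;A title with
_not.a_category inside text
;
"""

print(unconfigured_categories(sample, {"entry", "struct"}))  # prints ['atom_site']
```

Running this against a real system-generated file with the table names from the config would list candidates such as atom_site that reach the output through some other path.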

svoinea commented 7 months ago
  1. DEPO
    • 1.1 make-mmcif.py(input.cif) --> output.cif
    • 1.2 testSchemaDataPrepValidate-ihm.py(output.cif) --> *.json
    • 1.3 loadTablesFromJSON
  2. SUBMIT
    • 2.1 make-mmcif.py(input.cif) --> output.cif
    • 2.2 write into the <structure_id>.cif file:
      • 2.2.1 write the <structure_id> value
      • 2.2.2 write the data for the tables present in the ermrest_table_defs.json file which is generated from the json-full-db-ihm_dev_full-col-ihm_dev_full.json file
      • 2.2.3 do NOT write data for the chem_comp_atom table
      • 2.2.4 for the table audit_conform write the Supported_Dictionary
      • 2.2.5 from the output.cif file, write the lines of the tables present in the mmcif_tables_input2output.json file
      • 2.2.6 validateExportmmCIF
        • 2.2.6.1 delete existing Entry_Generated_File
        • 2.2.6.2 run CifCheck on the <structure_id>.cif file
          • 2.2.6.2.1 store in hatrac the Entry_Generated_File or the Entry_Error_File
      • 2.2.7 Add in ermrest the Conform_Dictionary entries
  3. SUBMISSION COMPLETE
    • 3.1 Get Accession Code
    • 3.2 addReleaseRecords for HOLD:
      • 3.2.1
      • 3.2.2
    • 3.3 generate the <Accession_Code>.cif file from the <structure_id>.cif file:
      • 3.3.1 write the Accession_Code value
      • 3.3.2 add pdbx_database_status and pdbx_audit_revision blocks in the system generated mmCIF file
      • 3.3.3 addCollectionRecords from the ihm_entry_collection and ihm_entry_collection_mapping tables
      • 3.3.4 run singularity for the report validation
      • 3.3.5 generate the JSON_mmCIF_content file
  4. RELEASE READY
    • 4.1 addReleaseRecords for REL:
      • 4.1.1
      • 4.1.2
      • 4.1.3
      • 4.1.4 Update pdbx_database_status and pdbx_audit_revision blocks in the system generated mmCIF file
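
The table selection in the SUBMIT step (2.2.2, 2.2.3 and 2.2.5) can be read as simple set logic over the two config files. The sketch below is only an illustration of that reading, not the code in client.py: `plan_export` is a hypothetical name, the inputs are plain lists of table names rather than the real JSON structures, and applying the chem_comp_atom exclusion only to the ermrest side is an assumption.

```python
def plan_export(ermrest_tables, passthrough_tables,
                excluded=frozenset({"chem_comp_atom"})):
    """Split the tables written to <structure_id>.cif by where their data comes from.

    ermrest_tables     -- names as listed in ermrest_table_defs.json (assumed flat list)
    passthrough_tables -- names as listed in mmcif_tables_input2output.json (assumed flat list)
    excluded           -- tables never written from ermrest (step 2.2.3)
    """
    from_ermrest = set(ermrest_tables) - set(excluded)
    from_input = set(passthrough_tables)
    return {
        "from_ermrest": sorted(from_ermrest),
        "from_input": sorted(from_input),
        # Tables configured in both places are worth a manual look.
        "in_both": sorted(from_ermrest & from_input),
    }


plan = plan_export(
    ["entry", "struct", "chem_comp_atom"],
    ["atom_site", "struct"],
)
print(plan["from_ermrest"])  # prints ['entry', 'struct']
print(plan["from_input"])    # prints ['atom_site', 'struct']
print(plan["in_both"])       # prints ['struct']
```

Tables such as ihm_entry_collection, which are added by backend code in step 3.3.3, would not show up in either list, which is the gap this issue is about.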