informatics-isi-edu / pdb-ihm

Deriva Protein Database Project

Verify mmCIF file creation procedure #107

Open brindakv opened 2 years ago

brindakv commented 2 years ago

Not all tables from the uploaded mmCIF file are populated in the database. Verify whether the additional tables (those that are not in the DB) are retained in the mmCIF file during mmCIF generation (SUBMISSION COMPLETE --> mmCIF CREATED and RELEASE READY --> REL).

brindakv commented 2 years ago

The exported mmCIF file is created as follows:

brindakv commented 2 years ago

Requirement: Add all tables beginning with flr_ from the original mmCIF file. @brindakv to double-check whether updating ihm_ tables through chaise affects mmCIF validation when flr_ tables from the original mmCIF file are copied.

brindakv commented 1 year ago

The ihm_ data items that the flr_ data items point to:

The mapping from flr_ to ihm_ comes from the FLR-dictionary: https://github.com/ihmwg/FLR-dictionary

FLR-dictionary tables mapping to IHM-dictionary:

_flr_probe_descriptor.reactive_probe_chem_descriptor_id --> _ihm_chemical_component_descriptor.id
_flr_probe_descriptor.chromophore_chem_descriptor_id --> _ihm_chemical_component_descriptor.id
_flr_poly_probe_position_modified.chem_descriptor_id --> _ihm_chemical_component_descriptor.id
_flr_poly_probe_conjugate.chem_descriptor_id --> _ihm_chemical_component_descriptor.id
_flr_fret_analysis.dataset_list_id --> _ihm_dataset_list.id
_flr_fret_analysis.external_file_id --> _ihm_external_files.id
_flr_fret_distance_restraint.state_id --> _ihm_multi_state_modeling.state_id
_flr_fret_model_distance.model_id --> _ihm_model_list.model_id
_flr_fret_model_quality.model_id --> _ihm_model_list.model_id
_flr_fret_model_quality.dataset_group_id --> _ihm_dataset_group.id
_flr_kinetic_scheme_state.state_id --> _ihm_multi_state_modeling.state_id
_flr_kinetic_scheme_connectivity.start_state_id --> _ihm_multi_state_modeling.state_id
_flr_kinetic_scheme_connectivity.end_state_id --> _ihm_multi_state_modeling.state_id
_flr_kinetic_rate_scheme.external_file_id --> _ihm_external_files.id
_flr_relaxation_time_kinetic_scheme.external_file_id --> _ihm_external_files.id
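
One way to make the mapping above checkable is to encode it as a lookup table and scan for flr_ values that have no matching ihm_ key. The following is a minimal sketch, not the pipeline's code: the function name `find_dangling_references` and the input shape (category name mapped to a list of row dicts) are assumptions made for illustration.

```python
# Mapping copied from the list above: (flr category, item) -> (ihm category, item).
FLR_TO_IHM = {
    ("flr_probe_descriptor", "reactive_probe_chem_descriptor_id"): ("ihm_chemical_component_descriptor", "id"),
    ("flr_probe_descriptor", "chromophore_chem_descriptor_id"): ("ihm_chemical_component_descriptor", "id"),
    ("flr_poly_probe_position_modified", "chem_descriptor_id"): ("ihm_chemical_component_descriptor", "id"),
    ("flr_poly_probe_conjugate", "chem_descriptor_id"): ("ihm_chemical_component_descriptor", "id"),
    ("flr_fret_analysis", "dataset_list_id"): ("ihm_dataset_list", "id"),
    ("flr_fret_analysis", "external_file_id"): ("ihm_external_files", "id"),
    ("flr_fret_distance_restraint", "state_id"): ("ihm_multi_state_modeling", "state_id"),
    ("flr_fret_model_distance", "model_id"): ("ihm_model_list", "model_id"),
    ("flr_fret_model_quality", "model_id"): ("ihm_model_list", "model_id"),
    ("flr_fret_model_quality", "dataset_group_id"): ("ihm_dataset_group", "id"),
    ("flr_kinetic_scheme_state", "state_id"): ("ihm_multi_state_modeling", "state_id"),
    ("flr_kinetic_scheme_connectivity", "start_state_id"): ("ihm_multi_state_modeling", "state_id"),
    ("flr_kinetic_scheme_connectivity", "end_state_id"): ("ihm_multi_state_modeling", "state_id"),
    ("flr_kinetic_rate_scheme", "external_file_id"): ("ihm_external_files", "id"),
    ("flr_relaxation_time_kinetic_scheme", "external_file_id"): ("ihm_external_files", "id"),
}


def find_dangling_references(flr_data, ihm_data, mapping=FLR_TO_IHM):
    """Return (flr_category, flr_item, value) triples whose ihm_ target key is missing.

    Hypothetical helper. Both inputs are assumed to be dicts of the form
    {category_name: [row_dict, ...]} with string values, as a CIF parser might produce.
    """
    dangling = []
    for (flr_cat, flr_item), (ihm_cat, ihm_item) in mapping.items():
        # Collect the key values currently present in the referenced ihm_ table.
        valid = {str(row[ihm_item]) for row in ihm_data.get(ihm_cat, []) if ihm_item in row}
        for row in flr_data.get(flr_cat, []):
            value = row.get(flr_item)
            if value in (None, ".", "?"):  # mmCIF markers for inapplicable / unknown
                continue
            if str(value) not in valid:
                dangling.append((flr_cat, flr_item, str(value)))
    return dangling
```

A non-empty result would indicate that some flr_ row refers to an ihm_ key that is not present in the data being compared.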

brindakv commented 1 year ago

In the current pipeline, we combine data from ermrest and from the uploaded mmCIF file (atomic coordinates) and handle the same issue (of data from mmCIF file pointing to tables in ermrest) by restricting the user from editing specific tables (e.g. atom_type, ihm_model_list etc.).

We are currently not supporting flr_ tables in ermrest because there are too many of these.

Therefore, we need a way to take the flr_ data from the original mmCIF for final mmCIF creation and bypass loading this data to ermrest. The open issue is what happens if flr_ tables point to tables that are currently editable in ermrest and the user inadvertently updates those tables using chaise.
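
As a rough illustration of the idea, the sketch below copies every flr_ category from the parsed original file into the data exported from ermrest, and reports key values that existed originally but are gone after editing. It is not the actual export code: `merge_flr_categories`, `changed_keys`, and the dict-of-rows input shape are hypothetical names and assumptions.

```python
def merge_flr_categories(original, exported):
    """Return a copy of `exported` with every flr_ category from `original` added.

    Hypothetical helper. Both arguments are assumed to be dicts of the form
    {category_name: [row_dict, ...]}; `original` is the uploaded mmCIF,
    `exported` is the data pulled from ermrest.
    """
    merged = dict(exported)
    for name, rows in original.items():
        if name.startswith("flr_"):
            merged[name] = rows  # flr_ data bypasses ermrest entirely
    return merged


def changed_keys(original_rows, current_rows, key="id"):
    """Return key values present in the original rows but missing after editing."""
    current = {str(row[key]) for row in current_rows}
    return sorted(str(row[key]) for row in original_rows if str(row[key]) not in current)
```

For example, calling `changed_keys` on the original and current rows of `ihm_external_files` would list any ids that a user removed through chaise and that copied flr_ rows might still reference.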

svoinea commented 1 year ago

@brindakv The above _ihm_*id enumerated items, are they the names of the columns in the flr_ tables?

Another question: I did a test on catalog 99 for the entry with RID=1-X56T with Chemical Descriptors. The Chemical Descriptors has 3 records with id=1, id=2 and id=3. If the _ihm_chemical_component_descriptor.id from the flr_* table has none of those three values, what happens?

brindakv commented 1 year ago

@svoinea The above _ihm_*id data items are from the IHM-dictionary (which are defined in ermrest) and tables / columns in the FLR-dictionary (not defined in ermrest but present in uploaded mmCIF file) point to these data items (keys) in the IHM-dictionary.

flr_ tables are used only when fluorescence / FRET data is used for integrative modeling. There are tables in the FLR-dictionary that refer to Chemical Descriptors in the IHM-dictionary. Chemical Descriptors can be used by other kinds of experiments as well (e.g., chemical crosslinking). RID=1-X56T on dev uses Chemical Descriptors for chemical crosslinks and not for fluorescence / FRET. So flr_ tables are absent.

svoinea commented 1 year ago

@brindakv Can you provide a sample with some flr_ data from the original mmCIF?

brindakv commented 1 year ago

@svoinea https://pdb-dev.wwpdb.org/cif/PDBDEV_00000019.cif https://pdb-dev.wwpdb.org/cif/PDBDEV_00000044.cif https://pdb-dev.wwpdb.org/cif/PDBDEV_00000088.cif

svoinea commented 1 year ago

@brindakv Looking at https://pdb-dev.wwpdb.org/cif/PDBDEV_00000019.cif file, I saw at line 45,416 the following lines:

loop_
_flr_experiment.ordinal_id
_flr_experiment.id
_flr_experiment.instrument_id
_flr_experiment.inst_setting_id
_flr_experiment.exp_condition_id
_flr_experiment.sample_id
_flr_experiment.details
1 1 1 1 1 1 .
2 1 1 1 1 2 .
...

With what ermrest table does it get associated? In ermrest, there is no table named *experiment. Should those lines simply be ignored?

At this time, it is unclear to me how you associate the flr_ data with the 6 ermrest tables you mentioned above.
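
For inspecting a block like the one quoted above, a naive reader for a single whitespace-delimited loop_ block could look like the sketch below. It is illustration only: `parse_simple_loop` is a hypothetical name, and the function ignores quoted strings, multi-line values, and everything else a real mmCIF parser handles.

```python
def parse_simple_loop(text):
    """Parse one whitespace-delimited mmCIF loop_ block into (category, rows).

    Hypothetical, deliberately naive helper: no support for quoted or
    multi-line values. Lines whose value count does not match the header
    (such as a literal "...") are skipped.
    """
    items, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_" or line.startswith("#"):
            continue
        if line.startswith("_"):
            items.append(line)  # header line, e.g. _flr_experiment.id
            continue
        values = line.split()
        if len(values) != len(items):
            continue  # not a complete data row for this header
        names = [item.split(".", 1)[1] for item in items]
        rows.append(dict(zip(names, values)))
    category = items[0].split(".", 1)[0].lstrip("_") if items else None
    return category, rows
```

Applied to the snippet above, this returns the category name `flr_experiment` and one dict per data row, which makes it easy to see which items a category carries.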

brindakv commented 1 year ago

Requirements:

Update on 11-2-2022: The JSON file created doesn't contain the FLR tables. @brindakv has to update the yml file.