ihmwg / IHMCIF

📖 mmCIF support for hybrid/integrative models
https://pdb-dev.wwpdb.org
Creative Commons Zero v1.0 Universal

List all linked out files in a single new table. #2

Closed: tomgoddard closed this issue 7 years ago

tomgoddard commented 7 years ago

In order to use IHM format as a working format while a modeling project is in progress, the externally linked files (comparative models, sequence alignments, EM maps, ensembles of result structures, localization maps, ...) should be listed in a table that refers to local files (on the local disk) instead of referencing a DOI zip archive.

One design would give an integer id to every external file, external database reference, external DOI reference, and all other tables would use this id. Any external data could be referenced in any of these 3 ways (local file, database, DOI archive).

Having external references in one table will simplify validating that deposited structures do not reference missing data.
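To make the proposal concrete, here is a minimal sketch of such a table in Python. It is only an illustration of the idea, not anything defined in the IHM dictionary: the `ExternalRef` class, its field names, the three `kind` strings, and the example paths and identifiers are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Hypothetical names throughout; none of these are IHM dictionary items.
KINDS = {"local_file", "database", "doi"}


@dataclass(frozen=True)
class ExternalRef:
    id: int                                 # integer id used by all other tables
    kind: str                               # one of KINDS
    location: str                           # local path, database accession, or DOI
    path_in_archive: Optional[str] = None   # only meaningful for a DOI archive


def build_table(refs: Iterable[ExternalRef]) -> Dict[int, ExternalRef]:
    """Collect every external reference into one table keyed by id."""
    table: Dict[int, ExternalRef] = {}
    for ref in refs:
        if ref.kind not in KINDS:
            raise ValueError(f"unknown reference kind: {ref.kind!r}")
        if ref.id in table:
            raise ValueError(f"duplicate external reference id: {ref.id}")
        table[ref.id] = ref
    return table


def missing_references(table: Dict[int, ExternalRef],
                       used_ids: Iterable[int]) -> List[int]:
    """Return the ids used by other tables that are absent from the table."""
    return sorted({i for i in used_ids if i not in table})


# Placeholder entries, one per reference kind.
table = build_table([
    ExternalRef(1, "local_file", "alignments/subunit_a.ali"),
    ExternalRef(2, "database", "EXAMPLE-DB:0000"),
    ExternalRef(3, "doi", "10.0000/example", "maps/density.mrc"),
])

# Ids 1 and 3 resolve; id 4 is referenced somewhere but never declared.
print(missing_references(table, [1, 3, 4]))
```

With every reference in one place, the validation step reduces to a single lookup per id, regardless of whether the data lives on the local disk, in a database, or in a DOI archive.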

brindakv commented 7 years ago

Currently, all input data is listed in IHM_DATASET_LIST. Data present in other databases are described in IHM_DATASET_RELATED_DB_REFERENCE and those present in other resources are referenced via DOI in IHM_EXTERNAL_REFERENCE_INFO and IHM_DATASET_EXTERNAL_REFERENCE_DETAILS categories.

Localization density is an output from the I/H modeling and relates to the molecular system and the ensemble of good scoring models obtained from the modeling. Localization densities can be included as Gaussian mixture models in IHM_GAUSSIAN_OBJ_ENSEMBLE or can be linked out via IHM_EXTERNAL_REFERENCE_INFO and IHM_LOCALIZATION_DENSITY_FILES categories.

Sequence alignments corresponding to the starting structural models can be included via IHM_EXTERNAL_REFERENCE_INFO and IHM_STARTING_MODEL_ALIGNMENT_FILES categories. We will also define categories to capture the alignments internally. This requires coordination with the PMP/Model Archive developers so that we are consistent with their definitions as well.

Modeling scripts, associated documentation and other workflow-related files can be referred to via IHM_EXTERNAL_REFERENCE_INFO and IHM_MODELING_WORKFLOW_FILES categories.

Wherever possible, we provide ways to incorporate data directly into the I/H model file and also provide the option to link out to external files in standard formats (e.g., starting model coordinates, localization densities, alignments). The details of the linked-out files are split into the different categories mentioned above because the data stored in them differ in nature. If these files and the data stored within are to be useful, they are best described in separate tables. The IHM_EXTERNAL_REFERENCE_INFO table provides a way to collect the non-database external references in one place.
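As a rough summary of the layout described in this comment, the pairing of linked-out data with categories can be written as a small Python lookup. The category names are the ones given above; the dictionary keys and the variable names are just labels chosen for this sketch.

```python
# Which categories describe each kind of DOI-referenced data, per the comment above.
# The keys are informal labels for this sketch, not dictionary terms.
LINK_OUT_CATEGORIES = {
    "input dataset": {
        "IHM_EXTERNAL_REFERENCE_INFO",
        "IHM_DATASET_EXTERNAL_REFERENCE_DETAILS",
    },
    "localization density": {
        "IHM_EXTERNAL_REFERENCE_INFO",
        "IHM_LOCALIZATION_DENSITY_FILES",
    },
    "starting model alignment": {
        "IHM_EXTERNAL_REFERENCE_INFO",
        "IHM_STARTING_MODEL_ALIGNMENT_FILES",
    },
    "modeling workflow": {
        "IHM_EXTERNAL_REFERENCE_INFO",
        "IHM_MODELING_WORKFLOW_FILES",
    },
}

# The category shared by every kind is the single collection point for
# non-database external references.
shared = set.intersection(*LINK_OUT_CATEGORIES.values())
print(shared)
```

The intersection contains only IHM_EXTERNAL_REFERENCE_INFO, which matches the statement that this table gathers the non-database references while the per-type categories carry the type-specific details.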

tomgoddard commented 7 years ago

How does an IHM file currently reference a local file, for example a sequence alignment file or a localization density file? And how does it reference a sequence alignment file by URL?

brindakv commented 7 years ago

Support for referencing local files has been added.

brindakv commented 7 years ago

A new category (IHM_EXTERNAL_FILES) has been added to reference all external files.