ihmwg / IHMCIF

📖 mmCIF support for hybrid/integrative models
https://pdb-dev.wwpdb.org
Creative Commons Zero v1.0 Universal

Add support for ensemble subsamples #80

Closed: benmwebb closed this issue 4 years ago

benmwebb commented 4 years ago

In IMP we often construct a model ensemble by combining data from multiple independent runs. Comparing the runs lets us determine whether sampling was sufficient: if sampling is not complete, each run will sample a different part of the conformational space. Unfortunately, the existing ihm_ensemble_info category allows only a single file (ensemble_file_id) to be referenced. We could combine our independent runs and deposit a single DCD file, but that would lose data. We would rather deposit a separate DCD file for each run, so that the model can be validated by rerunning the sampling convergence test.
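To make the idea of comparing runs concrete, here is a minimal sketch of one simple comparison: a two-sample Kolmogorov-Smirnov statistic on per-model scores from two runs. This is only an illustration, not necessarily the convergence test IMP applies; the function name and the score values are made up for the example.

```python
from bisect import bisect_right


def ks_statistic(scores_a, scores_b):
    """Largest gap between the empirical CDFs of two score samples.

    Values near 0 suggest the two runs sampled similar score
    distributions; values near 1 suggest they sampled different regions.
    """
    a, b = sorted(scores_a), sorted(scores_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to evaluate the gap at every observed score.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical scores, invented for illustration only.
run_a = [1.0, 2.0, 3.0, 4.0]
run_b = [1.5, 2.5, 3.5, 4.5]      # overlaps with run_a
run_c = [10.0, 11.0, 12.0, 13.0]  # disjoint from run_a

print(ks_statistic(run_a, run_b))  # 0.25
print(ks_statistic(run_a, run_c))  # 1.0
```

A comparison like this needs the per-run data to be kept separate, which is what depositing one DCD file per run would preserve.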

An example of where this is currently done is PDB-Dev 37. There, ensemble_id 1 references external file 4, which is Ensemble_DCD/A_CSN.dcd, but this file contains only half of the ensemble structures (subsample A). We also deposited Ensemble_DCD/B_CSN.dcd as external file 79 (subsample B), but this file is not referenced from the ensemble table.
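The gap in the PDB-Dev 37 example can be sketched with plain Python dictionaries. The file IDs and paths are the ones quoted above; the dictionary layout and the function name are assumptions made for illustration and do not reflect how any mmCIF library stores these tables.

```python
# External file ID -> path, as quoted from the PDB-Dev 37 example.
external_files = {
    4: "Ensemble_DCD/A_CSN.dcd",
    79: "Ensemble_DCD/B_CSN.dcd",
}

# ensemble_id -> ensemble_file_id. The current category allows only
# one file per ensemble, so each ensemble maps to a single file ID.
ensemble_file_id = {1: 4}


def unreferenced_dcd_files(files, ensemble_to_file):
    """Return the DCD files that no ensemble row points to."""
    referenced = set(ensemble_to_file.values())
    return {
        file_id: path
        for file_id, path in files.items()
        if path.endswith(".dcd") and file_id not in referenced
    }


orphans = unreferenced_dcd_files(external_files, ensemble_file_id)
print(orphans)  # {79: 'Ensemble_DCD/B_CSN.dcd'}
```

Subsample B is deposited but cannot be reached from the ensemble, so a reader of the file has no way to tell that it belongs to ensemble 1.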

Proposal: add a subsample and a subsample-group table. The subsample table contains

The subsample group table contains

Visualization in ChimeraX would likely proceed by adding the subsamples as child nodes of the ensemble and having separate coordsets for each one.
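The parent/child layout described here could look roughly like the following. This is a hypothetical data-structure sketch only: it does not use the ChimeraX API, and the class names, field names, and placeholder coordinates are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubsampleNode:
    """One subsample (e.g. one independent run) with its own coordsets."""
    name: str
    dcd_path: str
    coordsets: List[List[float]] = field(default_factory=list)


@dataclass
class EnsembleNode:
    """An ensemble whose children are its subsamples."""
    ensemble_id: int
    children: List[SubsampleNode] = field(default_factory=list)

    def add_subsample(self, name: str, dcd_path: str) -> SubsampleNode:
        node = SubsampleNode(name, dcd_path)
        self.children.append(node)
        return node


ensemble = EnsembleNode(ensemble_id=1)
sub_a = ensemble.add_subsample("A", "Ensemble_DCD/A_CSN.dcd")
sub_b = ensemble.add_subsample("B", "Ensemble_DCD/B_CSN.dcd")

# Placeholder frame; real coordinates would be read from the DCD file.
sub_a.coordsets.append([0.0, 0.0, 0.0])

print([child.name for child in ensemble.children])  # ['A', 'B']
```

Keeping the coordsets on each child rather than on the ensemble means that loading one subsample does not alter the other.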

brindakv commented 4 years ago

@benmwebb, please check the latest dictionary update.