isblab / nestor

Tool for optimizing the multi-scale coarse-grained representation of integrative models
Creative Commons Attribution Share Alike 4.0 International

\brief Nested sampling-based optimization of representation

NestOR: Nested Sampling-based Optimization of Representation for Integrative Structural Modeling

Python module to perform Nested Sampling-based optimization of representation for integrative structural modeling

Figure: NestOR graphical abstract

Publication and Data

Dependencies:

Running NestOR

Inputs

(See also examples/)

  1. Split the input restraints into two subsets: one for sampling and one for evidence calculation. We recommend using 30% of the input crosslinks for the sampling subset, and the remaining crosslinks together with all EM and other restraints for the evidence calculation subset. Use this helper script to split the input crosslinks into the two subsets: python -m IMP.nestor.xl_datasplitter {path} where path is the path to the target crosslinking file.
  2. Write the modeling script in the form shown in examples/modeling.py. Each candidate representation also needs its own topology file.
    • Make sure that the restraints used to inform the likelihood have weight=0 and are added to a separate list that is passed to the replica exchange macro as the nestor_restraints argument.
    • Ensure the modeling script looks similar to the one in examples/. Specifically, enclose the modeling instructions in a function and call it so that the stdout of the modeling is not printed to the terminal. One can use contextlib as shown in the example.
  3. Set appropriate parameters in the nestor_params.yaml file.
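To make the recommended 30/70 split in step 1 concrete, the sketch below randomly partitions a list of crosslink rows. It is only an illustration and does not reproduce IMP.nestor.xl_datasplitter: the function name split_crosslinks, the fixed random seed, and the tuple layout of the rows are all assumptions made for this example.

```python
import random


def split_crosslinks(rows, sampling_fraction=0.3, seed=0):
    """Randomly partition crosslink rows into (sampling, evidence) subsets.

    Hypothetical helper for illustration only; it does not reproduce
    the behavior of IMP.nestor.xl_datasplitter.
    """
    if not 0.0 < sampling_fraction < 1.0:
        raise ValueError("sampling_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(rows)       # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_sampling = round(sampling_fraction * len(shuffled))
    return shuffled[:n_sampling], shuffled[n_sampling:]


# Made-up crosslinks: (protein1, residue1, protein2, residue2)
crosslinks = [("A", i, "B", i + 5) for i in range(1, 21)]
sampling_xls, evidence_xls = split_crosslinks(crosslinks)
print(len(sampling_xls), len(evidence_xls))
```

In an actual setup, the evidence calculation subset would additionally include all EM and other restraints, as recommended above.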
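The stdout handling mentioned in step 2 can be sketched with contextlib.redirect_stdout from the Python standard library. The function run_modeling below is a hypothetical stand-in for the real modeling instructions; examples/modeling.py remains the reference for the actual script layout.

```python
import contextlib
import io


def run_modeling():
    """Hypothetical placeholder for the modeling instructions."""
    print("sampling frame 0 ...")  # output that would otherwise reach the terminal
    return "done"


# Collect everything printed inside the block instead of showing it.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    status = run_modeling()

log_text = captured.getvalue()  # the suppressed output, kept for inspection
```

Note that redirect_stdout only redirects Python-level writes to sys.stdout; text written directly to the process's standard output by compiled code may need a different approach.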

Run command

Run the NestOR wrapper as follows:

python -m IMP.nestor.wrapper_v6 -p {nestor_param_path}

where nestor_param_path is the absolute path to the nestor_params.yaml file. If a topology file is used to represent the system, add the -t flag; this flag can be omitted if the representation is defined in the modeling script. To use only the plotting functionality of NestOR, run the above command with the -s flag.
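For scripted use, the command line can be assembled programmatically. This is a minimal sketch: the helper name build_nestor_command and the example path are hypothetical, and it assumes that -t and -s are plain switches that take no argument.

```python
import shlex
import sys


def build_nestor_command(param_path, use_topology=False, plot_only=False):
    """Assemble the wrapper invocation as an argument list.

    Hypothetical helper; assumes -t and -s take no argument.
    """
    cmd = [sys.executable, "-m", "IMP.nestor.wrapper_v6", "-p", str(param_path)]
    if use_topology:
        cmd.append("-t")  # representation comes from a topology file
    if plot_only:
        cmd.append("-s")  # only use the plotting functionality
    return cmd


# Placeholder path; replace with the absolute path to your parameter file.
cmd = build_nestor_command("/abs/path/to/nestor_params.yaml", use_topology=True)
print(shlex.join(cmd[1:]))
# To launch the run, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems when the parameter path contains spaces.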

Note: One NestOR run corresponds to the set of all nested sampling runs for all candidate representations. To compare results from NestOR runs with different parameter settings, run python -m IMP.nestor.compare_runs_v2_w_pyplot {comparison_title} run_set1 run_set2 ... where comparison_title is the title for the comparison, and run_set1, run_set2, ... are the NestOR runs to be compared.

Outputs

Plots

Running the NestOR wrapper (the first command under Run command above), i.e. one NestOR run, generates these plots:

  1. Evidence: The plot (*_params_evidence_errorbarplot.png) shows the mean evidence for each candidate representation, with error bars showing the standard error of the mean.
  2. MCMC per-step time: The plot (*_params_persteptime.png) shows the time required to sample one MCMC step per run. This is computed as (time taken for iteration 0)/((number of initial frames)*(number of MCMC steps per frame)).
  3. Evidence and MCMC per-step time per representation: The plot (*sterr_evi_and_proctime.png) compares the evidence and the sampling efficiency across representations.
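The two quantities behind these plots can be illustrated with a short calculation. This is a sketch on made-up numbers, not NestOR's plotting code: the function names are hypothetical, and the standard error is assumed to use the sample standard deviation (n - 1 in the denominator).

```python
import math
import statistics


def mean_and_sem(evidences):
    """Return the mean and the standard error of the mean."""
    n = len(evidences)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    mean = statistics.fmean(evidences)
    # Assumption: sample standard deviation (n - 1) divided by sqrt(n).
    sem = statistics.stdev(evidences) / math.sqrt(n)
    return mean, sem


def per_step_time(iteration0_seconds, n_initial_frames, n_steps_per_frame):
    """(time for iteration 0) / ((initial frames) * (MCMC steps per frame))."""
    return iteration0_seconds / (n_initial_frames * n_steps_per_frame)


# Made-up values for three runs of one candidate representation.
mean, sem = mean_and_sem([-100.0, -102.0, -98.0])
# Made-up timing: iteration 0 took 120 s for 20 frames of 50 steps each.
step_time = per_step_time(120.0, 20, 50)
print(mean, sem, step_time)
```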

Output YAML file

This file is generated when the NestOR wrapper run (the first command under Run command above) completes.

Choice of NestOR parameters

Evidence related:

Efficiency related

Termination related

Exit codes:

If a run terminates with exit code 12, the run is considered incomplete (and is not rerun) and its results are not considered valid, i.e. they are not plotted and are not used to infer the optimal representation. Results from runs with exit codes 0 and 13 are used to infer the optimal representation.
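The rule above amounts to a simple filter on exit codes. The sketch below is illustrative only: the dictionary layout of each run record and the function name select_valid_runs are hypothetical, and it assumes that any exit code other than 0 or 13 is treated as unusable.

```python
VALID_EXIT_CODES = {0, 13}   # results used to infer the optimal representation
INCOMPLETE_EXIT_CODE = 12    # run is incomplete; results are not valid


def select_valid_runs(runs):
    """Keep only runs whose results may be plotted and compared.

    Assumption: any exit code outside VALID_EXIT_CODES is excluded.
    """
    return [run for run in runs if run["exit_code"] in VALID_EXIT_CODES]


# Made-up run records for illustration.
runs = [
    {"name": "run_0", "exit_code": 0},
    {"name": "run_1", "exit_code": 12},
    {"name": "run_2", "exit_code": 13},
]
valid_names = [run["name"] for run in select_valid_runs(runs)]
print(valid_names)
```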

Information

Author(s): Shreyas Arvindekar, Shruthi Viswanath

Date: April 7th, 2023

License: CC BY-SA 4.0. This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.

Last known good IMP version: not tested

Testable: Yes

Parallelizable: Yes

Publications: Arvindekar, S., et al. Optimizing representations for integrative structural modeling using Bayesian model selection, Bioinformatics, 40(3), btae106, 2024. DOI: 10.1093/bioinformatics/btae106.