Code refactor and merge of old and new analyses

Description

This PR introduces two additional benchmark analyses proposed by B. Swope and X. Lucas.

Installation of `analysis` command group in a new conda environment

First, follow the installation procedure for the openff-benchmark-optimization environment described in the Deployment Procedure document.

Once this is done, you can clone the environment into a new conda environment:

conda create -n openff-analysis --clone openff-benchmark-optimization
conda activate openff-analysis

Install the analysis branch from GitHub:

git clone https://github.com/openforcefield/openff-benchmark
cd openff-benchmark
git checkout --track origin/analysis
pip install -e .

General comments

The two new analyses are typically executed at point (5) of the Optimization Benchmark Protocol.

The openff-benchmark report swope command runs the analysis proposed by B. Swope. It accepts the paths of the optimized molecules obtained from the optimization step. Additionally, the reference method (b3lyp-d3bj by default) and an output directory can be specified.

openff-benchmark report swope --input-path 4-compute-qm-filtered --input-path 4-compute-mm-filtered --ref-method b3lyp-d3bj --output-directory 5-results-swope

The command creates one output csv file per method, named swope_<method>.csv, e.g. swope_openff-1.0.0.csv

The analysis proposed by B. Swope operates as follows:

For each molecule MOL-00001-XX.sdf
- Report the relative energy (dE) of each MM optimized conformer (MM_conf) with respect to the MM optimized conformer which is the global minimum (MM_min)
- dE = E_{MM_conf} - E_{MM_min}
- Report the RMSD between each MM_conf and the QM optimized conformer which is the global minimum (QM_min)
- RMSD (MM_conf | QM_min)
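
The steps above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not the code in this PR: the `Conformer` class, the `swope_rows` function, and the toy energies and RMSD values are all hypothetical, and the RMSD calculation itself is passed in as a callable rather than implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Conformer:
    name: str      # hypothetical identifier, e.g. "MOL-00001-00"
    energy: float  # final energy of the optimized conformer


def swope_rows(
    mm_confs: List[Conformer],
    qm_confs: List[Conformer],
    rmsd: Callable[[Conformer, Conformer], float],
) -> List[Tuple[str, float, float]]:
    """Return (name, dE, RMSD to QM_min) for each MM conformer of one molecule."""
    mm_min = min(mm_confs, key=lambda c: c.energy)  # MM global minimum
    qm_min = min(qm_confs, key=lambda c: c.energy)  # QM global minimum
    return [(c.name, c.energy - mm_min.energy, rmsd(c, qm_min)) for c in mm_confs]


# Toy data, made up for illustration (not benchmark results).
mm = [Conformer("MOL-00001-00", 2.0), Conformer("MOL-00001-01", 0.5)]
qm = [Conformer("MOL-00001-00", -10.0), Conformer("MOL-00001-01", -9.0)]

# Lookup table standing in for a real RMSD calculation of MM_conf vs QM_min.
toy_rmsd = {"MOL-00001-00": 0.2, "MOL-00001-01": 1.1}

rows = swope_rows(mm, qm, lambda mm_conf, qm_min: toy_rmsd[mm_conf.name])
for row in rows:
    print(row)
```

Each returned tuple corresponds to one MM conformer; the MM global minimum always has dE = 0.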

The openff-benchmark report lucas command runs the analysis proposed by X. Lucas. It accepts the paths of the optimized molecules obtained from the optimization step. Additionally, the reference method (b3lyp-d3bj by default) and an output directory can be specified.

openff-benchmark report lucas --input-path 4-compute-qm-filtered --input-path 4-compute-mm-filtered --ref-method b3lyp-d3bj --output-directory 5-results-lucas

The command creates one output csv file per method, named lucas_<method>.csv, e.g. lucas_openff-1.0.0.csv

The analysis proposed by X. Lucas operates as follows:

For each molecule MOL-00001-XX.sdf
- Find the MM reference conformer (MM_ref) with the lowest RMSD with respect to QM_min
- Report the relative energy (dE) and RMSD between MM_ref and MM_min
- dE = E_{MM_ref} - E_{MM_min}
- RMSD (MM_ref | MM_min)
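
As a rough sketch of the selection logic described above (not the implementation in this PR), the following Python snippet works on precomputed values for a single molecule. The function name `lucas_row`, the dictionary layout, and all numbers are hypothetical and only serve to illustrate how MM_ref and MM_min are chosen.

```python
from typing import Callable, Dict, Tuple


def lucas_row(
    mm_energies: Dict[str, float],
    rmsd_to_qm_min: Dict[str, float],
    rmsd_mm: Callable[[str, str], float],
) -> Tuple[str, float, float]:
    """Return (MM_ref name, dE, RMSD(MM_ref | MM_min)) for one molecule.

    mm_energies:    MM conformer name -> MM energy
    rmsd_to_qm_min: MM conformer name -> RMSD with respect to QM_min
    rmsd_mm:        callable giving the RMSD between two MM conformers
    """
    mm_min = min(mm_energies, key=mm_energies.get)        # lowest MM energy
    mm_ref = min(rmsd_to_qm_min, key=rmsd_to_qm_min.get)  # closest to QM_min
    d_e = mm_energies[mm_ref] - mm_energies[mm_min]
    return mm_ref, d_e, rmsd_mm(mm_ref, mm_min)


# Toy data, made up for illustration (not benchmark results).
mm_energies = {"00": 2.0, "01": 0.5, "02": 1.0}
rmsd_to_qm_min = {"00": 0.25, "01": 1.1, "02": 0.8}
pair_rmsd = {
    frozenset(("00", "01")): 0.9,
    frozenset(("00", "02")): 0.6,
    frozenset(("01", "02")): 0.7,
}


def toy_rmsd_mm(a: str, b: str) -> float:
    # Lookup table standing in for a real RMSD calculation.
    return 0.0 if a == b else pair_rmsd[frozenset((a, b))]


result = lucas_row(mm_energies, rmsd_to_qm_min, toy_rmsd_mm)
print(result)
```

When MM_ref and MM_min are the same conformer, both dE and the RMSD are zero, which is the ideal outcome for a force field under this analysis.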

The final openff-benchmark report plots-swope and openff-benchmark report plots-lucas commands take as input the directory containing the csv files (5-results-swope or 5-results-lucas, respectively).

For 5-results-swope, an rmsd-cutoff and a de-cutoff should be set:

openff-benchmark report plots-swope --input-path 5-results-swope/ --de-cutoff 1.5 --rmsd-cutoff 1.0

The algorithm will generate a ridge plot of all the conformers within the rmsd-cutoff for a range of dE values, and another ridge plot of all the conformers within the de-cutoff for a range of rmsd values.
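
One possible reading of the two cutoffs is sketched below. It assumes rows of the form (conformer name, dE, RMSD) and an inclusive comparison (<=); both the row layout and the function name `select_for_ridge_plots` are assumptions made for illustration, and the actual plotting code in this PR may select or bin the data differently.

```python
from typing import List, Tuple

Row = Tuple[str, float, float]  # (conformer name, dE, RMSD); hypothetical layout


def select_for_ridge_plots(
    rows: List[Row], de_cutoff: float, rmsd_cutoff: float
) -> Tuple[List[float], List[float]]:
    """Return the dE values and the RMSD values feeding the two ridge plots."""
    # dE distribution of conformers whose RMSD is within the rmsd-cutoff.
    de_values = [de for _, de, rmsd in rows if rmsd <= rmsd_cutoff]
    # RMSD distribution of conformers whose dE is within the de-cutoff.
    rmsd_values = [rmsd for _, de, rmsd in rows if de <= de_cutoff]
    return de_values, rmsd_values


# Toy rows, made up for illustration (not benchmark results).
rows = [("c0", 0.0, 0.4), ("c1", 1.2, 1.6), ("c2", 2.5, 0.9)]
de_values, rmsd_values = select_for_ridge_plots(rows, de_cutoff=1.5, rmsd_cutoff=1.0)
print(de_values, rmsd_values)
```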

For 5-results-lucas, no cutoffs are passed:

openff-benchmark report plots-lucas --input-path 5-results-lucas/

The algorithm will generate plots similar to those produced by compare-forcefields and match-minima.

Please note

Like openff-benchmark report match-minima, openff-benchmark report lucas matches the conformers by RMSD, and this step is quite time-consuming. However, the intersection method added in this PR now allows the user to run the analysis for each FF method as a separate task, which speeds up the whole process, e.g.

for mm_path in 4-compute-mm-filtered/*; do
    openff-benchmark report match-minima --input-path 4-compute-qm-filtered \
                                         --input-path "$mm_path" \
                                         --ref-method b3lyp-d3bj \
                                         --output-directory 5-match-minima &
done
wait

In addition, the QM-to-QM comparison will be skipped. Please note that the plot commands now run without specifying the reference method, e.g.

openff-benchmark report plots-lucas --input-path 5-results-lucas/

Todos

[x] Fix an error that prevented the drawing of density plots when using the intersection of methods at the plotting stage.
[x] Introduce the intersection of methods at the plotting stage.
[ ] Increase the size of y_ticks_labels to the same as x_ticks_labels.

Questions

[ ] Question1

Status

[x] Ready to go
