openforcefield / openff-benchmark

Comparison benchmarks between public force fields and Open Force Field Initiative force fields
MIT License

Added try-except for rdMolAlign call, NaN injection on failure with message. #94

Closed dotsdl closed 3 years ago

dotsdl commented 3 years ago

Description

Added try-except for rdMolAlign call, NaN injection on failure with message.

This is in response to an issue Xavier Lucas ran into, in which a single failure by RDKit to calculate an RMSD causes the whole analysis to fail.

This change makes the analysis tolerant of failures at this point: the affected RMSD value is set to NaN, and an informative message indicates on which conformer(s) the failure occurred.
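For readers who want the shape of the change without opening the diff, here is a minimal sketch of the pattern, not the code merged in this PR. The function name `tolerant_rmsd`, the dict-based inputs, and the `rmsd_func` parameter are assumptions made for illustration; `rmsd_func` stands in for a call such as `rdMolAlign.GetBestRMS`, which raises `RuntimeError` when no substructure match is found.

```python
import math


def tolerant_rmsd(ref_mols, query_mols, rmsd_func,
                  ref_name="ref", query_name="query"):
    """Compute per-conformer RMSD, storing NaN where the calculation fails.

    ref_mols / query_mols: dicts keyed by conformer id (hypothetical layout;
    the real analysis works on dataframes).
    rmsd_func: callable taking (probe, reference) and returning a float;
    stands in for an RDKit call such as rdMolAlign.GetBestRMS.
    """
    results = {}
    for conf_id, ref_mol in ref_mols.items():
        if conf_id not in query_mols:
            continue  # conformer not available in the other method
        try:
            results[conf_id] = rmsd_func(query_mols[conf_id], ref_mol)
        except RuntimeError:
            # Report the failing conformer and keep going instead of aborting.
            print(f"Unable to calculate best RMSD between {ref_name} "
                  f"and {query_name}; conformer `{conf_id}`")
            results[conf_id] = math.nan
    return results


def fake_rmsd(probe, ref):
    """Stand-in for the RDKit call: fails when the two 'molecules' differ."""
    if probe != ref:
        raise RuntimeError("No sub-structure match found between the "
                           "reference and probe mol")
    return 0.0


if __name__ == "__main__":
    ref = {"TST-00010-00": "mol-a", "TST-00110-00": "mol-b"}
    query = {"TST-00010-00": "mol-a", "TST-00110-00": "mol-c"}
    print(tolerant_rmsd(ref, query, fake_rmsd, "b3lyp-d3bj", "openff-1.1.1"))
```

Passing the RMSD function in as an argument keeps the sketch runnable without RDKit installed; in the real analysis the call is made directly.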

Status

codecov-commenter commented 3 years ago

Codecov Report

Merging #94 (4815e55) into season-1 (f92ca3f) will decrease coverage by 18.91%. The diff coverage is 0.00%.

dotsdl commented 3 years ago

@ldamore could I get your review on this one? This is meant to address the issue raised by Xavier:

$ openff-benchmark report compare-forcefields --input-path 4-compute-qm --input-path 4-compute-mm --ref-method b3lyp-d3bj --output-directory 5-compare_forcefields
Reading files: 100%|| 2/2 [37:33<00:00, 1126.78s/it]
Checking input: 100%|| 8/8 [00:00<00:00, 528.73it/s]
Checking input:   0%|                                                                                                                                                                        | 0/8 [00:00<?, ?it/s]/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/openff/benchmark/analysis/analysis.py:194: UserWarning: Not all conformers of method b3lyp-d3bj considered, because these are not available in other methods.
  warnings.warn(f"Not all conformers of method {m} considered, because these are not available in other methods.")
Checking input:  12%|████████████████████                                                                                                                                            | 1/8 [00:00<00:00,  7.21it/s]/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/openff/benchmark/analysis/analysis.py:194: UserWarning: Not all conformers of method openff-1.1.1 considered, because these are not available in other methods.
  warnings.warn(f"Not all conformers of method {m} considered, because these are not available in other methods.")
/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/openff/benchmark/analysis/analysis.py:194: UserWarning: Not all conformers of method opls3e_default considered, because these are not available in other methods.
  warnings.warn(f"Not all conformers of method {m} considered, because these are not available in other methods.")
/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/openff/benchmark/analysis/analysis.py:194: UserWarning: Not all conformers of method openff-1.0.0 considered, because these are not available in other methods.
  warnings.warn(f"Not all conformers of method {m} considered, because these are not available in other methods.")
/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/openff/benchmark/analysis/analysis.py:194: UserWarning: Not all conformers of method smirnoff99Frosst-1.1.0 considered, because these are not available in other methods.
  warnings.warn(f"Not all conformers of method {m} considered, because these are not available in other methods.")
Checking input: 100%|| 8/8 [00:00<00:00, 45.88it/s]
Finding reference molecules: 100%|| 817/817 [00:01<00:00, 794.54it/s]
Referencing energies: 100%|| 817/817 [00:01<00:00, 589.27it/s]
Referencing energies: 100%|| 817/817 [00:01<00:00, 637.55it/s]
Calculating RMSD: 4315it [03:08, 22.86it/s]| 745/817 [00:01<00:00, 737.70it/s]
Calculating TFD: 4315it [00:26, 163.48it/s]
Referencing energies: 100%|| 817/817 [00:01<00:00, 643.71it/s]
Calculating RMSD: 4315it [03:09, 22.83it/s]| 746/817 [00:01<00:00, 749.46it/s]
Calculating TFD: 4315it [00:26, 163.21it/s]
Referencing energies: 100%|| 817/817 [00:01<00:00, 640.70it/s]
Calculating RMSD: 4315it [03:12, 22.46it/s]| 746/817 [00:01<00:00, 747.16it/s]
Calculating TFD: 4315it [00:27, 158.00it/s]
Referencing energies: 100%|| 817/817 [00:01<00:00, 626.74it/s]
Calculating RMSD: 4315it [03:10, 22.60it/s]| 723/817 [00:01<00:00, 714.82it/s]
Calculating TFD: 4315it [00:26, 163.63it/s]
Referencing energies: 100%|| 817/817 [00:01<00:00, 636.85it/s]
Calculating RMSD: 751it [00:14, 50.60it/s]| 734/817 [00:01<00:00, 735.95it/s]
Processing data:  50%|| 4/8 [14:49<14:49, 222.34s/it]
Traceback (most recent call last):
  File "/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/bin/openff-benchmark", line 8, in <module>
    sys.exit(cli())
  File "/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/openff/benchmark/cli.py", line 759, in compare_forcefields
    analysis.main(input_path, ref_method, output_directory)
  File "/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/openff/benchmark/analysis/analysis.py", line 203, in main
    calc_rmsd(dataframes[ref_method], dataframes[m])
  File "/pstore/apps/.testing/OpenForceField/0.8.4rc1-benchmark/lib/python3.7/site-packages/openff/benchmark/analysis/analysis.py", line 54, in calc_rmsd
    result.loc[i, 'rmsd'] = rdMolAlign.GetBestRMS(row['mol'].to_rdkit(), result.loc[i, 'mol'].to_rdkit())
RuntimeError: No sub-structure match found between the reference and probe mol

dotsdl commented 3 years ago

We don't have a test module in place for the analysis modules yet. I performed an ad-hoc test on this feature by swapping conformers from two different molecules in one of my MM results (openff-1.1.1) from the burn-in set.

I get, as expected:

$ openff-benchmark report compare-forcefields --input-path 4-compute-qm-filtered --input-path 4-compute-mm-filtered --ref-method b3lyp-d3bj --output-directory 5-compare_forcefields
...
Unable to calculate best RMSD between b3lyp-d3bj and openff-1.1.1; conformer `TST-00110-00`
Unable to calculate best RMSD between b3lyp-d3bj and openff-1.1.1; conformer `TST-00095-07`
WARNING: The reference mol TST-00110-00 and query mol TST-00110-00 do NOT have the same SMILES strings as determined by RDKit MolToSmiles. 
 [H]c1c(OC([H])([H])[H])nc(N([H])C(=O)N([H])S(=O)(=O)C([H])([H])c2c([H])c([H])c([H])c([H])c2C(=O)OC([H])([H])[H])nc1OC([H])([H])[H]
 [H]N(c1nc(SC([H])([H])[H])nc(N([H])C([H])(C([H])([H])[H])C([H])([H])[H])n1)C([H])([H])C([H])([H])C([H])([H])[H]
WARNING: The reference mol TST-00095-07 and query mol TST-00095-07 do NOT have the same SMILES strings as determined by RDKit MolToSmiles.
 [H]N(c1nc(SC([H])([H])[H])nc(N([H])C([H])(C([H])([H])[H])C([H])([H])[H])n1)C([H])([H])C([H])([H])C([H])([H])[H]
 [H]c1c(OC([H])([H])[H])nc(N([H])C(=O)N([H])S(=O)(=O)C([H])([H])c2c([H])c([H])c([H])c([H])c2C(=O)OC([H])([H])[H])nc1OC([H])([H])[H]

And the contents of the output file have NaNs in the expected places:

$ cat 5-compare_forcefields/openff-1.1.1.csv 
name,group_name,molecule_index,conformer_index,rmsd,tfd,dde[kcal/mol]
TST-00010-00,TST,00010,00, 3.21458367e-02, 1.03306624e-03, 0.00000000e+00
TST-00110-00,TST,00110,00,,, 0.00000000e+00
TST-00116-00,TST,00116,00, 7.56105972e-02, 2.12679387e-03, 0.00000000e+00
TST-00095-07,TST,00095,07,,,-2.08604037e+01
TST-00222-00,TST,00222,00, 2.45894555e-02, 0.00000000e+00, 0.00000000e+00
TST-00082-00,TST,00082,00, 1.14619196e-01, 8.50385549e-03, 0.00000000e+00
TST-00113-00,TST,00113,00, 2.94241533e-01, 1.09174439e-01, 0.00000000e+00
TST-00035-00,TST,00035,00, 3.25742181e-02, 3.97723503e-05, 0.00000000e+00
TST-00038-00,TST,00038,00, 1.48897098e-02, 0.00000000e+00, 0.00000000e+00
TST-00005-00,TST,00005,00, 8.85880694e-02, 3.45953855e-02, 0.00000000e+00
TST-00095-00,TST,00095,00, 2.79291921e-01, 9.11232357e-02, 0.00000000e+00
TST-00176-00,TST,00176,00, 1.37752953e-02,, 0.00000000e+00
TST-00152-00,TST,00152,00, 3.79610075e-02, 0.00000000e+00, 0.00000000e+00
TST-00093-00,TST,00093,00, 1.85977374e-01, 2.62453482e-02, 0.00000000e+00
TST-00243-00,TST,00243,00, 3.82898359e-02, 1.08678752e-02, 0.00000000e+00
TST-00168-00,TST,00168,00, 9.85315256e-02, 5.60296282e-02, 0.00000000e+00
TST-00267-00,TST,00267,00, 2.46011056e-02, 1.28926188e-05, 0.00000000e+00
TST-00003-00,TST,00003,00, 4.39707837e-01, 3.40765089e-01, 0.00000000e+00
TST-00124-00,TST,00124,00, 3.19099507e-01, 6.22147500e-02, 0.00000000e+00
TST-00031-00,TST,00031,00, 4.74017788e-02, 2.20726841e-03, 0.00000000e+00
TST-00004-00,TST,00004,00, 5.83838077e-01, 4.96838304e-03, 0.00000000e+00
TST-00198-00,TST,00198,00, 1.07191934e-01, 2.27226710e-02, 0.00000000e+00
TST-00260-00,TST,00260,00, 3.75503722e-02, 0.00000000e+00, 0.00000000e+00
TST-00036-00,TST,00036,00, 2.59408557e-02, 1.22605281e-05, 0.00000000e+00
TST-00021-00,TST,00021,00, 2.70723674e-02, 2.68451133e-03, 0.00000000e+00
TST-00242-00,TST,00242,00, 2.42163551e-02, 0.00000000e+00, 0.00000000e+00
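
To check an output file like the one above for injected NaNs without reading it by eye, something along these lines may help. It is a sketch using only the standard library; the helper name `rows_with_missing` is hypothetical and not part of `openff-benchmark`. The sample rows are copied from the listing above, and the sketch assumes NaNs are written as empty fields, as they are there.

```python
import csv
import io


def rows_with_missing(csv_text, columns=("rmsd", "tfd")):
    """Return {conformer name: [empty columns]} for rows with missing values."""
    missing = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A NaN shows up as an empty (or whitespace-only) field in the CSV.
        empty = [c for c in columns if not (row.get(c) or "").strip()]
        if empty:
            missing[row["name"]] = empty
    return missing


sample = """name,group_name,molecule_index,conformer_index,rmsd,tfd,dde[kcal/mol]
TST-00010-00,TST,00010,00, 3.21458367e-02, 1.03306624e-03, 0.00000000e+00
TST-00110-00,TST,00110,00,,, 0.00000000e+00
TST-00095-07,TST,00095,07,,,-2.08604037e+01
TST-00176-00,TST,00176,00, 1.37752953e-02,, 0.00000000e+00
"""

if __name__ == "__main__":
    for name, cols in rows_with_missing(sample).items():
        print(name, cols)
```

For a real file, read its contents with `open(path).read()` and pass the text in.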