rdMolAlign.GetBestRMS fails in some cases when computing the RMSD between a QM optimized and an OPLS optimized molecule. This happens for all the analysis computing the rmsd, see e.g. for swope analysis:
Traceback (most recent call last):, 5.83s/it]
File "/users/home/ldamored/miniconda/envs/off-analysfix/bin/openff-benchmark", line 8, in <module>
sys.exit(cli())449it [2:30:31, 4.16s/it]
File "/users/home/ldamored/miniconda/envs/off-analysfix/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/users/home/ldamored/miniconda/envs/off-analysfix/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/users/home/ldamored/miniconda/envs/off-analysfix/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/users/home/ldamored/miniconda/envs/off-analysfix/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/users/home/ldamored/miniconda/envs/off-analysfix/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/users/home/ldamored/miniconda/envs/off-analysfix/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/users/home/ldamored/miniconda/envs/off-analysfix/lib/python3.7/site-packages/openff/benchmark/cli.py", line 887, in swope
analysis.swope(input_path, ref_method, output_directory)
File "/users/home/ldamored/miniconda/envs/off-analysfix/lib/python3.7/site-packages/openff/benchmark/analysis/analysis.py", line 284, in swope
row['mol'].to_rdkit(), qm_df.loc[qm_min, 'mol'].to_rdkit())
RuntimeError: No sub-structure match found between the reference and probe mol
A try...except was added to the code in order to keep track of the offending molecules: e.g.
for i, row in tqdm(reference.iterrows(), desc='Calculating RMSD'):
try:
result.loc[i, 'rmsd'] = rdMolAlign.GetBestRMS(row['mol'].to_rdkit(), result.loc[i, 'mol'].to_rdkit())
except RuntimeError:
result.loc[i, 'rmsd'] = np.NaN
print(f"Unable to calculate best RMSD between {ref_name} and {result_name}; conformer `{i}`")
In the OpenFF Public dataset this led to the identification of the following offending molecules:
By visual inspection of these molecules it appears that there is a different connectivity arrangement between the QM and the OPLS molecule (and more in general between all the molecules optimized with openff-benchmark and the corresponding OPLS optimized molecules).
See e.g. GNT-00927-00
left side: QM mol; right side: opls-optimized mol; it appears that the sulfonate group in the QM mol is [O-S-O], whereas for OPLS is [O=S=O].
Using rdMolAlign.GetBestRMS(MolStandardize.rdMolStandardize.Normalize(rdkitmol) solves the issue for the sulfonate group but the issue still persist in other offending molecules, so it doesn't seem to be a permanent solution.
try:
result.loc[i, 'rmsd'] = rdMolAlign.GetBestRMS(MolStandardize.rdMolStandardize.Normalize(row['mol'].to_rdkit()),
MolStandardize.rdMolStandardize.Normalize(result.loc[i, 'mol'].to_rdkit())
)
except RuntimeError:
result.loc[i, 'rmsd'] = np.NaN
print(f"Unable to calculate best RMSD between {ref_name} and {result_name}; conformer `{i}`")
See e.g. GNT-00245-00, where the connectivity issue persists:
left side (QM mol) has an [O-C=N], whereas in the right side (opls) there is [O=C-N]
Excellent, thank you for writing this up @ldamore! We don't have a solution for this as of yet, but this issue will give us a space to address it in time.
rdMolAlign.GetBestRMS
fails in some cases when computing the RMSD between a QM optimized and an OPLS optimized molecule. This happens for all the analysis computing the rmsd, see e.g. for swope analysis:A
try
...except
was added to the code in order to keep track of the offending molecules: e.g.In the OpenFF Public dataset this led to the identification of the following offending molecules:
Unable to calculate best RMSD between b3lyp-d3bj and opls4_default; conformer
GNT-00927-00
,GNT-00927-03
,GNT-00927-02
,GNT-00927-01
,GNT-00927-04
,GNT-00927-05
,GNT-00245-01
,GNT-00245-03
,GNT-00927-06
,GNT-00245-00
,GNT-00245-,2
,GNT-00927-07
,GNT-00245-04
,GNT-00245-05
,GNT-00855-02
,GNT-00855-03
,GNT-00855-04
,GNT-00855-05
,GNT-00855-01
,GNT-00855-06
,GNT-00855-07
,GNT-00855-00
,GNT-00691-01
,GNT-00691-02
,GNT-00691-00
,GNT-00691-,6
,GNT-00691-05
,GNT-00691-03
,GNT-00691-04
,GNT-00691-07
,GNT-00691-08
,RCH-00138-00
,RCH-00138-01
,RCH-00138-04
,RCH-00138-07
,RCH-00138-02
,RCH-00138-03
,RCH-00138-08
,RCH-00138-05
,RCH-00138-09
,RCH-00138-,6
,RCH-00935-00
,RCH-00935-01
,RCH-00935-04
,RCH-00935-05
,RCH-00935-02
,RCH-00935-03
,RCH-00935-06
,RCH-00935-07
,RCH-00294-00
,RCH-00294-01
,RCH-00294-03
,RCH-00294-02
,RCH-00294-04
,RCH-00294-06
,RCH-00294-,5
,RCH-00643-00
,RCH-00643-02
,RCH-00643-04
,RCH-00643-03
,RCH-00643-01
,RCH-00643-05
,RCH-00643-06
,RCH-00643-08
,RCH-00643-07
,RCH-00643-09
,RCH-00324-02
,RCH-00324-03
,RCH-00324-04
,RCH-00324-06
,RCH-00324-,5
,RCH-00324-07
,RCH-00259-04
,RCH-00324-05
,RCH-00324-07
,RCH-00259-04
,RCH-01263-02
,RCH-01263-01
,RCH-01263-00
,RCH-01263-04
,RCH-01263-03
,RCH-01263-05
,RCH-01263-07
,RCH-01263-06
,RCH-00298-01
,RCH-00298-,2
,RCH-00298-05
,RCH-00298-00
,RCH-00298-04
,RCH-00298-03
,RCH-00298-06
,RCH-00298-09
,RCH-00298-07
,RCH-00298-08
By visual inspection of these molecules it appears that there is a different connectivity arrangement between the QM and the OPLS molecule (and more in general between all the molecules optimized with
openff-benchmark
and the corresponding OPLS optimized molecules).See e.g.
left side: QM mol; right side: opls-optimized mol; it appears that the sulfonate group in the QM mol is [O-S-O], whereas for OPLS is [O=S=O].
GNT-00927-00
Using
rdMolAlign.GetBestRMS(MolStandardize.rdMolStandardize.Normalize(rdkitmol)
solves the issue for the sulfonate group but the issue still persist in other offending molecules, so it doesn't seem to be a permanent solution.See e.g.
GNT-00245-00
, where the connectivity issue persists:left side (QM mol) has an [O-C=N], whereas in the right side (opls) there is [O=C-N]