Closed jschrier closed 1 year ago
Sounds good! I'll run the merge_chemical_dictionaries.nb and fix the errors.
@oliviavanden : to save you a step, the current version of merge_chemical_dictionaries.nb
now lists the active errors (no need to run it again...until you want to check if they are fixed)
Awesome thanks!
Most of the errors have been fixed, with only 4 remaining. The outstanding issues identified so far involve Ammonium Nitrate, N1,N3-di(hexa-1,3,5-triyn-1-yl)-N1,N3-dimethylmalonamide--dihydrogen \ (1/10), and N-(3,4,5-trimethylphenyl)-1,10-phenanthroline-2-carboxamide. These molecules either have bugs or need further checking against the literature.
Most of these errors were bad SMILES, InChIs, or InChIKeys. I used the Mathematica script provided, plus other Mathematica checks, to see which information was consistent and which wasn't.
I've added the next sanity check: For each record that is a pure substance (has molecular identifiers), is the given SMILES (when converted to a Molecule) consistent with the given InChI (when converted)? And is the InChI consistent with the InChIKey?
There are 23 cases where this sanity check breaks. The check code is in the updated
merge_chemical_dictionaries.nb
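For anyone reading along without Mathematica, the shape of that check can be sketched roughly as below. This is only an illustration in Python, not the notebook's code: the record keys (`name`, `SMILES`, `InChI`, `InChIKey`) and the three converter functions are hypothetical placeholders that you would back with a real cheminformatics toolkit.

```python
def find_inconsistent(records, smiles_to_canonical, inchi_to_canonical, inchi_to_key):
    """Return (name, reason) pairs for records whose identifiers disagree.

    The three converters are hypothetical callables supplied by the caller:
    two that map a SMILES / an InChI to some comparable canonical form,
    and one that derives an InChIKey from an InChI.
    """
    failures = []
    for rec in records:
        smiles = rec.get("SMILES")
        inchi = rec.get("InChI")
        key = rec.get("InChIKey")
        # Assumption: only records carrying all three identifiers are
        # treated as pure substances; anything else is skipped.
        if not (smiles and inchi and key):
            continue
        if smiles_to_canonical(smiles) != inchi_to_canonical(inchi):
            failures.append((rec["name"], "SMILES vs InChI"))
        if inchi_to_key(inchi) != key:
            failures.append((rec["name"], "InChI vs InChIKey"))
    return failures
```

Passing the converters in as arguments keeps the loop independent of whichever toolkit actually does the structure conversion.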
Here's an example of the first error:
What's going on here? It looks like a proton tautomer difference in the representation: one is NH4+ . NO3-, the other is NH3 . HNO3. So it should be easy to make these consistent. As before, you should be able to run the whole
merge_chemical_dictionaries.nb
notebook to confirm that you've solved all errors.
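As a quick illustration of why the ammonium nitrate case is only a proton-placement difference, here is a minimal Python sketch (not part of the notebook) that compares the two representations by total composition and net charge. The tiny formula parser is an assumption made for this example and only handles simple formulas with a charge of +1, -1, or 0.

```python
import re
from collections import Counter

def parse_species(formula):
    """Parse a simple formula such as 'NH4+' into (atom counts, charge)."""
    m = re.fullmatch(r"((?:[A-Z][a-z]?\d*)+)([+-]?)", formula)
    if m is None:
        raise ValueError(f"unsupported formula: {formula!r}")
    atoms = Counter()
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", m.group(1)):
        atoms[element] += int(count) if count else 1
    charge = {"+": 1, "-": -1, "": 0}[m.group(2)]
    return atoms, charge

def total(components):
    """Sum atom counts and charges over all components of a record."""
    atoms, charge = Counter(), 0
    for formula in components:
        a, c = parse_species(formula)
        atoms += a
        charge += c
    return atoms, charge

salt = total(["NH4+", "NO3-"])    # ionic representation
neutral = total(["NH3", "HNO3"])  # neutral-pair representation

# Same overall atoms and net charge, so the two forms differ only in
# where one proton sits.
same_overall = salt == neutral
```

Here `same_overall` is `True`, while the individual components differ (`NH4+` vs `NH3`), which is the kind of mismatch a component-wise identifier comparison would flag.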