Closed jschrier closed 1 year ago
Sounds good. I'll check this and see what the errors are.
I'll give you some examples:
e.g., Sodium nitrate (NaNO3) has an invalid SMILES. It currently reads NO3.Na but it should be O=[N+]([O-])[O-].[Na+]. (The InChI is also fubar for this entry, and it appears to be duplicated.)
e.g., BTP has SMILES 1=CC(=C(N=C1)C2=NN=NC=C2)C3=NN=NC=C3 in the file, but that can't be a correct SMILES (notice the (=C). You'll want to check the structure on this, as the given name bis-triazinyl-pyridine is ambiguous.
e.g., SODIUM BICARBONATE has what appears to be an InChI key as the SMILES (WTF?)
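A rough way to catch entries like the three above without opening the notebook is sketched below. This is only a plausibility check I am assuming for illustration, not a real SMILES parser and not what the notebook does: the function name `smiles_problems` is hypothetical, and the InChIKey-shaped string is a made-up placeholder rather than the actual value in the file. An empty result does not prove a SMILES is valid, and the BTP string is flagged only for its leading ring digit, which may or may not be the whole problem with that entry.

```python
import re

# Shape of an InChIKey: 14 letters, 10 letters, 1 letter, joined by hyphens.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def smiles_problems(smiles):
    """Return obvious problems with a SMILES string (hypothetical helper).

    An empty list means "nothing obviously wrong", not "valid".
    """
    s = (smiles or "").strip()
    if not s:
        return ["missing"]
    if INCHIKEY_RE.match(s):
        return ["looks like an InChIKey, not a SMILES"]

    problems = []
    if s[0].isdigit():
        problems.append("starts with a ring-closure digit")
    if s.count("(") != s.count(")"):
        problems.append("unbalanced parentheses")
    if s.count("[") != s.count("]"):
        problems.append("unbalanced brackets")

    # Ring-closure digits only count outside [...] atoms; two-digit %nn
    # labels are dropped rather than checked, to keep the sketch short.
    outside = re.sub(r"\[[^\]]*\]", "", s)
    outside = re.sub(r"%\d\d", "", outside)
    for digit in sorted(set(re.findall(r"\d", outside))):
        if outside.count(digit) % 2:
            problems.append(f"ring label {digit} is never closed")
    return problems


# Placeholder with the InChIKey shape (NOT the real key from the file).
fake_key = "A" * 14 + "-" + "B" * 10 + "-" + "C"

for candidate in [
    "NO3.Na",
    "O=[N+]([O-])[O-].[Na+]",
    "1=CC(=C(N=C1)C2=NN=NC=C2)C3=NN=NC=C3",
    fake_key,
]:
    print(candidate, "->", smiles_problems(candidate) or "no obvious problem")
```

Anything this flags still needs a proper toolkit (or the Mathematica notebook) and a look at the source article to confirm the actual structure.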
Olivia's correction runs fine. (I suspect the errors she experienced with the notebook arose from using an older version of Mathematica, but I don't know; I had no runtime errors.)
I've corrected most* of the SMILES errors, but 46 InChI errors persist and still need correction.
extraction_records
contained in this repository (either by its synonym or full name). Go back to the source article and confirm structure
No errors remaining (fixed by @oliviavanden)
I've started building tests of the chemical dictionary entry validity in /scripts/merge_chemical_dictionaries.nb
So far, 51 entries have an invalid or missing InChI or SMILES entry. These will need to be corrected before subsequent tests can be written and run toward the goal of merging the chemical dictionaries.
You can see where the errors are by opening /scripts/merge_chemical_dictionaries.nb. You can check that you have fixed them by rerunning that Mathematica notebook.
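For anyone who wants a quick tally of entries with a missing or malformed InChI or a missing SMILES before opening the notebook, here is a minimal sketch. Everything in it is an assumption for illustration: the list-of-dicts layout, the field names `name`, `inchi`, and `smiles`, and the three sample entries are hypothetical and do not reflect the repository's actual dictionary format. The only check on InChI content is the prefix (`InChI=1S/` for standard InChI, `InChI=1/` for non-standard), so it will miss strings that have the right prefix but are otherwise broken.

```python
# Hypothetical sample entries; the real dictionary format is not shown in this thread.
entries = [
    {"name": "water", "inchi": "InChI=1S/H2O/h1H2", "smiles": "O"},
    {"name": "methane", "inchi": "", "smiles": "C"},
    {"name": "ethanol", "inchi": "1S/C2H6O/c1-2-3/h3H,2H2,1H3", "smiles": None},
]

INCHI_PREFIXES = ("InChI=1S/", "InChI=1/")


def is_blank(value):
    """True for None, empty strings, and whitespace-only strings."""
    return value is None or not str(value).strip()


def entry_problems(entry):
    """List missing or obviously malformed identifiers for one entry."""
    problems = []
    inchi = entry.get("inchi")
    if is_blank(inchi):
        problems.append("missing InChI")
    elif not str(inchi).startswith(INCHI_PREFIXES):
        problems.append("InChI lacks the InChI=1S/ or InChI=1/ prefix")
    if is_blank(entry.get("smiles")):
        problems.append("missing SMILES")
    return problems


# Collect only the entries that have at least one problem.
flagged = {}
for entry in entries:
    found = entry_problems(entry)
    if found:
        flagged[entry["name"]] = found

print(len(flagged), "entries need attention")
for name, found in flagged.items():
    print(" ", name, "->", "; ".join(found))
```

The notebook remains the reference check; this only narrows down where to look first.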