raver8 / ML_chemical


51 invalid InChI and SMILES identifiers in chemical dictionaries #7

Closed: jschrier closed this issue 1 year ago

jschrier commented 1 year ago

I've started building tests of the validity of chemical dictionary entries in /scripts/merge_chemical_dictionaries.nb.

So far, 51 entries have an invalid or missing InChI or SMILES string. These need to be corrected before further tests can be written and run toward the goal of merging the chemical dictionaries.

You can see where the errors are by opening /scripts/merge_chemical_dictionaries.nb.

You can check that you have fixed them by rerunning that Mathematica notebook.
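The validity tests described above live in a Mathematica notebook. For anyone working outside Mathematica, a rough Python analogue of a first-pass screen might look like the sketch below. It assumes, hypothetically, that each dictionary maps an entry name to a dict with optional "smiles" and "inchi" fields; `entry_problems` and `screen` are illustrative names, not functions from the repository. It only catches missing fields and InChI strings without the InChI prefix; real validation needs a parser such as RDKit or the IUPAC InChI software.

```python
import re

# Standard InChI strings begin with "InChI=1S/"; non-standard ones with "InChI=1/".
INCHI_PREFIX = re.compile(r"InChI=1S?/")


def entry_problems(entry):
    """List problems with one entry.

    Hypothetical layout: entry is a dict with optional "smiles" and "inchi"
    string fields (an assumption, not the repository's actual format).
    """
    problems = []
    smiles = (entry.get("smiles") or "").strip()
    inchi = (entry.get("inchi") or "").strip()
    if not smiles:
        problems.append("missing SMILES")
    if not inchi:
        problems.append("missing InChI")
    elif not INCHI_PREFIX.match(inchi):
        # match() anchors at the start of the string only.
        problems.append("InChI missing InChI=1S/ or InChI=1/ prefix")
    return problems


def screen(dictionary):
    """Map each flagged entry name to its list of problems."""
    report = {}
    for name, entry in dictionary.items():
        problems = entry_problems(entry)
        if problems:
            report[name] = problems
    return report


# Usage with made-up entries.
example = {
    "water": {"smiles": "O", "inchi": "InChI=1S/H2O/h1H2"},
    "broken": {"smiles": "", "inchi": "H2O"},
}
print(screen(example))
# {'broken': ['missing SMILES', 'InChI missing InChI=1S/ or InChI=1/ prefix']}
```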

oliviavanden commented 1 year ago

Sounds good. I'll check this and see what the errors are.

jschrier commented 1 year ago

I'll give you some examples:

e.g., sodium nitrate (NaNO3) has an invalid SMILES: it currently reads NO3.Na, but it should be O=[N+]([O-])[O-].[Na+] (the InChI for this entry is also fubar, and the entry appears to be duplicated).
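For salts like sodium nitrate, one cheap consistency check is that the charges written in the SMILES cancel. The sketch below is a hypothetical helper, `net_formal_charge`, built on Python's standard `re` module; it sums the explicit charges on bracket atoms, since atoms written outside brackets are neutral in SMILES. A zero total is necessary but not sufficient: the invalid NO3.Na also sums to zero, because it has no bracket atoms at all.

```python
import re

# Bracket atoms such as [N+], [O-], [Na+] or [Fe+3] carry explicit charges;
# atoms written without brackets are always neutral in SMILES.
BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")
# A charge is a run of '+' or '-' signs, optionally followed by a count,
# at the end of the bracket contents.
CHARGE = re.compile(r"(\++|-+)(\d*)$")


def net_formal_charge(smiles):
    """Sum the explicit formal charges of all bracket atoms in a SMILES string."""
    total = 0
    for content in BRACKET_ATOM.findall(smiles):
        content = re.sub(r":\d+$", "", content)  # drop an atom-class label like :1
        m = CHARGE.search(content)
        if m is None:
            continue  # no charge on this atom
        sign = 1 if m.group(1)[0] == "+" else -1
        # "+3" and "+++" both mean a magnitude of 3.
        magnitude = int(m.group(2)) if m.group(2) else len(m.group(1))
        total += sign * magnitude
    return total


print(net_formal_charge("O=[N+]([O-])[O-].[Na+]"))  # 0: neutral salt
print(net_formal_charge("O=[N+]([O-])[O-]"))        # -1: nitrate ion alone
print(net_formal_charge("NO3.Na"))                  # 0, yet this SMILES is invalid
```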

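e.g., BTP has the SMILES 1=CC(=C(N=C1)C2=NN=NC=C2)C3=NN=NC=C3 in the file, but that can't be a correct SMILES (notice that it starts with 1=C, a ring-closure digit with no atom before it). You'll want to check the structure on this one, as the given name bis-triazinyl-pyridine is ambiguous: it specifies neither the triazine isomer nor the substitution positions.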

[Screenshot, 2023-09-16 7:35:06 AM]
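A shallow screen for lexical problems, such as a SMILES that begins with a ring-closure digit (as the BTP entry does), unbalanced parentheses, or ring bonds that are opened but never closed, might look like the hypothetical Python helper below. Passing it does not make a SMILES correct: valences, element symbols, and aromaticity are not checked, and the BTP structure still has to be confirmed against the intended compound. A full parser such as RDKit's `Chem.MolFromSmiles`, which returns None when parsing fails, is the stricter test.

```python
def lexical_smiles_problems(smiles):
    """Return lexical problems found in a SMILES string (empty list if none).

    Shallow screen only: checks the first character, bracket and parenthesis
    balance, and ring-bond labels. Valences and element symbols are not checked.
    """
    if not smiles:
        return ["empty SMILES"]
    problems = []
    if not (smiles[0].isalpha() or smiles[0] in "[*"):
        problems.append(f"starts with {smiles[0]!r} instead of an atom")
    depth = 0
    open_rings = set()
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            end = smiles.find("]", i)
            if end == -1:
                problems.append("unclosed '['")
                break
            i = end + 1  # digits inside brackets are isotopes, H counts or charges
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append("')' without a matching '('")
                depth = 0
        elif ch.isdigit():
            open_rings ^= {ch}  # first use opens the ring bond, second use closes it
        elif ch == "%":
            label = smiles[i + 1:i + 3]  # two-digit ring-bond label, e.g. %12
            if len(label) == 2 and label.isdigit():
                open_rings ^= {"%" + label}
                i += 2
            else:
                problems.append("'%' not followed by two digits")
        i += 1
    if depth > 0:
        problems.append("unclosed '('")
    for label in sorted(open_rings):
        problems.append(f"unclosed ring bond {label}")
    return problems


print(lexical_smiles_problems("1=CC(=C(N=C1)C2=NN=NC=C2)C3=NN=NC=C3"))
# ["starts with '1' instead of an atom"]
print(lexical_smiles_problems("NO3.Na"))                  # ['unclosed ring bond 3']
print(lexical_smiles_problems("O=[N+]([O-])[O-].[Na+]"))  # []
```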

e.g., SODIUM BICARBONATE has what appears to be an InChIKey in its SMILES field (WTF?)
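An InChIKey has a fixed shape (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, one uppercase letter), so a key pasted into a SMILES field can be caught with a regular expression. The helpers below are a hypothetical Python sketch, assuming each entry is a dict with an optional "smiles" field; they check format only and do not look the key up. A genuine SMILES will almost never have this exact shape.

```python
import re

# InChIKey layout: 14 letters (connectivity hash), hyphen, 8 letters
# (hash of the other layers) + standard flag + version letter, hyphen,
# 1 protonation letter: 27 characters in total.
INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(text):
    """True if text has the shape of an InChIKey (format only, no lookup)."""
    return INCHIKEY.fullmatch(text.strip()) is not None


def misplaced_inchikeys(dictionary):
    """Names of entries whose "smiles" field holds an InChIKey-shaped string.

    Hypothetical layout: each entry is a dict with an optional "smiles" field.
    """
    return [
        name
        for name, entry in dictionary.items()
        if looks_like_inchikey(entry.get("smiles") or "")
    ]


example = {
    "bad entry": {"smiles": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},  # an InChIKey, not a SMILES
    "good entry": {"smiles": "O"},
}
print(misplaced_inchikeys(example))  # ['bad entry']
```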

jschrier commented 1 year ago

Olivia's correction runs fine for me. I suspect the errors she ran into with the notebook came from an older version of Mathematica, but I'm not sure; I had no runtime errors.

I've corrected most of the SMILES errors, but 46 InChI errors remain and still need to be corrected.

jschrier commented 1 year ago

No errors remain (fixed by @oliviavanden).