Closed: jschrier closed this issue 1 year ago
All duplicate InChIKeys within a single dictionary have been resolved.
There are still 11 duplicates "between" dictionaries (i.e., entries whose InChIKey appears in one dictionary and also in another).
That's my bad for closing it accidentally. I apologize!
No need for apologies!
The InChIKey duplicates are being whittled down. One issue: because the RM and OV chemical dictionary records are still in the review branch, they still register as duplicates in the Mathematica script. Should I delete the OV and RM chemical dictionary records from the review branch as well?
Should the "review" branch be merged into "main"?
Then, when you switch to main, you should only have one file present...
I merged review into main, and when I switch to main only one file is present. However, when I switch to review, three files are present. I'm not sure whether that is what's causing my problem: when I open the Mathematica script, it reports 2 repeats of InChIKeys between scripts. When I check which molecules are repeated, each one appears only once in the combined chemical dictionary record, but it is also present in the OV or RM record. Essentially, the Mathematica script reports repeats even though the combined record contains none; the only repeats are between the records.
Many of these duplicate InChIKeys were simply duplicate molecules. The only place I got stuck was where the repeats had different SMILES. However, they all corresponded to the same molecule.
I've added a check for duplicated InChIKeys in the chemical dictionary entries.
The new `merge_chemical_dictionaries.nb` includes checks for duplicate InChIKeys within a dictionary and between dictionaries. I've found 31 instances of duplication. @raver8 is the worst offender.
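For anyone who wants to reproduce the two checks outside Mathematica, here is a minimal Python sketch of the idea. It is not the code in `merge_chemical_dictionaries.nb`; the record layout (a list of dicts with `name` and `inchikey` fields), the function names, and the sample entries are all assumptions made for illustration.

```python
from collections import defaultdict


def duplicates_within(records):
    """Map each InChIKey that occurs more than once in ONE dictionary to its entry names."""
    seen = defaultdict(list)
    for rec in records:
        seen[rec["inchikey"]].append(rec["name"])
    return {key: names for key, names in seen.items() if len(names) > 1}


def duplicates_between(dictionaries):
    """Map each InChIKey that occurs in MORE THAN ONE dictionary to the dictionary labels."""
    owners = defaultdict(set)
    for label, records in dictionaries.items():
        for rec in records:
            owners[rec["inchikey"]].add(label)
    return {key: sorted(labels) for key, labels in owners.items() if len(labels) > 1}


# Hypothetical sample data; the field names are assumed, not taken from the repo.
dictionaries = {
    "OV": [
        {"name": "water", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
        {"name": "oxidane", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
    ],
    "RM": [
        {"name": "water", "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
        {"name": "ethanol", "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
    ],
}

within = {label: duplicates_within(recs) for label, recs in dictionaries.items()}
between = duplicates_between(dictionaries)

print(within)
print(between)
```

Keeping the two checks separate makes it easier to tell whether a reported repeat sits inside one file or only shows up when several files are loaded together.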
ACTION: Merge these intelligently (make sure that synonyms are correct and we have the most expansive set of synonyms) after solving the previous issue
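One possible way to do the merge, sketched in Python under assumptions: each entry is a dict with hypothetical `name`, `inchikey`, `smiles`, and `synonyms` fields, the first entry seen for an InChIKey keeps its name, and every other name or synonym is folded into the synonym list. Differing SMILES are collected rather than overwritten so they can be reviewed by hand. This is only an illustration, not the notebook's logic.

```python
def merge_records(records):
    """Merge entries that share an InChIKey, keeping the union of their synonyms."""
    merged = {}
    for rec in records:
        key = rec["inchikey"]
        if key not in merged:
            # First entry for this InChIKey: it supplies the primary name.
            merged[key] = {
                "name": rec["name"],
                "inchikey": key,
                "smiles": [rec["smiles"]],
                "synonyms": list(dict.fromkeys(rec["synonyms"])),
            }
            continue
        entry = merged[key]
        # Later entries: their name and synonyms all become synonyms.
        for syn in [rec["name"], *rec["synonyms"]]:
            if syn != entry["name"] and syn not in entry["synonyms"]:
                entry["synonyms"].append(syn)
        # Keep every distinct SMILES string so conflicts stay visible.
        if rec["smiles"] not in entry["smiles"]:
            entry["smiles"].append(rec["smiles"])
    return list(merged.values())


# Hypothetical sample entries for illustration only.
WATER = "XLYOFNOQVPJJNP-UHFFFAOYSA-N"
ETHANOL = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
records = [
    {"name": "water", "inchikey": WATER, "smiles": "O", "synonyms": ["H2O"]},
    {"name": "oxidane", "inchikey": WATER, "smiles": "[OH2]",
     "synonyms": ["H2O", "dihydrogen monoxide"]},
    {"name": "ethanol", "inchikey": ETHANOL, "smiles": "CCO",
     "synonyms": ["ethyl alcohol"]},
]

merged = merge_records(records)
for entry in merged:
    print(entry)
```

An entry that ends up with more than one SMILES string is a candidate for manual review, since the strings should all describe the same molecule if the InChIKey match is genuine.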