srijitseal closed this 6 months ago
Over to @afermg to review
@srijitseal I noticed a few things that need fixing, but let's wait for @afermg's full review.
Metadata_
@srijitseal Could you please point me to the code? I think it can go to monorepo, but we must first package it as a small library for reproducibility. The most important thing at the moment is pinning the dependencies. Let me know if I can be of help with that.
https://github.com/jump-cellpainting/jump-cellpainting/pull/156 You can find the file here! It's almost the same, but I removed the loop to save time after consulting with Andreas. I think standardization is now about 6 times faster with little loss of information. Tautomers will always remain a problem, no matter which package we use or how many loops we run to find the best tautomer.
The comment on pinning dependencies is only about reproducibility; I am not knowledgeable enough about chemoinformatics to comment on the usage of those libraries. I do need the dependency versions and one test to ensure that packaging still works.
Sorry if it seems like I'm asking for a lot. I just want to ensure that the code we put in the monorepo runs correctly so it can be reliably referred to in the future. Also, because it is meant to be a tiny tool, we need to have it as a script/module, not a notebook. I can do the transformation though, as long as I can reproduce the environment in which you produced the data.
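For illustration, the "pinned versions plus one test" idea could look roughly like the sketch below. It is only a minimal example: `check_pins` and the package name and version in `example_pins` are hypothetical placeholders, not the actual dependencies or versions used to produce the data.

```python
from importlib import metadata


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that does not match.

    `get_version` defaults to importlib.metadata.version and can be swapped
    out in tests. A package that is not installed is reported as found=None.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


# Hypothetical pin, for illustration only (not a real requirement of the PR).
example_pins = {"example-package": "1.2.3"}
```

A single test that asserts `check_pins(pins) == {}` in the packaged environment would then fail as soon as an installed version drifts from the pinned one.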
I overrode this PR for now by using the SMILES generated when the JCP IDs were created. Details: https://github.com/jump-cellpainting/datasets-private/pull/88
This file adds a standardized SMILES column and the first 14 characters of the InChI key (representing the connectivity) to the compounds.csv.gz. We show that six compounds have dual entries, often with different ionization states.
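To make the connectivity-block idea concrete, here is a minimal sketch of how entries that share the first 14 InChIKey characters could be grouped to surface such dual entries. The function names and the `cpd_*` identifiers are hypothetical and this is not the code from the PR; the InChIKeys are those of acetic acid, acetate, and aspirin, chosen only because the first two differ in ionization state.

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """Return the first 14 characters of an InChIKey (the connectivity hash)."""
    return inchikey[:14]


def find_dual_entries(records):
    """Group (compound_id, inchikey) pairs by connectivity block.

    Returns only the blocks that map to more than one distinct full InChIKey,
    e.g. the same skeleton in different ionization states.
    """
    groups = defaultdict(set)
    for _compound_id, inchikey in records:
        groups[connectivity_block(inchikey)].add(inchikey)
    return {block: sorted(keys) for block, keys in groups.items() if len(keys) > 1}


# Illustrative records; the compound IDs are made up for this example.
records = [
    ("cpd_A", "QTBSBXVTEAMEQO-UHFFFAOYSA-N"),  # acetic acid
    ("cpd_B", "QTBSBXVTEAMEQO-UHFFFAOYSA-M"),  # acetate (deprotonated form)
    ("cpd_C", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"),  # aspirin
]

dual = find_dual_entries(records)
```

Here `dual` contains a single block, `QTBSBXVTEAMEQO`, because the acid and its anion share connectivity but differ in the final protonation character of the key.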