jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium
BSD 3-Clause "New" or "Revised" License

Converting Metadata_JCP2022 column information from the mad_int_featselect_harmony.parquet file into treatment names #116

Open EmiliyaStol opened 4 months ago

EmiliyaStol commented 4 months ago

I would like to know how to translate the values in the Metadata_JCP2022 column (for example, “JCP2022_049123”) of the mad_int_featselect_harmony.parquet file, in the cpg0016-jump-integrated dataset, into the treatment names associated with each well.

shntnu commented 4 months ago

At present, we have released only the SMILES, which are available in the compound.csv.gz file in https://github.com/jump-cellpainting/datasets/tree/main/metadata. Eventually, ChEMBL IDs will be available through the JUMP annotator (https://github.com/broadinstitute/monorepo/tree/main/libs/jump_compound_annotator), which should make it possible to get treatment names.
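
Until treatment names are available, the JCP2022 IDs can at least be mapped to whatever compound.csv.gz provides. The following is a minimal sketch using only the Python standard library, not an official recipe. It assumes that compound.csv.gz has been downloaded locally and that it contains a Metadata_JCP2022 column to join on; the helper names, the Metadata_SMILES column name, and the demo rows are hypothetical, so check the file header first.

```python
import csv
import gzip
import io
from typing import Dict, Iterable, List, Optional, TextIO, Tuple

# Assumption: compound.csv.gz keys its rows by a column with the same name
# as the profile column. Verify this against the actual file header.
ID_COLUMN = "Metadata_JCP2022"

Row = Dict[str, str]


def build_compound_lookup(handle: TextIO, id_column: str = ID_COLUMN) -> Dict[str, Row]:
    """Map each JCP2022 ID to the remaining columns of its CSV row."""
    lookup: Dict[str, Row] = {}
    for row in csv.DictReader(handle):
        jcp_id = row[id_column]
        lookup[jcp_id] = {k: v for k, v in row.items() if k != id_column}
    return lookup


def load_compound_lookup(path: str, id_column: str = ID_COLUMN) -> Dict[str, Row]:
    """Read a locally downloaded, gzip-compressed CSV into a lookup table."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return build_compound_lookup(handle, id_column)


def annotate_ids(
    jcp_ids: Iterable[str], lookup: Dict[str, Row]
) -> List[Tuple[str, Optional[Row]]]:
    """Pair each ID with its compound row, or None if the ID is not in the table."""
    return [(jcp_id, lookup.get(jcp_id)) for jcp_id in jcp_ids]


if __name__ == "__main__":
    # Made-up rows for illustration only; the column name is hypothetical.
    demo = io.StringIO("Metadata_JCP2022,Metadata_SMILES\nJCP2022_000001,CCO\n")
    demo_lookup = build_compound_lookup(demo)
    print(annotate_ids(["JCP2022_000001", "JCP2022_999999"], demo_lookup))
```

With the real file, `load_compound_lookup("compound.csv.gz")` would replace the in-memory demo, and the IDs would come from the Metadata_JCP2022 column of the profiles. IDs that are missing from the table come back paired with None rather than raising an error.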

Also, please get the parquet files from the URLs listed here, rather than from cpg0016-jump-integrated, which will be deprecated:

https://github.com/jump-cellpainting/datasets/blob/main/profile_index.csv
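
One way to inspect that index programmatically is sketched below, using only the Python standard library. The raw-file URL is an assumption derived from the blob link above, and the function names are hypothetical. The sketch makes no assumption about the column names in profile_index.csv; it simply returns each row as a dictionary keyed by the file's own header.

```python
import csv
import io
import urllib.request
from typing import Dict, List

# Assumption: the raw-content counterpart of the GitHub blob URL above.
INDEX_URL = (
    "https://raw.githubusercontent.com/jump-cellpainting/datasets/"
    "main/profile_index.csv"
)


def fetch_text(url: str = INDEX_URL) -> str:
    """Download a text file and decode it as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_index(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dictionary per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))
```

Calling `parse_index(fetch_text())` and printing the rows should show which columns hold the parquet URLs.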