jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium
BSD 3-Clause "New" or "Revised" License

Why is the workspace/metadata folder missing in cpg0016? #56

Open tfindley15 opened 1 year ago

tfindley15 commented 1 year ago

Hi!

Our team at ViQi Inc. has been doing ML analysis on the cpg0012 dataset with a lot of success. We would now like to move on to the JUMP dataset cpg0016 and possibly cpg0004. However, I cannot find the metadata folders inside 'workspace' for either of these datasets, for any source. Is the compound and dose information for these datasets stored elsewhere?

Thank you for your time and efforts!

Cheers, Reese

niranjchandrasekaran commented 1 year ago

Hi Reese,

We have a different format for metadata for cpg0004 and cpg0016.

For cpg0004, the metadata files are in the lincs-cell-painting repo. If you have questions about the metadata, please feel free to create an issue in that repo.

For cpg0016, the metadata files are in this repo. Instead of the typical plate map and metadata files, we now have three types of files that contain all the metadata information:

- plate.csv.gz: contains the plate name to type of perturbation mapping.
- well.csv.gz: contains the plate name, well name to the perturbation ID mapping.
- compound.csv.gz, orf.csv.gz and crispr.csv.gz: contain the perturbation ID to other external metadata mapping.
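
To illustrate how the three file types fit together, here is a minimal sketch that joins them into one row per well. The column names (Metadata_Source, Metadata_Plate, Metadata_PlateType, Metadata_Well, Metadata_JCP2022, Metadata_InChIKey) are assumptions and should be checked against the actual file headers; the rows are made-up placeholders, and the helper names read_rows and annotate_wells are hypothetical.

```python
import csv
import io

# Toy stand-ins for plate.csv.gz, well.csv.gz and compound.csv.gz.
# Real files could be opened with gzip.open(path, "rt", newline="").
# Column names are ASSUMED; all values below are made up.
PLATE_CSV = """Metadata_Source,Metadata_Plate,Metadata_PlateType
source_1,PLATE_A,COMPOUND
"""
WELL_CSV = """Metadata_Source,Metadata_Plate,Metadata_Well,Metadata_JCP2022
source_1,PLATE_A,A01,JCP2022_000001
source_1,PLATE_A,A02,JCP2022_000002
"""
COMPOUND_CSV = """Metadata_JCP2022,Metadata_InChIKey
JCP2022_000001,AAAAAAAAAAAAAA-BBBBBBBBBB-N
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def annotate_wells(plates, wells, perturbations):
    """Attach plate type and perturbation metadata to every well row."""
    # plate -> perturbation type, keyed by (source, plate name)
    plate_type = {
        (p["Metadata_Source"], p["Metadata_Plate"]): p["Metadata_PlateType"]
        for p in plates
    }
    # perturbation ID -> external metadata row
    by_id = {r["Metadata_JCP2022"]: r for r in perturbations}

    annotated = []
    for well in wells:
        row = dict(well)
        key = (well["Metadata_Source"], well["Metadata_Plate"])
        row["Metadata_PlateType"] = plate_type.get(key)
        # Wells whose ID is absent from this perturbation table stay unannotated.
        row.update(by_id.get(well["Metadata_JCP2022"], {}))
        annotated.append(row)
    return annotated


annotated = annotate_wells(
    read_rows(PLATE_CSV), read_rows(WELL_CSV), read_rows(COMPOUND_CSV)
)
for row in annotated:
    print(row)
```

The same lookup would apply to orf.csv.gz or crispr.csv.gz by passing those rows as the third argument, assuming they share the perturbation ID column.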

Please let us know if you have other questions!

shntnu commented 1 year ago

@tfindley15

I'm moving your question from https://github.com/broadinstitute/cellpainting-gallery/issues/42 to here:

"My name is Reese and I am a data scientist at ViQi Inc. working to do AI analysis on the JUMP datasets. I am currently looking at the JUMP pilot data and I had some questions regarding the JUMP-Target-1_compound_metadata_targets.tsv. What specifically do the genes in the target and target_list columns in this file represent? What source did it come from (PubChem, etc.)? Thank you so much for your time and continued efforts!"

All these annotations come from https://s3.amazonaws.com/data.clue.io/repurposing/downloads/repurposing_drugs_20200324.txt

Read more about that resource here: https://clue.io/repurposing#about
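
If you want to look up the targets for a given drug in that file, a rough sketch follows. It assumes the file is tab-separated, that leading comment lines start with "!", that the drug name column is called pert_iname, and that the target column holds gene symbols separated by "|"; please verify all of these against the downloaded file. The sample rows are made up and load_targets is a hypothetical helper.

```python
import csv
import io

# Made-up sample mimicking the ASSUMED layout of the repurposing file:
# "!"-prefixed comment lines, then a tab-separated table.
SAMPLE = (
    "!made-up comment line\n"
    "pert_iname\tclinical_phase\tmoa\ttarget\tdisease_area\tindication\n"
    "drug_a\tLaunched\texample inhibitor\tGENE1|GENE2\t\t\n"
    "drug_b\tPreclinical\t\t\t\t\n"
)


def load_targets(handle):
    """Map each drug name to its list of target gene symbols."""
    # Skip comment lines before handing the rest to the CSV reader.
    lines = (line for line in handle if not line.startswith("!"))
    reader = csv.DictReader(lines, delimiter="\t")
    return {
        row["pert_iname"]: [t for t in (row.get("target") or "").split("|") if t]
        for row in reader
    }


targets = load_targets(io.StringIO(SAMPLE))
print(targets)
```

For the real file, open the downloaded text file and pass the file handle to load_targets in place of the io.StringIO sample.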

You might find this preprint useful to read: https://www.biorxiv.org/content/10.1101/2022.01.05.475090v1