jump-cellpainting / 2024_Chandrasekaran_Production

BSD 3-Clause "New" or "Revised" License

Report feature quality #5

Open niranjchandrasekaran opened 1 year ago

niranjchandrasekaran commented 1 year ago

cc @shntnu @MarziehHaghighi

Here is a list of materials to generate and questions to explore. Please add anything else you think might be useful to check by directly editing my comment:

For each of the three datasets (ORF, CRISPR and Compound), I check the following:

Blank diagram - Page 1 (1)

UPDATE:

MarziehHaghighi commented 1 year ago

Perturbation replicate reproducibility:

Jump-compound
- source_8 ![jump_compound_corr_curves_source_8](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/c60f64be-c188-460f-b4ce-1c5a03f1bce1)
- source_7 ![jump_compound_corr_curves_source_7](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/d07f351d-bcf8-475a-8c00-2899c34bd470)
- source_6 ![jump_compound_corr_curves_source_6](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/834ee326-435e-43b6-b976-77ccdbf6666b)
- source_5 ![jump_compound_corr_curves_source_5](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/aa4903dc-e65a-4fc1-8872-e091eec9ccf9)
- source_11 ![jump_compound_corr_curves_source_11](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/05b56e6f-3430-40a2-8ce6-bbcb91a11f1c)
- source_10 ![jump_compound_corr_curves_source_10](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/800c28b4-0658-447d-9223-345e2b65c709)
- source_9 ![jump_compound_corr_curves_source_2](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/3699c9bf-aa3a-4b22-8a4c-ef156717d86a)
- source_1 ![jump_compound_corr_curves_source_1](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/235dfc6a-ef51-49d8-8509-e1926a9208e8)
- source_3 ![jump_compound_corr_curves_source_3](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/9824232a-7fdf-4211-b81e-49fe82918faa)

Jump-orf ![jump_orf_corr_curves](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/c546b8d3-ac45-4c18-bafe-8cceb68b3df8)
Jump-crispr ![jump_crispr_corr_curves](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/17a4c9a4-676a-4a50-b5dd-bb8a96746055)
taorf ![taorf_corr_curves_broad 2](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/88189fc6-eb66-4f8a-a07d-3fedda9e6f69)
lincs ![lincs_g_corr_curves_broad](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/9faf18d0-97fa-4923-84bb-da02a2a1b419)
MarziehHaghighi commented 1 year ago

Is feature quality consistent across sources or batches of each dataset?

Jump-compound
- source 1 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/c532d7e9-4e16-4925-ac46-7184a47d1ee5)
- source 11 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/705c11b4-6a5e-4d39-85fe-3363c96f469f)
- source 7 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/4ff6bd4a-ca56-4ad1-b508-d2751155b79b)
- source 8 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/c734eaa5-5620-476b-b9d7-01e0136e31bc)
- source 3 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/b951a842-c3ed-4290-82e7-fd89265a4117)
- source 2 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/9da3de70-b7e0-4f9c-aa54-9a7accb28f26)
- source 10 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/53bd2b88-f90e-46dc-baf5-e4e4c9f45ce4)
- source 6 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/c69ff0f2-328d-44a5-b95a-80fea27947ca)
- source 5 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/666134fe-4783-45a7-ba93-c1e1ee928e67)

Jump-orf ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/ade6dadc-a765-4c67-992c-6f5f43ba5d31)
Jump-crispr ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/9d1620a6-3dc4-4bcc-8f38-05843c5a4995)
taorf
lincs
MarziehHaghighi commented 1 year ago

Is feature quality consistent across various datasets and sources?

image

Are there groups with overall high or low quality according to median scores across datasets?

image

Same analysis, but including lincs:

MarziehHaghighi commented 1 year ago

Feature replicability across datasets

TA-ORF, Jump-ORF and Jump-CRISPR

image

Jump-Compound and LINCS

AnneCarpenter commented 1 year ago

We discussed in check-in that most JUMP sources do not have replicates of a given compound. The exception is the Target2 plates, which most partners ran in many replicates. source_1 did not run that plate, which explains why its result is quite different, though we are not sure which compounds are shown in its plot because very few would have replicates at all! So here we are probably looking at the results for around 300 compounds.

Another exception is that the three wave 2 partners may have had 2 replicates per compound because they had a different swapping scheme.
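One way to check which compounds can actually contribute to a replicate plot is to count wells per compound within each source. This is only a minimal sketch with made-up data; the function name, the `(source, compound)` pair layout, and the identifiers are hypothetical and not taken from the JUMP metadata.

```python
from collections import Counter, defaultdict


def replicated_compounds(wells, min_replicates=2):
    """Map each source to the sorted compounds seen in at least `min_replicates` wells.

    `wells` is an iterable of (source, compound) pairs, one pair per well.
    Sources with no replicated compound are omitted from the result.
    """
    counts = Counter(wells)  # number of wells per (source, compound)
    by_source = defaultdict(list)
    for (source, compound), n in counts.items():
        if n >= min_replicates:
            by_source[source].append(compound)
    return {source: sorted(compounds) for source, compounds in by_source.items()}


# Toy, made-up well annotations (not real JUMP data).
wells = [
    ("source_a", "cpd_1"), ("source_a", "cpd_1"), ("source_a", "cpd_2"),
    ("source_b", "cpd_1"), ("source_b", "cpd_3"),
]
result = replicated_compounds(wells)
```

In this toy input only `cpd_1` in `source_a` has two wells, so it is the only compound that could appear in a replicate-correlation plot.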

Overall, Marzieh if you're able to describe some conclusions here from each result that would be great because it's hard to grasp just looking at the plots what analysis is happening. Thx!

MarziehHaghighi commented 1 year ago

@AnneCarpenter sure, these results are not complete yet. In the check-in I just wanted to show you that the mito_radialdistribution category has low quality according to the median over datasets (which has the caveat of high variance across datasets). I will go through a complete interpretation once the tasks in this issue are complete. For now I can say that feature quality seems to be consistent among the various batches within an experiment, but that doesn't hold across datasets. That means we can't say, for example, that a specific group of features is always low quality relative to the rest of the features in all Cell Painting experiments/datasets, but we can make such a statement for different batches within a dataset/source/experiment. But let's pause here and come back to it once I have all I need to draw a conclusion.
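To make "consistent across batches but not across datasets" concrete, one option is to compare how feature groups rank by quality score in two batches versus two datasets. The sketch below uses a Spearman rank correlation on made-up scores. It is an illustration under stated assumptions (no tied scores, hypothetical variable names and values), not the analysis behind the plots above.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation of two equal-length sequences without ties."""
    n = len(x)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * squared_diffs / (n * (n ** 2 - 1))


# Made-up quality scores for four feature groups (same group order in each list).
batch_1 = [0.20, 0.50, 0.70, 0.90]        # dataset A, batch 1
batch_2 = [0.25, 0.45, 0.75, 0.85]        # dataset A, batch 2
other_dataset = [0.80, 0.30, 0.90, 0.10]  # dataset B

within_dataset = spearman(batch_1, batch_2)        # same ordering of groups
across_datasets = spearman(batch_1, other_dataset)  # ordering differs
```

A value near 1 means the same groups are low or high quality in both inputs. In this toy example the two batches agree perfectly while the second dataset does not, which is the pattern described above.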