Report feature quality - Githubissues

niranjchandrasekaran commented 1 year ago

cc @shntnu @MarziehHaghighi

Here is a list of materials to generate and questions to explore. Please add anything else you think might be useful to check to this list by directly editing my comment:

For each of the three ORF, CRISPR and Compound datasets, I check the following:

[Figure: Blank diagram - Page 1 (1)]

UPDATE:

Basic tables as reference:
- Ranking of perturbations by the replicability of their profiles
- Ranking of features by their quality
- Group-wise feature quality map
  - Is there any consistent pattern across datasets for some categories to be high/low quality?

Can we trust the ranking?
- Is feature quality (as currently defined) variable from batch to batch?
- Is feature quality variable across datasets? (For the same perturbations, compare each perturbation type with its corresponding available dataset, as shown in the figure.)

The current quality metric ranks features based on their consistency within an experiment. Do features replicate across experiments?
- Rank features based on their replicability across different batches.
- Rank features based on their replicability across different datasets.
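
One way to check whether the feature ranking is stable from batch to batch is to compare per-batch quality scores with a rank correlation. The snippet below is only a minimal sketch: the `scores` table, the batch names, and the feature names are made up for illustration, and the quality metric itself is assumed to be computed elsewhere.

```python
import pandas as pd


def ranking_agreement(scores: pd.DataFrame) -> pd.DataFrame:
    """Spearman rank correlation between columns of a quality-score table.

    scores: rows are features, columns are batches (or datasets),
    values are whatever per-feature quality score is being used.
    """
    return scores.corr(method="spearman")


# Hypothetical quality scores for four features in three batches.
scores = pd.DataFrame(
    {
        "batch_1": [0.9, 0.5, 0.1, 0.7],
        "batch_2": [0.8, 0.6, 0.2, 0.7],
        "batch_3": [0.1, 0.5, 0.9, 0.3],
    },
    index=["feat_a", "feat_b", "feat_c", "feat_d"],
)

agreement = ranking_agreement(scores)
print(agreement.round(2))
```

Values near 1 mean two batches rank the features in the same order; values near -1 mean the order is reversed. The same function applies to datasets if the columns are datasets instead of batches.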

MarziehHaghighi commented 1 year ago

Perturbation replicate reproducibility:

Jump-compound

- source_8 ![jump_compound_corr_curves_source_8](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/c60f64be-c188-460f-b4ce-1c5a03f1bce1)
- source_7 ![jump_compound_corr_curves_source_7](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/d07f351d-bcf8-475a-8c00-2899c34bd470)
- source_6 ![jump_compound_corr_curves_source_6](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/834ee326-435e-43b6-b976-77ccdbf6666b)
- source_5 ![jump_compound_corr_curves_source_5](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/aa4903dc-e65a-4fc1-8872-e091eec9ccf9)
- source_11 ![jump_compound_corr_curves_source_11](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/05b56e6f-3430-40a2-8ce6-bbcb91a11f1c)
- source_10 ![jump_compound_corr_curves_source_10](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/800c28b4-0658-447d-9223-345e2b65c709)
- source_9 ![jump_compound_corr_curves_source_2](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/3699c9bf-aa3a-4b22-8a4c-ef156717d86a)
- source_1 ![jump_compound_corr_curves_source_1](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/235dfc6a-ef51-49d8-8509-e1926a9208e8)
- source_3 ![jump_compound_corr_curves_source_3](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/9824232a-7fdf-4211-b81e-49fe82918faa)

Jump-orf

![jump_orf_corr_curves](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/c546b8d3-ac45-4c18-bafe-8cceb68b3df8)

Jump-crispr

![jump_crispr_corr_curves](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/17a4c9a4-676a-4a50-b5dd-bb8a96746055)

taorf

![taorf_corr_curves_broad 2](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/88189fc6-eb66-4f8a-a07d-3fedda9e6f69)

lincs

![lincs_g_corr_curves_broad](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/9faf18d0-97fa-4923-84bb-da02a2a1b419)
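
For readers unfamiliar with this kind of plot, replicate reproducibility is often summarized as the median pairwise correlation among replicate wells of each perturbation. The sketch below shows that calculation under stated assumptions: the toy table, the feature columns `f1` to `f3`, and the IDs `JCP_A` to `JCP_C` are hypothetical, and this is not necessarily the exact procedure behind the curves above.

```python
import numpy as np
import pandas as pd


def replicate_correlation(profiles, pert_col, feature_cols):
    """Median pairwise Pearson correlation among replicates of each perturbation.

    Perturbations with fewer than two replicate wells are skipped,
    since no pairwise correlation can be computed for them.
    """
    out = {}
    for pert, group in profiles.groupby(pert_col):
        if len(group) < 2:
            continue
        # np.corrcoef treats each row (well) as one variable.
        corr = np.corrcoef(group[feature_cols].to_numpy(dtype=float))
        upper = corr[np.triu_indices_from(corr, k=1)]
        out[pert] = float(np.median(upper))
    return pd.Series(out, name="replicate_corr").sort_values(ascending=False)


# Hypothetical well-level profiles: two replicates of A and B, one of C.
profiles = pd.DataFrame(
    {
        "Metadata_JCP2022": ["JCP_A", "JCP_A", "JCP_B", "JCP_B", "JCP_C"],
        "f1": [1, 2, 1, 3, 0],
        "f2": [2, 4, 3, 1, 1],
        "f3": [3, 6, 2, 2, 5],
    }
)

ranking = replicate_correlation(profiles, "Metadata_JCP2022", ["f1", "f2", "f3"])
print(ranking)
```

The resulting series is one possible form of the "ranking of perturbations by the replicability of their profiles". Perturbations without replicates (here `JCP_C`) drop out.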

MarziehHaghighi commented 1 year ago

Is feature quality consistent across sources or batches of each dataset?

Jump-compound

- source 1 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/c532d7e9-4e16-4925-ac46-7184a47d1ee5)
- source 11 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/705c11b4-6a5e-4d39-85fe-3363c96f469f)
- source 7 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/4ff6bd4a-ca56-4ad1-b508-d2751155b79b)
- source 8 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/c734eaa5-5620-476b-b9d7-01e0136e31bc)
- source 3 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/b951a842-c3ed-4290-82e7-fd89265a4117)
- source 2 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/9da3de70-b7e0-4f9c-aa54-9a7accb28f26)
- source 10 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/53bd2b88-f90e-46dc-baf5-e4e4c9f45ce4)
- source 6 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/c69ff0f2-328d-44a5-b95a-80fea27947ca)
- source 5 ![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/666134fe-4783-45a7-ba93-c1e1ee928e67)

Jump-orf

![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/ade6dadc-a765-4c67-992c-6f5f43ba5d31)

Jump-crispr

![image](https://github.com/jump-cellpainting/jump-data-production-paper/assets/30931779/9d1620a6-3dc4-4bcc-8f38-05843c5a4995)

taorf

lincs

MarziehHaghighi commented 1 year ago

Is feature quality consistent across various datasets and sources?

Are there groups with overall high or low quality according to median scores across datasets?

Same analysis, but including lincs:

All the datasets except lincs have the same set of features, so adding lincs will reduce the feature overlap.

MarziehHaghighi commented 1 year ago

Feature replicability across datasets

For each pair of datasets of genetic or chemical perturbations, we can take the overlap of perturbations and calculate, for each feature, the correlation coefficient of its profile over those shared perturbations across the two datasets.
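
A minimal sketch of that calculation, assuming each dataset is a table with one perturbation ID column plus numeric feature columns. The column name `pert` and the toy values are hypothetical, and replicates are averaged to one profile per perturbation as a simplifying choice.

```python
import pandas as pd


def feature_replicability(ds_a, ds_b, pert_col):
    """Per-feature Pearson correlation across two datasets.

    Both datasets are restricted to their shared perturbations and
    shared feature columns before the correlation is computed.
    """
    # One mean profile per perturbation in each dataset.
    a = ds_a.groupby(pert_col).mean(numeric_only=True)
    b = ds_b.groupby(pert_col).mean(numeric_only=True)
    shared_perts = a.index.intersection(b.index)
    shared_feats = a.columns.intersection(b.columns)
    a = a.loc[shared_perts, shared_feats]
    b = b.loc[shared_perts, shared_feats]
    # Column-wise correlation between matching feature columns.
    return a.corrwith(b, axis=0).sort_values(ascending=False)


# Hypothetical datasets sharing perturbations g1-g3 and features f1, f2.
ds_a = pd.DataFrame(
    {"pert": ["g1", "g2", "g3", "g4"], "f1": [1, 2, 3, 9], "f2": [1, 2, 3, 0]}
)
ds_b = pd.DataFrame(
    {
        "pert": ["g1", "g2", "g3", "g5"],
        "f1": [2, 4, 6, 0],
        "f2": [3, 2, 1, 7],
        "f3": [5, 5, 6, 7],
    }
)

replicability = feature_replicability(ds_a, ds_b, "pert")
print(replicability)
```

Features that exist in only one dataset (here `f3`) are dropped, which is the same overlap reduction noted above for lincs.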

TA-ORF, Jump-ORF and Jump-CRISPR

Jump-Compound and LINCS

I need the map between perturbation IDs in jump-compound (Metadata_JCP2022) and lincs_g.

AnneCarpenter commented 1 year ago

We discussed in check-in that most JUMP sources do not have replicates of a given compound, except for the Target2 plates, which most partners ran in many replicates. source_1 did not run that plate, which explains why its result is quite different, though we are not sure which compounds are shown in its plot because very few would have replicates at all! So here we are probably looking at the results for around 300 compounds.

Another exception is that the three wave 2 partners may have had 2 replicates per compound because they had a different swapping scheme.

Overall, Marzieh, if you're able to describe some conclusions from each result, that would be great, because it's hard to grasp from the plots alone what analysis is being done. Thx!

MarziehHaghighi commented 1 year ago

@AnneCarpenter sure, these results are not complete yet. In the check-in I just wanted to show you that the mito_radialdistribution category has low quality according to the median over datasets (with the caveat of high variance across datasets). I will go through a complete interpretation once the tasks in this issue are complete. For now I can say that feature quality seems to be consistent among batches within an experiment, but that doesn't hold across datasets. That means we can't say, for example, that a specific group of features is always low quality relative to the rest of the features in all Cell Painting experiments/datasets, but we can make such a statement for different batches within a dataset/source/experiment. But let's pause here and come back to it once I have all I need to reach a conclusion.

jump-cellpainting / 2024_Chandrasekaran_Production

Report feature quality #5
