jump-cellpainting / jump-scope-analysis


Preliminary analysis #1

Open NasimJ opened 2 years ago

NasimJ commented 2 years ago

Goals:

Conclusions:


Scope Vendors:

Additional information on scopes and imaging details can be found here.


Experimental details:

Plate map and compounds: The JUMP-MOA plate map and compounds were used (90 compounds from 47 distinct MOA classes, with 4 replicates per compound).

NasimJ commented 2 years ago

Results from preliminary analyses

The experiment metadata was created with a notebook; additional metadata columns were added manually later.

Percent replicating

| Scope_vendor | Batch | Plate_Name | Percent_Replicating |
| --- | --- | --- | --- |
| MolDev | Scope1_MolDev_10X | Plate2_PCO_6ch_4site_10XPA | 33.3 |
| MolDev | Scope1_MolDev_10X | Plate3_PCO_6ch_4site_10XPA_Crest | 50 |
| MolDev | Scope1_MolDev_10X_4siteZ | Plate3_PCO_6ch_4site_10XPA_Crestz | 52.2 |
| MolDev | Scope1_MolDev_20X_4site | Plate3_PCO_6ch_4site_20XPA_Crestz | 43.3 |
| MolDev | Scope1_MolDev_20X_9site | Plate2_PCO_6ch_9site_20XPA | 56.7 |
| MolDev | Scope1_MolDev_20X_9site | Plate3_PCO_6ch_9site_20XPA_Crest | 50 |
| MolDev | Scope1_MolDev_20X_Adaptive | Plate3_PCO_6ch_Adaptive_20XPA | 17.8 |
| Nikon | Scope1_Nikon_10X | BR00117060a10x | 26.7 |
| Nikon | Scope1_Nikon_10X | BR00117061a10x | 39.8 |
| Nikon | Scope1_Nikon_10X | BR00117062a10x | 27.8 |
| Nikon | Scope1_Nikon_10X | BR00117063b10x | 33.7 |
| Nikon | Scope1_Nikon_20X | BR00117061a | 58.9 |
| Nikon | Scope1_Nikon_20X | BR00117062a | 43.3 |
| Nikon | Scope1_Nikon_20X | BR00117063b | 46.7 |
| PE | Scope1_PE_Bin1_Confocal_1Plane | CP_Broad_Phenix_C_BIN1_1Plane_P1 | 50 |
| PE | Scope1_PE_Bin1_Confocal_1Plane | CP_Broad_Phenix_C_BIN1_1Plane_P2 | 53.3 |
| PE | Scope1_PE_Bin1_Confocal_1Plane | CP_Broad_Phenix_C_BIN1_1Plane_P3 | 51.1 |
| PE | Scope1_PE_Bin1_Confocal_1Plane | CP_Broad_Phenix_C_BIN1_1Plane_P4 | 44.4 |
| PE | Scope1_PE_Bin1_Confocal_3Plane | CP_Broad_Phenix_C_BIN1_P1 | 53.3 |
| PE | Scope1_PE_Bin1_Confocal_3Plane | CP_Broad_Phenix_C_BIN1_P2 | 56.7 |
| PE | Scope1_PE_Bin1_Confocal_3Plane | CP_Broad_Phenix_C_BIN1_P3 | 53.3 |
| PE | Scope1_PE_Bin1_Confocal_3Plane | CP_Broad_Phenix_C_BIN1_P4 | 44.4 |
| PE | Scope1_PE_Bin1_Widefield_1Plane | CP_Broad_Phenix_NC_BIN1_1Plane_P1 | 52.2 |
| PE | Scope1_PE_Bin1_Widefield_1Plane | CP_Broad_Phenix_NC_BIN1_1Plane_P2 | 54.4 |
| PE | Scope1_PE_Bin1_Widefield_1Plane | CP_Broad_Phenix_NC_BIN1_1Plane_P3 | 51.1 |
| PE | Scope1_PE_Bin1_Widefield_1Plane | CP_Broad_Phenix_NC_BIN1_1Plane_P4 | 46.7 |
| PE | Scope1_PE_Bin1_Widefield_3Plane | CP_Broad_Phenix_NC_BIN1_P1 | 53.3 |
| PE | Scope1_PE_Bin1_Widefield_3Plane | CP_Broad_Phenix_NC_BIN1_P2 | 54.4 |
| PE | Scope1_PE_Bin1_Widefield_3Plane | CP_Broad_Phenix_NC_BIN1_P3 | 47.8 |
| PE | Scope1_PE_Bin1_Widefield_3Plane | CP_Broad_Phenix_NC_BIN1_P4 | 46.7 |
| PE | Scope1_PE_Bin2_Confocal_1Plane | CPBroadPhenixC1PlaneP1 | 50 |
| PE | Scope1_PE_Bin2_Confocal_1Plane | CPBroadPhenixC1PlaneP2 | 51.1 |
| PE | Scope1_PE_Bin2_Confocal_1Plane | CPBroadPhenixC1PlaneP3 | 53.3 |
| PE | Scope1_PE_Bin2_Confocal_1Plane | CPBroadPhenixC1PlaneP4 | 45.6 |
| PE | Scope1_PE_Bin2_Confocal_3Plane | CPBroadPhenixCP1 | 53.3 |
| PE | Scope1_PE_Bin2_Confocal_3Plane | CPBroadPhenixCP2 | 55.6 |
| PE | Scope1_PE_Bin2_Confocal_3Plane | CPBroadPhenixCP3 | 56.7 |
| PE | Scope1_PE_Bin2_Confocal_3Plane | CPBroadPhenixCP4 | 56.7 |
| PE | Scope1_PE_Bin2_Widefield_1Plane | CPBroadPhenixNC1PlaneP1 | 53.3 |
| PE | Scope1_PE_Bin2_Widefield_1Plane | CPBroadPhenixNC1PlaneP2 | 53.3 |
| PE | Scope1_PE_Bin2_Widefield_1Plane | CPBroadPhenixNC1PlaneP3 | 47.8 |
| PE | Scope1_PE_Bin2_Widefield_1Plane | CPBroadPhenixNC1PlaneP4 | 47.8 |
| PE | Scope1_PE_Bin2_Widefield_3Plane | CPBroadPhenixNCP1 | 52.2 |
| PE | Scope1_PE_Bin2_Widefield_3Plane | CPBroadPhenixNCP2 | 51.1 |
| PE | Scope1_PE_Bin2_Widefield_3Plane | CPBroadPhenixNCP3 | 50 |
| PE | Scope1_PE_Bin2_Widefield_3Plane | CPBroadPhenixNCP4 | 46.7 |
| Yokogawa_Japan | Scope1_Yokogawa_Japan_20X | 20201021T092317 | 53.3 |
| Yokogawa_Japan | Scope1_Yokogawa_Japan_40X | 20201020T134356 | 37.8 |
| Yokogawa_US | Scope1_Yokogawa_US_10X | BRO0117014_10x | 48.9 |
| Yokogawa_US | Scope1_Yokogawa_US_20X_5Ch | BRO0117033_20xb | 57.8 |
| Yokogawa_US | Scope1_Yokogawa_US_20X_5Ch | BRO0117056_20x | 50 |
| Yokogawa_US | Scope1_Yokogawa_US_20X_5Ch_12Z | BRO0117056_20xb | 55.6 |
| Yokogawa_US | Scope1_Yokogawa_US_20X_6Ch_BRO0117033 | BRO0117033_20x | 14.7 |
| Yokogawa_US | Scope1_Yokogawa_US_20X_6Ch_BRO0117059 | BRO0117059_20X | 57.8 |
| Yokogawa_US | Scope1_Yokogawa_US_20X_6Ch_BRO01177034 | BRO01177034_20x | 53.3 |
| Yokogawa_US | Scope1_Yokogawa_US_40X_BRO0117059 | BRO0117059_40x | 43.3 |

Distribution plots ![0 percent_replicating](https://user-images.githubusercontent.com/55710463/155234533-06c06e4c-2c64-4094-bfd6-91bd828854b1.png)

Percent matching

| Scope_vendor | Batch | Plate_Name | Percent_Matching |
| --- | --- | --- | --- |
| MolDev | Scope1_MolDev_10X | Plate2_PCO_6ch_4site_10XPA | 18.605 |
| MolDev | Scope1_MolDev_10X | Plate3_PCO_6ch_4site_10XPA_Crest | 16.279 |
| MolDev | Scope1_MolDev_10X_4siteZ | Plate3_PCO_6ch_4site_10XPA_Crestz | 20.93 |
| MolDev | Scope1_MolDev_20X_4site | Plate3_PCO_6ch_4site_20XPA_Crestz | 16.279 |
| MolDev | Scope1_MolDev_20X_9site | Plate2_PCO_6ch_9site_20XPA | 18.605 |
| MolDev | Scope1_MolDev_20X_9site | Plate3_PCO_6ch_9site_20XPA_Crest | 13.953 |
| MolDev | Scope1_MolDev_20X_Adaptive | Plate3_PCO_6ch_Adaptive_20XPA | 6.977 |
| Nikon | Scope1_Nikon_10X | BR00117060a10x | 13.953 |
| Nikon | Scope1_Nikon_10X | BR00117061a10x | 14.634 |
| Nikon | Scope1_Nikon_10X | BR00117062a10x | 11.628 |
| Nikon | Scope1_Nikon_10X | BR00117063b10x | 11.905 |
| Nikon | Scope1_Nikon_20X | BR00117061a | 16.279 |
| Nikon | Scope1_Nikon_20X | BR00117062a | 16.279 |
| Nikon | Scope1_Nikon_20X | BR00117063b | 18.605 |
| PE | Scope1_PE_Bin1_Confocal_1Plane | CP_Broad_Phenix_C_BIN1_1Plane_P1 | 16.279 |
| PE | Scope1_PE_Bin1_Confocal_1Plane | CP_Broad_Phenix_C_BIN1_1Plane_P2 | 16.279 |
| PE | Scope1_PE_Bin1_Confocal_1Plane | CP_Broad_Phenix_C_BIN1_1Plane_P3 | 18.605 |
| PE | Scope1_PE_Bin1_Confocal_1Plane | CP_Broad_Phenix_C_BIN1_1Plane_P4 | 16.279 |
| PE | Scope1_PE_Bin1_Confocal_3Plane | CP_Broad_Phenix_C_BIN1_P1 | 18.605 |
| PE | Scope1_PE_Bin1_Confocal_3Plane | CP_Broad_Phenix_C_BIN1_P2 | 18.605 |
| PE | Scope1_PE_Bin1_Confocal_3Plane | CP_Broad_Phenix_C_BIN1_P3 | 16.279 |
| PE | Scope1_PE_Bin1_Confocal_3Plane | CP_Broad_Phenix_C_BIN1_P4 | 20.93 |
| PE | Scope1_PE_Bin1_Widefield_1Plane | CP_Broad_Phenix_NC_BIN1_1Plane_P1 | 18.605 |
| PE | Scope1_PE_Bin1_Widefield_1Plane | CP_Broad_Phenix_NC_BIN1_1Plane_P2 | 16.279 |
| PE | Scope1_PE_Bin1_Widefield_1Plane | CP_Broad_Phenix_NC_BIN1_1Plane_P3 | 18.605 |
| PE | Scope1_PE_Bin1_Widefield_1Plane | CP_Broad_Phenix_NC_BIN1_1Plane_P4 | 23.256 |
| PE | Scope1_PE_Bin1_Widefield_3Plane | CP_Broad_Phenix_NC_BIN1_P1 | 16.279 |
| PE | Scope1_PE_Bin1_Widefield_3Plane | CP_Broad_Phenix_NC_BIN1_P2 | 16.279 |
| PE | Scope1_PE_Bin1_Widefield_3Plane | CP_Broad_Phenix_NC_BIN1_P3 | 18.605 |
| PE | Scope1_PE_Bin1_Widefield_3Plane | CP_Broad_Phenix_NC_BIN1_P4 | 16.279 |
| PE | Scope1_PE_Bin2_Confocal_1Plane | CPBroadPhenixC1PlaneP1 | 18.605 |
| PE | Scope1_PE_Bin2_Confocal_1Plane | CPBroadPhenixC1PlaneP2 | 16.279 |
| PE | Scope1_PE_Bin2_Confocal_1Plane | CPBroadPhenixC1PlaneP3 | 18.605 |
| PE | Scope1_PE_Bin2_Confocal_1Plane | CPBroadPhenixC1PlaneP4 | 18.605 |
| PE | Scope1_PE_Bin2_Confocal_3Plane | CPBroadPhenixCP1 | 20.93 |
| PE | Scope1_PE_Bin2_Confocal_3Plane | CPBroadPhenixCP2 | 18.605 |
| PE | Scope1_PE_Bin2_Confocal_3Plane | CPBroadPhenixCP3 | 18.605 |
| PE | Scope1_PE_Bin2_Confocal_3Plane | CPBroadPhenixCP4 | 18.605 |
| PE | Scope1_PE_Bin2_Widefield_1Plane | CPBroadPhenixNC1PlaneP1 | 18.605 |
| PE | Scope1_PE_Bin2_Widefield_1Plane | CPBroadPhenixNC1PlaneP2 | 16.279 |
| PE | Scope1_PE_Bin2_Widefield_1Plane | CPBroadPhenixNC1PlaneP3 | 16.279 |
| PE | Scope1_PE_Bin2_Widefield_1Plane | CPBroadPhenixNC1PlaneP4 | 23.256 |
| PE | Scope1_PE_Bin2_Widefield_3Plane | CPBroadPhenixNCP1 | 18.605 |
| PE | Scope1_PE_Bin2_Widefield_3Plane | CPBroadPhenixNCP2 | 13.953 |
| PE | Scope1_PE_Bin2_Widefield_3Plane | CPBroadPhenixNCP3 | 16.279 |
| PE | Scope1_PE_Bin2_Widefield_3Plane | CPBroadPhenixNCP4 | 16.279 |
| Yokogawa_Japan | Scope1_Yokogawa_Japan_20X | 20201021T092317 | 20.93 |
| Yokogawa_Japan | Scope1_Yokogawa_Japan_40X | 20201020T134356 | 16.279 |
| Yokogawa_US | Scope1_Yokogawa_US_10X | BRO0117014_10x | 18.605 |
| Yokogawa_US | Scope1_Yokogawa_US_20X_5Ch | BRO0117033_20xb | 16.279 |
| Yokogawa_US | Scope1_Yokogawa_US_20X_5Ch | BRO0117056_20x | 18.605 |
| Yokogawa_US | Scope1_Yokogawa_US_20X_5Ch_12Z | BRO0117056_20xb | 23.256 |
| Yokogawa_US | Scope1_Yokogawa_US_20X_6Ch_BRO0117033 | BRO0117033_20x | 26.087 |
| Yokogawa_US | Scope1_Yokogawa_US_20X_6Ch_BRO0117059 | BRO0117059_20X | 20.93 |
| Yokogawa_US | Scope1_Yokogawa_US_20X_6Ch_BRO01177034 | BRO01177034_20x | 20.93 |
| Yokogawa_US | Scope1_Yokogawa_US_40X_BRO0117059 | BRO0117059_40x | 13.953 |

Distribution plots ![0 percent_matching](https://user-images.githubusercontent.com/55710463/155234488-6da0b397-f8a6-4dae-bb11-4965d1864fac.png)

bethac07 commented 2 years ago

OMG SO EXCITING

bethac07 commented 2 years ago

Those numbers are all... shockingly consistent, especially within the 20X.

Is there anything we can say about the two plates that seem to do really poorly, Scope1_MolDev_20X_Adaptive and Scope1_Yokogawa_US_20X_6Ch_BRO0117033? QC issues, etc.?

NasimJ commented 2 years ago

Scope1_Yokogawa_US_20X_6Ch_BRO0117033 is not a complete plate: the data was uploaded only up to well D21. Scope1_MolDev_20X_Adaptive has a varying number of sites per well (some wells with 2 sites and some with 3).
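
Both kinds of problem (missing wells and uneven site counts) could be caught with a quick completeness check before computing any metrics. The following is only a minimal sketch: it assumes a 384-well layout (rows A-P, columns 1-24) and a hypothetical list of `(well, site)` records per plate. The function name `plate_qc` and the input shape are illustrative and are not taken from the analysis code.

```python
from collections import Counter
from string import ascii_uppercase


def plate_qc(records, n_rows=16, n_cols=24):
    """Flag missing wells and non-uniform site counts for one plate.

    records: iterable of (well, site) tuples, e.g. ("D21", 3).
    n_rows, n_cols: assumed 384-well layout (hypothetical defaults).
    """
    expected = {
        f"{row}{col:02d}"
        for row in ascii_uppercase[:n_rows]
        for col in range(1, n_cols + 1)
    }
    sites_per_well = Counter(well for well, _ in records)
    missing = expected - set(sites_per_well)
    site_counts = sorted(set(sites_per_well.values()))
    return {
        "n_missing_wells": len(missing),
        "site_counts": site_counts,
        "complete": not missing,
        "uniform_sites": len(site_counts) <= 1,
    }


rows = ascii_uppercase[:16]

# Synthetic plate uploaded only up to well D21, 9 sites per well.
partial = [
    (f"{r}{c:02d}", s)
    for r in rows[:4]
    for c in range(1, 25)
    for s in range(1, 10)
    if not (r == "D" and c > 21)
]

# Synthetic plate where wells have either 2 or 3 sites.
adaptive = [
    (f"{r}{c:02d}", s)
    for r in rows
    for c in range(1, 25)
    for s in range(1, 3 + (c % 2))
]

print(plate_qc(partial))
print(plate_qc(adaptive))
```

A plate that fails either flag can then be kept for the plate-level table but left out of grouped comparisons.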

bethac07 commented 2 years ago

Those both make a lot of sense as to why we'd see lower numbers! I'd argue the partial plate shouldn't be kept at all.

NasimJ commented 2 years ago

I agree, and I wasn't planning to include them in the group analysis (i.e., the analysis of magnification, modality, etc.), but I thought having a plate-level analysis would be good.

bethac07 commented 2 years ago

Absolutely, this is fantastic to have!

On Wed, Feb 9, 2022, 5:46 PM Nasim Jamali @.***> wrote:

> I agree, and I wasn't planning to include them in the group analysis (i.e., the analysis of magnification, modality, etc.), but I thought having a plate-level analysis would be good.


AnneCarpenter commented 2 years ago

It's amazing to see SO much data in one place. What an awesome experiment! Tacos for @NasimJ !

The nice thing about the MOA plate is that unlike our JUMP-Target plates (which only have singlicates of most compounds), on JUMP-MOA we have multiple well locations for each compound.

Can you confirm whether Percent Matching is calculated as taking a given compound and checking all of its replicates for matching to all replicates of the "sister" (same-MOA) compound? I assume we calculate the average correlation values of all of those (even though within-plate is 'easier' to match than across-plate). I think this is a fine way to calculate it. It would be more stringent to only allow matching across plates not within-plate, but our goal here isn't studying plate to plate variation so I think that would be pointless.

For Percent Replicating, it's a more serious situation where plate layout effects can inflate Percent Replicating pretty badly. So here, I think there is a 'correct' answer for how to analyze: We should take each well of a compound and only measure correlation to other-well-positions of that compound (including within-plate AND across-plate all together is fine, for the same reasons as above, in my opinion). Is that how we've done it?

And then finally a Q about overall analysis design: should we be running a version where we attempt as much of an apples to apples comparison as we can? I.e. if one vendor took 15 images per well and another 3, we limit our analysis to 3 across all plates? We are not doing a bakeoff and yet if we publish the results from each vendor some may look artificially 'bad' and people might not get the nuance that it was a technical reason like number of sites that is the difference. I'm just wondering how you are thinking about these sorts of things, and I'm not sure which variables make sense to control for, beyond number of sites.
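
One way to approximate that apples-to-apples comparison would be to cap every well at the same number of sites before aggregating to well-level profiles. This is a rough sketch under stated assumptions: the per-site rows are dicts with hypothetical `plate`, `well` and `site` keys, `cap_sites` is an illustrative name, and choosing sites at random with a fixed seed is just one option (taking the first N or the most central sites would be others).

```python
import random
from collections import defaultdict


def cap_sites(site_rows, max_sites, seed=0):
    """Keep at most `max_sites` randomly chosen sites per (plate, well).

    site_rows: list of dicts with "plate", "well" and "site" keys
    (hypothetical column names). Returns a new list; the input is unchanged.
    """
    rng = random.Random(seed)
    by_well = defaultdict(list)
    for row in site_rows:
        by_well[(row["plate"], row["well"])].append(row)

    kept = []
    for key in sorted(by_well):
        rows = by_well[key]
        if len(rows) > max_sites:
            rows = rng.sample(rows, max_sites)
        kept.extend(sorted(rows, key=lambda r: r["site"]))
    return kept


# One vendor imaged 15 sites in a well, another only 3.
site_rows = [{"plate": "P1", "well": "A01", "site": s} for s in range(1, 16)]
site_rows += [{"plate": "P2", "well": "A01", "site": s} for s in range(1, 4)]

capped = cap_sites(site_rows, max_sites=3)
print(len(capped))  # 3 sites kept for each of the two wells
```

Wells that already have `max_sites` or fewer are passed through untouched, so the cap only removes data from the more densely imaged plates.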

NasimJ commented 2 years ago

> Can you confirm whether Percent Matching is calculated as taking a given compound and checking all of its replicates for matching to all replicates of the "sister" (same-MOA) compound? I assume we calculate the average correlation values of all of those (even though within-plate is 'easier' to match than across-plate). I think this is a fine way to calculate it. It would be more stringent to only allow matching across plates not within-plate, but our goal here isn't studying plate to plate variation so I think that would be pointless.

Exactly. Our goal here is not to study plate-to-plate variation, so these are calculated within a plate.
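
To make the within-plate matching described above concrete, here is a small sketch, not the code actually used for the table. It scores each MOA by the average correlation between wells of different compounds that share that MOA, all on one plate, and reports the percentage of scored MOAs above a cutoff. The `threshold` argument is an assumption standing in for whatever null-distribution cutoff the real pipeline applies, and all function names are illustrative.

```python
from itertools import combinations, product
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def percent_matching(wells, threshold):
    """wells: list of (compound, moa, profile) tuples for ONE plate.

    Returns (percentage of scored MOAs above `threshold`, per-MOA scores).
    `threshold` is assumed to be derived elsewhere, e.g. from a null
    distribution of non-matching pairs.
    """
    by_moa = {}
    for compound, moa, profile in wells:
        by_moa.setdefault(moa, {}).setdefault(compound, []).append(profile)

    scores = {}
    for moa, compounds in by_moa.items():
        # Only pairs of wells from *different* compounds with the same MOA.
        corrs = [
            pearson(p, q)
            for a, b in combinations(sorted(compounds), 2)
            for p, q in product(compounds[a], compounds[b])
        ]
        if corrs:  # an MOA with a single compound has no sister to match
            scores[moa] = mean(corrs)

    hits = sum(score > threshold for score in scores.values())
    pct = 100.0 * hits / len(scores) if scores else 0.0
    return pct, scores


# Toy plate: m1 has two similar compounds, m2 two opposite ones, m3 only one.
wells = [
    ("c1", "m1", [1.0, 2.0, 3.0]),
    ("c2", "m1", [2.0, 4.0, 6.5]),
    ("c3", "m2", [1.0, 2.0, 3.0]),
    ("c4", "m2", [3.0, 2.0, 1.0]),
    ("c5", "m3", [0.5, 0.1, 0.9]),
]
pct, scores = percent_matching(wells, threshold=0.5)
print(pct, scores)
```

Because only MOAs with at least two compounds can be scored, the denominator is the number of such MOAs on the plate rather than the number of compounds.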

> For Percent Replicating, it's a more serious situation where plate layout effects can inflate Percent Replicating pretty badly. So here, I think there is a 'correct' answer for how to analyze: We should take each well of a compound and only measure correlation to other-well-positions of that compound (including within-plate AND across-plate all together is fine, for the same reasons as above, in my opinion). Is that how we've done it?

For Percent Replicating, I believe it measures the median correlation of each compound's replicates within a plate only, not across plates. When we want to compare across plates (within one batch of a scope vendor), we need to change the code so that same-well correlations can be excluded. @niranjchandrasekaran please correct me if I'm missing something.
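
As an illustration of the calculation discussed here, the sketch below computes a median pairwise replicate correlation per compound, with an optional switch that skips pairs sharing a well position (which only matters once plates are pooled). It is a hedged approximation, not the pipeline's code: the names, the simple nearest-rank 95th-percentile cutoff, and the assumption that `null_scores` are computed elsewhere from non-replicate groups are all illustrative.

```python
from itertools import combinations
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def replicate_scores(wells, exclude_same_well=False):
    """wells: list of (plate, well, compound, profile) tuples.

    Returns {compound: median pairwise replicate correlation}. With
    exclude_same_well=True, pairs in the same well position are skipped
    so that plate-layout effects cannot inflate the score.
    """
    by_compound = {}
    for plate, well, compound, profile in wells:
        by_compound.setdefault(compound, []).append((well, profile))

    scores = {}
    for compound, reps in by_compound.items():
        corrs = [
            pearson(p, q)
            for (w1, p), (w2, q) in combinations(reps, 2)
            if not (exclude_same_well and w1 == w2)
        ]
        if corrs:
            scores[compound] = median(corrs)
    return scores


def percent_replicating(scores, null_scores, percentile=95):
    """Percentage of compounds scoring above a percentile of the null.

    Uses a simple nearest-rank percentile (an assumption of this sketch).
    """
    ranked = sorted(null_scores)
    idx = min(len(ranked) - 1, int(len(ranked) * percentile / 100))
    threshold = ranked[idx]
    hits = sum(score > threshold for score in scores.values())
    return 100.0 * hits / len(scores)


# Toy data: compound X sits in well A01 on two plates and in B05 on one.
wells = [
    ("P1", "A01", "X", [1.0, 2.0, 3.0]),
    ("P1", "B05", "X", [1.0, 2.0, 3.5]),
    ("P2", "A01", "X", [3.0, 2.0, 1.0]),
]
all_pairs = replicate_scores(wells)
other_wells_only = replicate_scores(wells, exclude_same_well=True)

null = [i / 200 for i in range(100)]
pct = percent_replicating({"a": 0.9, "b": 0.1}, null)
print(all_pairs, other_wells_only, pct)
```

For a single plate every replicate of a compound is in a different well, so the `exclude_same_well` switch changes nothing; it only starts to matter when replicates from several plates are pooled.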

> And then finally a Q about overall analysis design: should we be running a version where we attempt as much of an apples to apples comparison as we can? I.e. if one vendor took 15 images per well and another 3, we limit our analysis to 3 across all plates? We are not doing a bakeoff and yet if we publish the results from each vendor some may look artificially 'bad' and people might not get the nuance that it was a technical reason like number of sites that is the difference. I'm just wondering how you are thinking about these sorts of things, and I'm not sure which variables make sense to control for, beyond number of sites.

About the overall analysis: so far I have only shared the per-plate analysis. I'm planning to dig into each scope vendor separately and compare the different batches, and the plates within each batch. I won't compare between vendors; instead I'll compare, for example, different magnifications, binning, number of sites/fields per well, etc. within one vendor's plates, to make it more of an apples-to-apples comparison.

AnneCarpenter commented 2 years ago

Ah - When I made my suggestions, I wasn't thinking about the fact that it seems you generally only have a single plate per condition! So forgive the Qs that made no sense in that context.

"When we want to compare across plates (within one batch of a scope vendor)" --> I'm not sure you definitely even want to do this, do you? We want to compare metrics across vendors/conditions but I didn't think we wanted to check for ability to match signatures across plates.

Also note that profilers are fiercely debating how best to judge profiling experiments with these various control plates. This morning we settled for John's project (which uses Target2, so it's a different setup) to compute:

NasimJ commented 2 years ago

PE Percent Replicating vs. Percent Matching

Grouped by z_plane: 1 PE_percent_replicating_vs_percent_matching

AnneCarpenter commented 2 years ago

Wonderful to have all this data! I find this vis a bit hard to digest. I wonder if something like this will make comparisons easier across (except I drew as bar charts, should've done just the actual dots of data you have for each). IMG_1052

NasimJ commented 2 years ago

Nice drawing! :) What I've tried seems a bit crowded and hard to digest as well (even if I move the legend outside): PE_percent_replicating_vs_percent_matching_perPlate

Perhaps, the means and SD would be a better representation : 2 PE_Mean_percent_replicating_vs_percent_matching

AnneCarpenter commented 2 years ago

That is nicer to digest, though most journals do not want you to represent so few data points with a mean/SD, they want to see individual data points so ppl can see the raw data and be sure you're not hiding anything.

Going back to first principles, a visualization should answer a particular Q. In our case we could go high level:

So we could also go more granular:

bethac07 commented 2 years ago

Could we not do the mean/SD plot and then just add the points onto it? I like having it faceted the way that it currently is, personally.
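
Overlaying the points on a mean/SD plot only needs a per-batch summary that keeps the raw values alongside the mean and SD. The sketch below shows that summary step using percent replicating values copied from the Nikon and Yokogawa_Japan 20X rows of the table earlier in this thread; the plotting call itself is left out, and `summarize` is an illustrative name rather than anything from the notebooks.

```python
from statistics import mean, stdev

# Percent replicating per plate, copied from the table earlier in the thread.
percent_replicating = {
    "Scope1_Nikon_10X": [26.7, 39.8, 27.8, 33.7],
    "Scope1_Nikon_20X": [58.9, 43.3, 46.7],
    "Scope1_Yokogawa_Japan_20X": [53.3],
}


def summarize(values_by_batch):
    """Return mean, sample SD and the raw points for each batch.

    SD is None for single-plate batches, so such a batch would be drawn
    as a point without an error bar.
    """
    summary = {}
    for batch, values in values_by_batch.items():
        summary[batch] = {
            "mean": mean(values),
            "sd": stdev(values) if len(values) > 1 else None,
            "points": list(values),
        }
    return summary


summary = summarize(percent_replicating)
for batch, stats in summary.items():
    print(batch, round(stats["mean"], 1), stats["sd"], stats["points"])
```

With the `points` kept next to each mean, the individual plates can be scattered over the error bars in whatever plotting library is already in use.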

AnneCarpenter commented 2 years ago

It's your call! My rationale is there. It really comes down to: "how long does it take a person to answer the Qs they want to answer". I find it hard to answer the Qs w these plots but it might be because they are all giving super similar (and/or within the range of noise) answers and thus the answers are not very clear no matter how we plot them.

NasimJ commented 2 years ago

This is my attempt for the overall representation of the PE data:

8 batches Variables:

Similarities:

Overall view: Overall_PE_percent_replicating_vs_percent_matching

and we could then separate them (based on different columns or rows): Scope1_PE_PercentReplicating_Matching_separateZ

Scope1_PE_PercentReplicating_Matching_separateZandbinning

I can try other methods, if this is not good.

NasimJ commented 2 years ago

Preliminary analysis for other batches:

For each vendor, as for the PE group, I plot an overall view of the data in one plot. If there is more than one variable (magnification, binning, z_plane, modality) to compare between the batches of that vendor's data, I then plot them faceted, followed by the mean and standard deviation of the percent replicating and percent matching values across that vendor's plates.

Feature selection for the profiles was done as usual (at the batch level).

All analyses are done per plate (4 replicates per plate): I calculated percent replicating and percent matching for these 4 replicates within each plate, and each data point in the overall plot (percent replicating vs. matching) represents one plate in the vendor's dataset. So, for the analysis, each plate is independent of the other plates.

The incomplete plates, one from Yokogawa_US (Scope1_Yokogawa_US_20X_6Ch_BRO0117033) and one from Molecular Devices (Scope1_MolDev_20X_Adaptive) are removed and not included in these plots.

Molecular Devices:

4 batches Variables:

Similarities:

Overall view:

Overall_Scope1_MolDev_PercentReplicating_Matching

Faceted view:

Scope1_MolDev_PercentReplicating_Matching_separate

MolDev_percent_replicating_vs_percent_matching_ImgperWell

Means and SD:

MolDev_Mean_percent_replicating_vs_percent_matching

Faceted Mean and SD:

MolDev_Mean_percent_replicating_vs_percent_matching_Separated


Nikon:

2 batches Variables:

Similarities:

Overall view: Overall_Nikon_percent_replicating_vs_percent_matching

Nikon has only two batches (one parameter to compare: magnification; all the other parameters are the same between these two batches), so there is no faceted plot for individual plates.

Faceted view: Scope1_Nikon_PercentReplicating_Matching_Imgperwell_faceted

Means and SD:

Nikon_Mean_percent_replicating_vs_percent_matching


Yokogawa_Japan:

2 batches Variables:

Similarities:

Overall_Yokogawa_Japan_percent_replicating_vs_percent_matching


Yokogawa_US:

6 batches Variables:

Similarities:

Overall view:

Overall_Yokogawa_US_percent_replicating_vs_percent_matching

Faceted view:

Scope1_Yoko_US_PercentReplicating_Matching_separateZandbinning

Yoko_US_percent_replicating_vs_percent_matching

Means and SD: Data points with no error bar mean there is only one plate for that batch.

Yoko_US_Mean_percent_replicating_vs_percent_matching

bethac07 commented 2 years ago

Thanks, this is great! Obviously we'll need to eventually systematize/theme everything but great to just have all the comparisons so far.

> Nikon has only two batches (one parameter to compare: magnification; all the other parameters are the same between these two batches), so there is no faceted plot for individual plates.

Sites per well is different, no? Also in the Yoko US and Japan? I think we need to be really clear on that, even where we aren't faceting on it, since it's one of the major hypotheses we intend to test as to WHY sometimes these values are different.

NasimJ commented 2 years ago

> Sites per well is different, no?

Correct, but the magnification and the other variable (for Nikon sites per well, and for Yoko Japan sites per well and z_plane) are associated, so it would still be one representative data point on the percent matching vs replicating plot. For the Yoko US, I'll make an additional faceted plot for the sites per well.

bethac07 commented 2 years ago

Yeah, I understand we aren't going to facet on it, but would be nice to just have in the legend