jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium
BSD 3-Clause "New" or "Revised" License

Weird batch effects in source_1 #89

Open niranjchandrasekaran opened 6 months ago

niranjchandrasekaran commented 6 months ago

Alex Lu said

So - I'm doing an analysis of the CP JUMP data, and I noticed that in the CellProfiler feature space, the plates for Batch6_20221102 (Source_1) have a really weird variation structure. I took a look at some images, and it looks like the contrast is behaving really weirdly across images, to the point that some have completely different color histograms and others aren't even recognizable as cells (left is a perturbation, right is the closest control; the coloring scheme is arbitrary because I just threw this together rapidly).

Do you know what's going on with these plates? Should I just toss them all out?

image (1)

image (2)

Shantanu said

I looked up our notes and it appears that there are some weird plates in batch 6, but then that was also true for batch 5. I'm not sure why all of batch 6 is weird and why some of the bad plates were included; to be investigated.

I'd recommend just tossing those plates out for now.

Thank you for reporting this. Is it okay if we post this observation on GitHub so we can point others to it?
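For anyone following the recommendation above, here is a minimal sketch of dropping an entire batch from plate-level metadata before analysis. The column names (`Metadata_Source`, `Metadata_Batch`, `Metadata_Plate`) and the example rows are assumptions made for illustration, not taken from this thread; only the batch name `Batch6_20221102` comes from the report.

```python
# Batches to exclude, as (source, batch) pairs. Only Batch6_20221102 is
# named in this thread; extend the set if other batches turn out to be bad.
EXCLUDED = {("source_1", "Batch6_20221102")}


def drop_batches(rows, excluded=EXCLUDED):
    """Return only the rows whose (source, batch) pair is not excluded.

    `rows` is a list of dicts. The key names are assumed column names.
    """
    return [
        row
        for row in rows
        if (row["Metadata_Source"], row["Metadata_Batch"]) not in excluded
    ]


# Hypothetical rows; the plate identifiers and the second batch name are
# placeholders, not real plate or batch names.
plates = [
    {"Metadata_Source": "source_1", "Metadata_Batch": "Batch6_20221102", "Metadata_Plate": "plate_a"},
    {"Metadata_Source": "source_1", "Metadata_Batch": "some_other_batch", "Metadata_Plate": "plate_b"},
]

kept = drop_batches(plates)
print([row["Metadata_Plate"] for row in kept])
```

Filtering on the (source, batch) pair rather than on the batch name alone avoids accidentally dropping a batch with the same name from a different source.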

Alex said

Please do! For posterity, here's the analysis I caught this with - basically, I was looking at the cosine distance between each compound perturbation and its spatially nearest control in the CellProfiler feature space. The y-axis is the wells in order of their occurrence across plates/batches. You can see most wells have pretty constrained variation in their distances - and then you get to Batch 6 and it's just completely weird.

I can't see anything out of the ordinary with Batch 5, so it seems to be primarily localized to Batch 6, at least from my view?

image (3)
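The check Alex describes (cosine distance between each perturbation well and its spatially nearest control) could be sketched roughly as follows. This is not Alex's actual code: the function names, the single-letter well naming (e.g. `B03`, as on a 384-well plate), and the use of Euclidean distance on the plate grid to define "spatially nearest" are all assumptions.

```python
import numpy as np


def well_to_rc(well):
    """Convert a well name such as 'B03' to zero-based (row, column).

    Assumes a single-letter row, as on 96- or 384-well plates.
    """
    return ord(well[0].upper()) - ord("A"), int(well[1:]) - 1


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def distance_to_nearest_control(wells, features, is_control):
    """For each non-control well on ONE plate, compute the cosine distance
    to the control well that is closest on the plate grid.

    wells:      list of well names, e.g. ["A01", "A02", ...]
    features:   array of shape (n_wells, n_features), one profile per well
    is_control: boolean sequence marking control wells
    """
    coords = np.array([well_to_rc(w) for w in wells], dtype=float)
    features = np.asarray(features, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)
    control_idx = np.flatnonzero(is_control)

    distances = {}
    for i in np.flatnonzero(~is_control):
        # Euclidean distance on the plate grid picks the nearest control.
        grid_dist = np.linalg.norm(coords[control_idx] - coords[i], axis=1)
        nearest = control_idx[np.argmin(grid_dist)]
        distances[wells[i]] = cosine_distance(features[i], features[nearest])
    return distances


# Toy plate with made-up two-dimensional profiles.
wells = ["A01", "A02", "A03", "B01"]
is_control = [True, False, False, True]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
result = distance_to_nearest_control(wells, features, is_control)
```

Running this per plate and plotting the distances in plate/batch order would give a view similar in spirit to the one above, where a batch with unusually wide or shifted distances stands out.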

Shantanu said

Wow that’s pretty obvious!

Thanks a lot for reporting the details, Alex! This will make its way into GitHub soon.