jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium
BSD 3-Clause "New" or "Revised" License

Was any QC performed on raw images? #133

Open shntnu opened 1 month ago

shntnu commented 1 month ago

From Maxime Sanchez

Did you perform any quality controls on raw images before using them? I noticed that on target2 plates, for the same perturbation, some images looked good while others showed almost nothing (like the ones in the attachments). From my observations, this seems to be linked to toxicity and differences between labs.

shntnu commented 1 month ago

We had addressed this in https://www.nature.com/articles/s41467-024-50613-5, but for some reason it is missing from the text; our response to the reviewers, quoted below, clarifies:


Although there is a desire for a more quantitative, automated, and uniform strategy for quality control, such a method has not been adopted for the field of image-based profiling, or for high-content screening over its 25-year history (Shockley et al. 2019). We therefore did not set out to solve this problem. Instead, we relied on two strategies:

The first is now described in Methods: Dataset description:

The QC performed by each data provider is described in the Cell Painting protocol paper (Cimini et al. 2023); plates (or batches) that were obviously poor-quality were not included in the dataset. That said, the process is not detailed: it was qualitative and variable across partners, but follows their best practices for all image-based screens (and many practices that are applied to all high-throughput screens), such as those in Bray and Carpenter (2017) and other articles of the Assay Guidance Manual. Examples include monitoring the number of cells imaged per well or the total intensity of each channel and observing unusual patterns.
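For readers who want a quantitative version of the "number of cells imaged per well" check, here is a minimal sketch. It is not the procedure any JUMP partner used (that was qualitative, as stated above). The function name `flag_outlier_wells`, the input layout, the robust z-score based on the median absolute deviation, and the threshold of 3.0 are all assumptions made for illustration.

```python
from statistics import median


def flag_outlier_wells(cell_counts, threshold=3.0):
    """Return well IDs whose cell count deviates strongly from the plate median.

    cell_counts: dict mapping well ID -> number of cells imaged (hypothetical input).
    threshold: cutoff on the robust z-score (an arbitrary illustrative value).
    """
    values = list(cell_counts.values())
    center = median(values)
    # Median absolute deviation, scaled to be comparable to a standard deviation
    mad = median(abs(v - center) for v in values) * 1.4826
    if mad == 0:
        return []  # no spread on this plate, so nothing can be flagged
    return sorted(
        well
        for well, count in cell_counts.items()
        if abs(count - center) / mad > threshold
    )


# Made-up counts for six wells; A05 mimics a nearly empty (e.g. toxic) well
plate = {"A01": 980, "A02": 1010, "A03": 1005, "A04": 995, "A05": 40, "A06": 1020}
flagged = flag_outlier_wells(plate)
print(flagged)  # ['A05']
```

The same pattern could be applied to total intensity per channel by passing intensities instead of cell counts; whether a flagged well reflects a technical failure or genuine toxicity still needs a human look.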

The second is added to an updated Discussion:

In downstream tasks, well-level representations are not considered in isolation; instead, we aggregate the five replicates of the same perturbation, with each replicate in a different batch, by computing the median profile. This way, an outlier batch is less likely to contaminate the results.
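The median aggregation described above can be sketched as follows. This is an illustration only, not the consortium's pipeline code: the function name `median_profiles`, the tuple layout, and the feature values are hypothetical, and real profiles have thousands of features rather than two.

```python
from collections import defaultdict
from statistics import median


def median_profiles(wells):
    """Aggregate well-level profiles into one median profile per perturbation.

    wells: list of (perturbation, batch, feature_vector) tuples (hypothetical layout).
    """
    grouped = defaultdict(list)
    for perturbation, _batch, features in wells:
        grouped[perturbation].append(features)
    # Take the median of each feature across the replicate wells
    return {
        perturbation: [median(column) for column in zip(*profiles)]
        for perturbation, profiles in grouped.items()
    }


# Five replicates of one perturbation, each in a different batch (made-up values)
wells = [
    ("cmpd_1", "batch_1", [1.0, 2.0]),
    ("cmpd_1", "batch_2", [1.1, 2.1]),
    ("cmpd_1", "batch_3", [0.9, 1.9]),
    ("cmpd_1", "batch_4", [1.0, 2.0]),
    ("cmpd_1", "batch_5", [9.0, -7.0]),  # outlier batch
]
profiles = median_profiles(wells)
print(profiles["cmpd_1"])  # [1.0, 2.0]
```

With a mean instead of a median, the outlier replicate in `batch_5` would pull the first feature to 2.6; the median stays at 1.0, which is the robustness the quoted text relies on.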