jump-cellpainting / datasets

Images and other data from the JUMP Cell Painting Consortium
BSD 3-Clause "New" or "Revised" License

Technical confounders in cpg0016 #90

Closed: zhanggyuuuuu closed this issue 3 months ago

zhanggyuuuuu commented 9 months ago

Hi all, I noticed that in the ORF dataset (source_4 of cpg0016), analyses of the feature sets with over 7000 and over 4000 features reveal batch effects between batches. The article, however, analyzed a set of over 1400 features and reported results without batch effects. In issue #88, you mentioned that the process for obtaining features is being updated. We are wondering whether the process used to obtain these 1400 features is reliable, and whether the 1400-feature data can be used for further analysis. Looking forward to your answer, thank you!

niranjchandrasekaran commented 9 months ago

Hi @zhanggyuuuuu, the 1400 features have strong batch and plate layout effects that mask the signal in the data. Treatments with strong signatures are visible in that feature set, but those with weaker signal are not visible until these effects are removed. We are currently in the process of making the new feature set available. I will share it here once it is ready.

zhanggyuuuuu commented 8 months ago

Hi, thanks for your reply. I still have some questions. What method was used to remove batch effects from the data containing 1400 features? Also, while exploring the data, we found that source 4 is not the only part of cpg0016 with positional effects; some other datasets show them as well. Is this positional effect a limiting factor that needs to be removed, or is it a technical error that leads to some datasets exhibiting positional effects?

niranjchandrasekaran commented 5 months ago

Hi @zhanggyuuuuu, sorry for the delayed response.

> Is this positional effect a limiting factor that needs to be removed, or is it a technical error that leads to some datasets exhibiting positional effects?

We don't observe positional effects in all datasets. But in those datasets where we do observe them, we try to remove them. A version of the corrected profiles is almost ready. Stay tuned!
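For readers wondering what removing a positional effect can look like in practice, here is a minimal sketch of one common approach: centering each feature on the mean of its well position across plates. This is an illustration only and not necessarily the method used for the corrected profiles. The function name `subtract_well_position_mean`, the column names `Metadata_Plate` and `Metadata_Well`, and the toy data are all assumptions made for the example.

```python
import pandas as pd


def subtract_well_position_mean(profiles, feature_cols, well_col="Metadata_Well"):
    """Center each feature on the mean of its well position across plates.

    Hypothetical helper for illustration; the column name is an assumption.
    """
    corrected = profiles.copy()
    # Per-row mean of every feature over all rows sharing the same well position.
    well_means = profiles.groupby(well_col)[feature_cols].transform("mean")
    corrected[feature_cols] = profiles[feature_cols] - well_means
    return corrected


# Toy data: two plates, two well positions, one feature.
toy = pd.DataFrame(
    {
        "Metadata_Plate": ["P1", "P1", "P2", "P2"],
        "Metadata_Well": ["A01", "B02", "A01", "B02"],
        "feat_1": [5.0, 1.0, 7.0, 3.0],
    }
)
corrected = subtract_well_position_mean(toy, ["feat_1"])
print(corrected)
```

Note that this kind of centering only makes sense when a given well position holds different treatments on different plates. If the same treatment always sits in the same well, the subtraction removes biological signal along with the positional effect.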

niranjchandrasekaran commented 3 months ago

Hi @zhanggyuuuuu, here are the plate layout effect and batch effect corrected profiles for the ORF dataset: https://cellpainting-gallery.s3.amazonaws.com/cpg0016-jump-assembled/source_all/workspace/profiles/jump-profiling-recipe_2024_a917fa7/ORF/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet
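For anyone who wants to try the file, here is a minimal sketch of loading it with pandas and separating metadata columns from feature columns. It assumes a parquet engine such as `pyarrow` is installed and that metadata columns carry a `Metadata_` prefix; that prefix is an assumption and should be checked against the actual column names. The helpers `load_profiles` and `split_columns` are hypothetical names introduced for this example.

```python
import pandas as pd

# URL copied from the comment above.
PROFILES_URL = (
    "https://cellpainting-gallery.s3.amazonaws.com/cpg0016-jump-assembled/"
    "source_all/workspace/profiles/jump-profiling-recipe_2024_a917fa7/ORF/"
    "profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony/"
    "profiles_wellpos_cc_var_mad_outlier_featselect_sphering_harmony.parquet"
)


def load_profiles(url=PROFILES_URL):
    """Download the whole parquet file and return it as a DataFrame."""
    return pd.read_parquet(url)


def split_columns(df, metadata_prefix="Metadata_"):
    """Split column names into (metadata, features) by prefix (assumed convention)."""
    meta = [c for c in df.columns if c.startswith(metadata_prefix)]
    feats = [c for c in df.columns if not c.startswith(metadata_prefix)]
    return meta, feats


# Offline demonstration on a tiny stand-in frame; replace with load_profiles().
demo = pd.DataFrame({"Metadata_Plate": ["P1"], "Metadata_Well": ["A01"], "X_1": [0.1]})
meta_cols, feature_cols = split_columns(demo)
print(meta_cols, feature_cols)
```

Calling `load_profiles()` fetches the full file over the network, so it may take a while and use a fair amount of memory.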