theislab / chemCPA

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.
https://arxiv.org/abs/2204.13545
MIT License

entanglement score #139

Closed (bhomass closed 11 months ago)

bhomass commented 1 year ago

I read the paragraph about entanglement evaluation over and over. I can't quite figure out what you are saying.

I understand you created an MLP to predict the drug and cell line from the basal state, and you would expect to get lots of wrong predictions. The statement "An optimally disentangled model achieves scores that match the ratios of the most abundant drug and cell line, respectively" I assume means the basal state leads to predictions which favor the dominant drug and cell line. But what exactly do you mean by "the thresholds for perturbation and cell line disentanglement to < 10% and < 70%, respectively, while values of 3% and 51% are optimal"? Does a perturbation threshold of 10% mean only 10% of the cases are predicted correctly? How is this relevant to the most abundant drug? And isn't 70% accuracy for the cell line kind of high? Does that mean disentanglement for the cell line is very poor?

MxMstrmn commented 1 year ago

Hi Bruce,

"I assume means the basal state leads to predictions which favors the dominant drug and cell line."

Yes, exactly.

From the paper:

"the thresholds for perturbation and cell line disentanglement to < 10% and < 70%, respectively, while values of 3% and 51% are optimal"

This means that the accepted accuracy on the drug perturbations was 10% or lower. For the cell lines, we accepted an accuracy of 70% or lower, while the most abundant cell line made up 51% of the observations. The optimal values of 3% and 51% are the shares of the most abundant drug and cell line: a classifier that finds no information in the basal state can do no better than always predicting the majority class. Whether the chosen thresholds are too high is somewhat a matter of personal judgment. We chose them so that we were able to compare enough models, say 3 out of 10 runs passing those thresholds. If you are more restrictive, you need better initialisations of the adversarial classifiers or longer training. Certainly, this is an area that could be analysed further.
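
To make the numbers concrete, here is a minimal sketch of how the majority-class baseline and the acceptance check relate. It is not code from the chemCPA repository: the function names, the toy labels, and the example accuracies are all hypothetical, and the strict "<" comparison follows the wording quoted from the paper.

```python
from collections import Counter


def majority_ratio(labels):
    """Share of observations in the most abundant class.

    A classifier that learns nothing from the basal state can reach
    this accuracy by always predicting the majority class, so this is
    the "optimal" (fully disentangled) score.
    """
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def passes_threshold(accuracy, threshold):
    """Accept a run if the classifier accuracy stays below the threshold."""
    return accuracy < threshold


# Toy cell line labels (hypothetical): the most abundant line is 51% of cells.
cell_lines = ["line_a"] * 51 + ["line_b"] * 30 + ["line_c"] * 19
baseline = majority_ratio(cell_lines)

# Hypothetical classifier accuracies from two training runs.
for accuracy in (0.65, 0.75):
    excess = accuracy - baseline  # accuracy gained beyond majority guessing
    accepted = passes_threshold(accuracy, threshold=0.70)
    print(f"accuracy={accuracy:.2f} baseline={baseline:.2f} "
          f"excess={excess:.2f} accepted={accepted}")
```

Read this way, 70% accuracy for the cell line is not 70 points of leaked information: only the part above the 51% baseline indicates that the basal state still encodes the cell line.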