siboehm closed this 2 years ago
Update on the three splits I already had, each leaving one drug out: Quisinostat (epigenetic), Flavopiridol (cell cycle regulation), and BMS-754807 (tyrosine kinase signaling).
None of these three drugs exists in LINCS. However, there are drugs in LINCS that are very similar, even though they don't match exactly. I added a notebook in #73 to analyze this more efficiently.
Example for Quisinostat:
Left is the Trapnell drug, right is the LINCS drug. The Tanimoto similarity is 1.0, but it is not the same molecule.
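This is possible because Tanimoto similarity is computed on fingerprint bit sets, not on the molecules themselves: it is |A ∩ B| / |A ∪ B| over the on-bits, so two distinct molecules whose fingerprints collide score exactly 1.0. A minimal sketch with hypothetical bit sets (not the actual fingerprints):

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint on-bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical on-bits for two *different* molecules with colliding fingerprints:
trapnell_fp = {3, 17, 42, 99, 251}
lincs_fp = {3, 17, 42, 99, 251}  # identical fingerprint, different molecule

print(tanimoto(trapnell_fp, lincs_fp))  # 1.0 despite distinct structures
```

So a similarity of 1.0 only means the fingerprints agree; exact matching still has to compare canonical structures (e.g. SMILES or InChI keys).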
Ideally we'd leave out drugs in Trapnell that are very distant from the other Trapnell drugs. That should result in the pretrained score being much better than the non-pretrained score.
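One way to pick such a drug is to minimize its maximum similarity to all other Trapnell drugs. A sketch, again with hypothetical fingerprint bit sets rather than the real data:

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fingerprint on-bit sets."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

def most_distant_drug(fps: dict) -> str:
    """Return the drug whose closest neighbor (by Tanimoto) is farthest away."""
    return min(
        fps,
        key=lambda d: max(tanimoto(fps[d], fps[o]) for o in fps if o != d),
    )

# Hypothetical fingerprints for the three candidate drugs:
fps = {
    "Quisinostat": {1, 2, 3, 4},
    "Flavopiridol": {1, 2, 3, 9},
    "BMS-754807": {7, 8, 20, 21},
}
print(most_distant_drug(fps))  # BMS-754807 (shares no bits with the others)
```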
@siboehm I was thinking about creating a notebook that introduces the corresponding splits to the Trapnell dataset. What do you think? I could then also follow up on the comment I made in #73.
Also, I like the above description! For 2., this seems quite involved. One option would be to leave out all drugs that we use for OOD and then have only two LINCS models in total, not two per OOD drug.
Yes, I agree. Overall there's no need for very many splits, since we can integrate multiple experiments into a single split: instead of three splits with one drug left out each, a single split that leaves out all three drugs wouldn't make a large difference and would save time.
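A combined split then just partitions the data on membership in one held-out drug set. A minimal sketch, assuming each observation carries a "drug" field (hypothetical schema, not the actual Trapnell loader):

```python
# All three drugs are held out at once, replacing three separate splits.
OOD_DRUGS = {"Quisinostat", "Flavopiridol", "BMS-754807"}

def combined_split(observations: list) -> tuple:
    """Partition observations into train (seen drugs) and ood (held-out drugs)."""
    train = [o for o in observations if o["drug"] not in OOD_DRUGS]
    ood = [o for o in observations if o["drug"] in OOD_DRUGS]
    return train, ood

obs = [{"drug": "Quisinostat"}, {"drug": "Dacinostat"}, {"drug": "BMS-754807"}]
train, ood = combined_split(obs)
print(len(train), len(ood))  # 1 2
```

Evaluation per left-out drug is still possible afterwards by grouping the ood partition by drug name.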
We'll have everything we need for this experiment once #81 is merged; Leon will write the YAML.
Closing, not relevant anymore.
Summary
Test how much pretraining on LINCS helps with improving OOD drug prediction on Trapnell.
Why is this interesting?
It would allow accurate predictions of single-cell responses to unseen drugs, without spending more money on new datasets.
Implementation (precise)
Ideal outcome
The pretrained models should perform better than the non-pretrained model, and the pretrained model that has seen the hold-out drugs on LINCS should perform better than the pretrained model that hasn't.