snap-stanford / GEARS

GEARS is a geometric deep learning model that predicts outcomes of novel multi-gene perturbations
MIT License

Predictions: expression or delta expression? #30

Closed ekernf01 closed 11 months ago

ekernf01 commented 11 months ago

Hi Yusuf et al., I have a really simple question. Running a small example for 5 epochs, I notice about 20% of the predictions are negative, even though the training data are all nonnegative. Does GEARS predict expression directly, or additive change in expression over the control? Example below.

from gears import PertData, GEARS

# load the Dixit dataset and build the train/test split
pert_data = PertData('./data', default_pert_graph=False)
pert_data.load(data_name="dixit")
pert_data.prepare_split(split='simulation', seed=5)
pert_data.get_dataloader(batch_size=32, test_batch_size=128)

# set up and train a model
gears_model = GEARS(pert_data, device='cpu')
gears_model.model_initialize(hidden_size=64)
gears_model.train(epochs=5)

# predict
y = gears_model.predict([['USP13', 'USP15'], ['UTP6']])
(list(y.values())[0] > 0).mean()  # 0.82, the fraction of strictly positive values for the first perturbation

yhr91 commented 11 months ago

Thanks. Yes, this is a problem that we had not considered initially and that I only heard about recently. The final output is indeed expression, but it is obtained by adding the predicted delta expression to the control expression. In some datasets this sum can come out negative. The fix is to clip the values at 0 so they never go below it. I can add that in the next pip update.
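
Until that update lands, the clipping can be applied on the user side. The following is a minimal sketch, not the GEARS implementation: it assumes the predictions come back as a dict mapping each perturbation to a NumPy array (as the example above suggests), and the helper name `clip_predictions`, the dict key, and the numbers are made up for illustration.

```python
import numpy as np


def clip_predictions(predictions, min_value=0.0):
    """Return a new dict with every predicted expression value floored at min_value.

    `predictions` is assumed to map a perturbation label to an array of
    predicted expression values (hypothetical stand-in for the model output).
    """
    return {
        key: np.clip(np.asarray(values), min_value, None)
        for key, values in predictions.items()
    }


# Made-up stand-in for the model output: control expression plus a predicted delta.
control = np.array([0.10, 0.00, 1.50, 0.30])
delta = np.array([-0.25, 0.05, -0.20, -0.30])
y = {"USP13_USP15": control + delta}  # the first entry goes negative

y_clipped = clip_predictions(y)

# Fraction of negative predictions before and after clipping.
frac_negative_before = (y["USP13_USP15"] < 0).mean()
frac_negative_after = (y_clipped["USP13_USP15"] < 0).mean()
print(frac_negative_before, frac_negative_after)
```

Clipping only changes the entries that were negative, so any metric computed on the remaining genes is unaffected.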

ekernf01 commented 11 months ago

Thank you for confirming!

I had actually assumed it was a fold change, which in hindsight was pretty boneheaded on my end. But I am getting much better performance now that I am using the predictions correctly. I'm excited to update all my experiments and see where things stand.