snap-stanford / GEARS

GEARS is a geometric deep learning model that predicts outcomes of novel multi-gene perturbations
MIT License
189 stars 38 forks

error about prepare_split #64

Closed oahux28 closed 4 months ago

oahux28 commented 4 months ago

This work is excellent and I am trying to replicate your results. However, I hit an error at the very beginning. It would be very kind of you to help me solve this problem. My code and the error were attached as two screenshots.

yhr91 commented 4 months ago

Not sure why you see that error. I just ran this piece of code and everything seemed to work fine:

import sys
sys.path.append('../')

from gears import PertData

pert_data = PertData('./data') # specific saved folder
pert_data.load(data_name = 'adamson') # specific dataset name
pert_data.prepare_split(split = 'simulation', seed = 1) # get data split with seed
pert_data.get_dataloader(batch_size = 32, test_batch_size = 128) # prepare data loader

oahux28 commented 4 months ago

Thanks for your kindness and patience. I think I found the cause. The code was probably written for macOS or Linux, but I am running it on a Windows laptop. As a result, in pertdata.py line 166, data_path = os.path.join(self.data_path, data_name) produces './data\norman' instead of './data/norman', so self.dataset_name = data_path.split('/')[-1] evaluates to 'data\norman' instead of 'norman'.

As a workaround, I set pert_data.dataset_name = 'norman' before calling pert_data.prepare_split(split = 'simulation', seed = 1) and pert_data.get_dataloader(batch_size = 32, test_batch_size = 128), which fixed the problem. However, I'm not sure whether there are other potential errors.
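
The path behavior described in this comment can be reproduced on any operating system with the standard library's ntpath (Windows rules) and posixpath (Linux/macOS rules) modules. The following is a minimal sketch, not a patch to GEARS: the variable names are illustrative, and using basename is only one possible way to make the dataset-name extraction separator-independent.

```python
import ntpath      # Windows path rules, importable on any OS
import posixpath   # Linux/macOS path rules, importable on any OS

data_name = "norman"

# What os.path.join would return on each platform.
win_path = ntpath.join("./data", data_name)      # joined with a backslash
posix_path = posixpath.join("./data", data_name) # joined with a forward slash

# Splitting on '/' only works when '/' was the separator used by join.
naive_win = win_path.split("/")[-1]
naive_posix = posix_path.split("/")[-1]

# basename uses the platform's own separator rules, so it recovers the
# final path component in both cases.
robust_win = ntpath.basename(win_path)
robust_posix = posixpath.basename(posix_path)

print(repr(naive_win), repr(naive_posix))
print(repr(robust_win), repr(robust_posix))
```

In platform-dependent code, os.path.basename(data_path) or pathlib.Path(data_path).name would play the same role as the basename calls above.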

oahux28 commented 4 months ago

Also, I'm curious about the predictions of GEARS. As you mentioned in issue #47, the output has shape [perturbation_categories, genes]. But in your code, correct me if I'm wrong, you apply the new perturbation to all unperturbed cells (self.ctrl_adata) and then take the average over all perturbed cells. Does that mean you could technically provide predictions at single-cell resolution? Could you share why you gave up the higher resolution and report the average instead?

yhr91 commented 4 months ago

Yes, we haven't tested our code on Windows machines so there may be some unexpected behavior in how paths are defined.

Thanks for your question regarding single-cell prediction. We avoid making predictions at the single-cell level because the mapping between control cells and perturbed cells is random. We don't have ground truth information on how each cell responds to perturbation. Thus, both the metric computation and the model predictions are at the level of population averages.
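
The averaging discussed in this thread can be illustrated with a toy example. This is a hedged sketch in plain Python, not GEARS code: predict_population_mean, toy_model, and TOY_SHIFT are hypothetical names, and the "model" is just a fixed per-gene shift standing in for a trained network. It shows per-cell predictions being made from each control cell and then collapsed to one vector per perturbation, which is the level at which the reply says predictions and metrics are reported.

```python
import random

def predict_population_mean(ctrl_cells, predict_cell):
    """Apply a per-cell predictor to every control cell, then average per gene."""
    preds = [predict_cell(cell) for cell in ctrl_cells]
    n_genes = len(preds[0])
    return [sum(p[g] for p in preds) / len(preds) for g in range(n_genes)]

# Toy control population: 100 cells x 4 genes (made-up values).
random.seed(0)
ctrl_cells = [[random.gauss(5.0, 1.0) for _ in range(4)] for _ in range(100)]

# Hypothetical stand-in for a trained model: a fixed shift per gene.
TOY_SHIFT = [0.0, 2.0, -1.0, 0.0]

def toy_model(cell):
    return [x + s for x, s in zip(cell, TOY_SHIFT)]

# One averaged expression vector for this perturbation.
mean_pred = predict_population_mean(ctrl_cells, toy_model)
print(mean_pred)
```

Because the pairing between any individual control cell and any individual perturbed cell is arbitrary, only the averaged vector has a ground-truth counterpart to compare against.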