snap-stanford / GEARS

GEARS is a geometric deep learning model that predicts outcomes of novel multi-gene perturbations
MIT License

Unpickling Error in gene_set_path #76

Open bioinfonewguy opened 1 month ago

bioinfonewguy commented 1 month ago

Hello,

Thanks for your help on my previous query.

I'm currently working with the replogle_rpe1_essential dataset, and I have added a few unseen genes just to assess how well the model performs with them. My CSV contains all of the genes in the replogle dataset and the new genes I am interested in.

However, I am getting the following error:

`from gears import PertData, GEARS

# Initialize PertData with your custom gene list
pert_data = PertData('./data', gene_set_path='/path/to/gene_list.csv')

# Load the dataset
pert_data.load(data_name='replogle_rpe1_essential')

# Prepare the data split
pert_data.prepare_split(split='simulation', seed=1)

# Create dataloaders
pert_data.get_dataloader(batch_size=32, test_batch_size=128)

# Initialize the GEARS model
gears_model = GEARS(pert_data, device='cpu')  # Use 'cpu' for Mac, or 'cuda' if you have a compatible GPU

# Initialize the model architecture
gears_model.model_initialize(hidden_size=64, uncertainty=True)

Found local copy...
Found local copy...
Traceback (most recent call last):

Cell In[2], line 7
    pert_data.load(data_name='replogle_rpe1_essential')

File /opt/anaconda3/envs/pyg_env/lib/python3.9/site-packages/gears/pertdata.py:183 in load
    self.set_pert_genes()

File /opt/anaconda3/envs/pyg_env/lib/python3.9/site-packages/gears/pertdata.py:109 in set_pert_genes
    essential_genes = pickle.load(f)

UnpicklingError: invalid load key, 'A'.`
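The last line of the traceback is consistent with `pickle.load` being handed a plain-text file. The "load key" is the first byte of the file, so a CSV whose first line begins with the letter `A` (a gene symbol or a header, presumably) would fail this way. Here is a minimal sketch that reproduces the message using only the standard library. The file name and gene symbols are made up for illustration, since the actual CSV contents are not shown in the report.

```python
import os
import pickle
import tempfile

# Hypothetical CSV gene list; the real file's contents are not shown in the report.
csv_text = "A1BG\nA2M\nTP53\n"

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "gene_list.csv")
    with open(csv_path, "w") as f:
        f.write(csv_text)

    # Mirrors what the traceback shows set_pert_genes doing:
    # open the file in binary mode and try to unpickle it.
    try:
        with open(csv_path, "rb") as f:
            pickle.load(f)
        error_message = None
    except pickle.UnpicklingError as exc:
        error_message = str(exc)

print(error_message)
```

If this reading is right, the problem is the file format passed as `gene_set_path`, not the gene names themselves.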

jackbrougher commented 1 month ago

Not 100% positive because of the formatting above, but I believe I'm hitting the same issue.

I tried changing `set_pert_genes` to read in a DataFrame, but no luck. I think the error is here, but I can't pin down how to change it:

def set_pert_genes(self):
        """
        Set the list of genes that can be perturbed and are to be included in 
        perturbation graph
        """

        if self.gene_set_path is not None:
            # If gene set specified for perturbation graph, use that
            path_ = self.gene_set_path
            self.default_pert_graph = False
            with open(path_, 'rb') as f:
                essential_genes = pickle.load(f)
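Since the snippet above opens `gene_set_path` in binary mode and calls `pickle.load` on it, one possible workaround that avoids editing the library is to convert the CSV into a pickle file first and pass that path instead. The sketch below assumes one gene symbol per row in the first column with no header, and it assumes that a pickled Python list of symbols is an acceptable object. Neither assumption is confirmed by the code shown here. `csv_to_pickle` is a hypothetical helper, not part of GEARS.

```python
import csv
import os
import pickle
import tempfile


def csv_to_pickle(csv_path, pkl_path):
    """Read gene symbols from the first CSV column and pickle them as a list.

    Hypothetical helper: assumes one symbol per row and no header line.
    """
    with open(csv_path, newline="") as f:
        genes = [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]
    with open(pkl_path, "wb") as f:
        pickle.dump(genes, f)
    return genes


# Small self-contained demo with made-up gene symbols.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "gene_list.csv")
pkl_path = os.path.join(tmp_dir, "gene_list.pkl")

with open(csv_path, "w", newline="") as f:
    f.write("A1BG\nA2M\nTP53\n")

csv_to_pickle(csv_path, pkl_path)

# Same read pattern as in set_pert_genes: binary mode, then pickle.load.
with open(pkl_path, "rb") as f:
    loaded = pickle.load(f)

print(loaded)
```

The resulting `.pkl` path could then be passed as `gene_set_path` when constructing `PertData`. Whether the rest of `set_pert_genes` expects a list, an array, or some other container would need to be checked against the remainder of the function, which is cut off above.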