JonasDeSchouwer opened 3 months ago
To reproduce this issue:
In the terminal:
conda create -n test_save_env
conda activate test_save_env
conda install python=3.12
pip install ogb==1.3.6
Note that ogb has torch as a dependency, so in my case it installs torch 2.3.1; I observed the same behaviour with torch 2.2.2+cu121.
Then run the following Python code:
from ogb.io.read_graph_raw import read_csv_graph_raw
import pandas as pd
import os.path as osp
import torch
raw_dir = "datasets/data/ogb/ogbn_proteins/raw"
graph = read_csv_graph_raw(raw_dir, add_inverse_edge=True, additional_node_files=['node_species'], additional_edge_files=[])[0]
labels = pd.read_csv(osp.join(raw_dir, 'node-label.csv.gz'), compression='gzip', header=None).values
In my case, this gives the following error (in a notebook):
The Kernel crashed while executing code in the current cell or a previous cell.
Please review the code in the cell(s) to identify a possible cause of the failure.
Click [here](https://aka.ms/vscodeJupyterKernelCrash) for more info.
View Jupyter [log](command:jupyter.viewOutput) for further details.
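One possible workaround sketch (my addition, not verified against this crash): since `read_csv_graph_raw` returns plain numpy arrays, the offending arrays could be written in numpy's own `.npy` format instead of going through `torch.save`. The filename and the small stand-in array below are made up for illustration.

```python
import numpy as np

# Hypothetical workaround sketch: persist the large arrays in numpy's native
# .npy format instead of torch.save. In the real case the array would be
# graph["edge_index"] (shape (2, 79122504), int64) returned by
# read_csv_graph_raw; a small stand-in is used here.
edge_index = np.zeros((2, 1000), dtype=np.int64)
np.save("edge_index.npy", edge_index)

reloaded = np.load("edge_index.npy")
assert reloaded.shape == edge_index.shape
```

If `torch.save` turns out to be the problem, this at least sidesteps it for the raw arrays.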
I try to execute the following line:

This starts off doing what it is supposed to: it downloads `proteins.zip` from the correct url and creates `datasets/data/ogb/ogbn_proteins` with the subdirectories `mapping`, `raw`, `processed`, and `split`.
However, as soon as it gets to the line in `ogb/nodeproppred/dataset.py` (= line 135 in the version I am running), the program crashes without any error messages, and only an empty file is saved to `datasets/data/ogb/ogbn_proteins/processed/data_processed`.
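As a standalone check (my addition, not from the original report): if the crash comes from `torch.save` choking on the large arrays, a tensor of the same size should crash on its own, with no ogb involved. The filename here is made up.

```python
import torch

# Standalone check: allocate an int64 tensor with the same shape as the
# ogbn-proteins edge index (2 x 79122504, roughly 1.2 GB) and save it with
# torch.save alone. If the kernel dies here too, ogb is not the culprit.
edge_index = torch.zeros((2, 79122504), dtype=torch.int64)
torch.save(edge_index, "edge_index_test.pt")
print("torch.save finished without crashing")
```

Running this in a plain Python process rather than a notebook may also surface a signal or traceback that the Jupyter kernel swallows.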
I have been able to reproduce this by just loading `self.graph` and `self.labels` in a notebook by executing the following code:

Then, I can save `labels` and `graph["node_species"]` to a file without problem, but as soon as I try to save anything containing `graph["edge_index"]` or `graph["edge_feat"]` to a file, the kernel crashes. Note that these have large sizes: (2, 79122504) for `graph["edge_index"]` and (79122504, 8) for `graph["edge_feat"]`. All matrices look pretty normal to me, so my guess is that this is a problem with `torch.save` not being able to handle large files (yet the matrices are smaller than the max size reported in this issue). Still, I thought it would be useful to let you know and perhaps help find a workaround.

--- DETAILS ABOUT MY ENVIRONMENT ---
Output from conda: