Closed LuozyCS closed 1 year ago
Hi, have you checked that the environment packages are all consistent with requirement.txt? Also, did you use your own preprocessed ogb data, or the data downloaded by the code? The error suggests that the version of the loaded ogb dataset is inconsistent with the environment where the code is running.
I ran the code in Anaconda and used `conda list` to check that the environment packages are all consistent with requirement.txt. But the terminal shows this warning:
WARNING:root:The OGB package is out of date. Your version is 1.3.1, while the latest version is 1.3.6.
Is this what you mean by 'the version of the loaded ogb datasets is inconsistent with the environment where the code is running'?
I use the dataset downloaded by the codes.
I see. Have you tried removing the preprocessed files under the ogb dataset folder and running the code again? The error could be caused by an inconsistent version of the preprocessed files.
Also, you can check the correctness of the dataset path '../data//ogb/ogbn_proteins/raw/nan.csv.gz'. It seems there is a redundant '/'.
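The cleanup suggested above can be sketched as follows. This is a minimal sketch: the `../data/ogb/ogbn_proteins` layout is assumed from the traceback, and only the cached `processed` artifacts are removed, so OGB can rebuild them with the currently installed package version while keeping the raw download intact.

```python
import os
import shutil

# Assumed dataset root, matching the path in the traceback.
dataset_root = "../data/ogb/ogbn_proteins"

# Remove only the cached preprocessing output; keep the 'raw' folder
# so the dataset does not need to be downloaded again.
processed_dir = os.path.join(dataset_root, "processed")
if os.path.isdir(processed_dir):
    shutil.rmtree(processed_dir)

# Building paths with os.path.join also avoids the redundant '/'
# seen in '../data//ogb/...'.
print(os.path.join("../data", "ogb", "ogbn_proteins", "raw"))
```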
Thank you for your quick response.
Do you mean leaving only the files in the 'raw' folder? I tried removing the files in all folders except 'raw', but the same error still occurs (below). Btw, I'm currently using the fix mentioned at the beginning, changing the 'float' object into 'str' in the NodePropPredDataset function, and I still don't know if that's the right thing to do.
About the redundant '/':
WARNING:root:The OGB package is out of date. Your version is 1.3.1, while the latest version is 1.3.6.
Namespace(K=5, M=50, batch_size=10000, cached=False, cpu=False, data_dir='../data/', dataset='ogbn-proteins', device=1, directed=False, dropout=0.0, epochs=1000, eval_step=9, gat_heads=8, gpr_alpha=0.1, hidden_channels=64, hops=1, jk_type='max', knn_num=5, label_num_per_class=20, lamda=0.1, lp_alpha=0.1, lr=0.01, method='nodeformer', metric='rocauc', model_dir='../model/', num_heads=1, num_layers=3, num_mlp_layers=1, out_heads=1, projection_matrix_type=True, protocol='semi', rand_split=False, rand_split_class=False, rb_order=1, rb_trans='identity', runs=5, save_model=False, seed=42, sub_dataset='', tau=0.25, train_prop=0.5, use_act=True, use_bn=True, use_gumbel=True, use_jk=True, use_residual=True, valid_prop=0.25, weight_decay=0.0)
Downloading http://snap.stanford.edu/ogb/data/nodeproppred/proteins.zip
Downloaded 0.21 GB: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 216/216 [00:24<00:00, 8.66it/s]
Extracting ../data//ogb/proteins.zip
Loading necessary files...
This might take a while.
Traceback (most recent call last):
File "main-batch.py", line 43, in <module>
dataset = load_dataset(args.data_dir, args.dataset, args.sub_dataset)
File "/home/workspace/NF/NodeFormer/dataset.py", line 98, in load_dataset
dataset = load_proteins_dataset(data_dir)
File "/home/workspace/NF/NodeFormer/dataset.py", line 268, in load_proteins_dataset
ogb_dataset = NodePropPredDataset(name='ogbn-proteins', root=f'{data_dir}/ogb')
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 63, in __init__
self.pre_process()
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 135, in pre_process
self.graph = read_csv_graph_raw(raw_dir, add_inverse_edge = add_inverse_edge, additional_node_files = additional_node_files, additional_edge_files = additional_edge_files)[0] # only a single graph
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/io/read_graph_raw.py", line 83, in read_csv_graph_raw
temp = pd.read_csv(osp.join(raw_dir, additional_file + '.csv.gz'), compression='gzip', header = None).values
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
return _read(filepath_or_buffer, kwds)
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 577, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
self._engine = self._make_engine(f, self.engine)
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1661, in _make_engine
self.handles = get_handle(
File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/common.py", line 753, in get_handle
handle = gzip.GzipFile( # type: ignore[assignment]
File "/root/anaconda3/envs/nodeformer/lib/python3.8/gzip.py", line 173, in __init__
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '../data//ogb/ogbn_proteins/raw/nan.csv.gz'
The dataset path is correct for the small graph datasets, and I didn't change the folder's path while running the large graph script, because the whole process is done automatically.
Do you have any idea about the file named 'nan.csv.gz'? I can't find it in the dataset I downloaded.
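One plausible source of the 'nan.csv.gz' name (an assumption inferred from the traceback, where `read_csv_graph_raw` appends '.csv.gz' to each entry of `additional_node_files`): an empty field in the dataset's metadata CSV is parsed by pandas as the float NaN, and that float is then formatted straight into a file name.

```python
import math
import os.path as osp

# An empty CSV cell is parsed by pandas as the float NaN.
additional_file = float("nan")
assert math.isnan(additional_file)

# If such a value slips into the list of additional files, string
# formatting turns it into the literal text 'nan'.
filename = osp.join("raw", f"{additional_file}.csv.gz")
print(filename)  # → raw/nan.csv.gz
```

This would explain why the file cannot be found in the downloaded dataset: it never existed, the name is an artifact of formatting a NaN value.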
Hi, sorry for the late response due to a paper submission deadline. I think there might be some issue with your ogb package version. I just checked the files under my data folder and it does not contain nan.csv.gz:
./ogb/ogbn_proteins/raw
./ogb/ogbn_proteins/processed
Thank you for your suggestion. I will keep investigating what causes the bug and get back with an update once I've resolved the issue in a few days.
Good luck with your new paper!
Thank you and hopefully you can find the bug soon.
I tried to set up the environment again with ogb 1.3.1, and I successfully ran your code on ogbn-proteins and amazon2m. It's weird that last time I set up on a WSL environment and failed with ogb 1.3.1, so this time I tried on a Linux one; I'm not sure whether that's the reason. Anyway, thanks for your suggestions.
Btw, I ran into a few troubles when setting up again. One suggestion: if someone has trouble installing torch geometric/sparse/scatter, you can try installing them in order: torch_scatter==2.0.7, then torch_sparse==0.6.10, then torch_geometric==1.7.2.
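A quick sanity check for the pins above can be sketched in Python. The version numbers come from the comment above; whether they match your torch/CUDA build is an assumption you should verify separately.

```python
import importlib.metadata as md

# Version pins suggested above (assumed to match the environment's
# torch/CUDA build).
expected = {
    "torch-scatter": "2.0.7",
    "torch-sparse": "0.6.10",
    "torch-geometric": "1.7.2",
}

for pkg, want in expected.items():
    try:
        have = md.version(pkg)
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed (expected {want})")
        continue
    status = "OK" if have == want else f"MISMATCH (expected {want})"
    print(f"{pkg}: {have} {status}")
```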
Again, thank you for your help!
Glad to hear that you resolved the issue! Indeed, the installation of the PyG package depends on torch_scatter and torch_sparse, and the versions of these packages should stay strictly consistent, otherwise there can be some weird bugs.
I can run your code correctly on the small datasets by using the scripts in run.sh and get results similar to those in the paper, but when I try to reproduce NodeFormer on the large graph datasets, an error comes out on both amazon2m and the ogb proteins dataset. This error occurs in dataset.py while calling the NodePropPredDataset function: https://github.com/qitianwu/NodeFormer/blob/64d26581f571340ab750ce6e60a8bb524e22e726/dataset.py#L290 https://github.com/qitianwu/NodeFormer/blob/64d26581f571340ab750ce6e60a8bb524e22e726/dataset.py#L306 I tried to fix this error by going into the implementation of NodePropPredDataset and changing the 'float' object into 'str':
It passed, but another error came out:
I don't know what's happening... If you need more information, please let me know.
System Info