pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

tudataset with temporal node label loading error #1481

Open yyou1996 opened 4 years ago

yyou1996 commented 4 years ago

🐛 Bug

When using the TUDataset class to load a dataset with temporal node labels (e.g. tumblr_ct1 from https://chrsmrrs.github.io/datasets/docs/datasets/), it raises the following error:

Traceback (most recent call last):
File "main.py", line 320, in <module>
run_exp_benchmark()
File "main.py", line 281, in run_exp_benchmark
dd_odeg10_ak1=True))
File "main.py", line 165, in run_exp_lib
dataset_name, sparse=True, feat_str=feat_str, root=args.data_root)
File "/home/tlchen/yuning_dir/meta-analysis/pre-training/datasets.py", line 59, in get_dataset
use_node_attr=True, processed_filename="data_%s.pt" % feat_str, aug=aug, aug_ratio=aug_ratio)
File "/home/tlchen/yuning_dir/meta-analysis/pre-training/tu_dataset.py", line 50, in __init__
pre_filter, use_node_attr)
File "/home/tlchen/anaconda3/envs/meta-analysis/lib/python3.6/site-packages/torch_geometric/datasets/tu_dataset.py", line 48, in __init__
pre_filter)
File "/home/tlchen/anaconda3/envs/meta-analysis/lib/python3.6/site-packages/torch_geometric/data/in_memory_dataset.py", line 56, in __init__
pre_filter)
File "/home/tlchen/anaconda3/envs/meta-analysis/lib/python3.6/site-packages/torch_geometric/data/dataset.py", line 83, in __init__ 
self._process()
File "/home/tlchen/anaconda3/envs/meta-analysis/lib/python3.6/site-packages/torch_geometric/data/dataset.py", line 117, in _process
self.process()
File "/home/tlchen/anaconda3/envs/meta-analysis/lib/python3.6/site-packages/torch_geometric/datasets/tu_dataset.py", line 88, in process
self.data, self.slices = read_tu_data(self.raw_dir, self.name)
File "/home/tlchen/anaconda3/envs/meta-analysis/lib/python3.6/site-packages/torch_geometric/read/tu.py", line 29, in read_tu_data
node_labels = read_file(folder, prefix, 'node_labels', torch.long)
File "/home/tlchen/anaconda3/envs/meta-analysis/lib/python3.6/site-packages/torch_geometric/read/tu.py", line 61, in read_file
return read_txt_array(path, sep=',', dtype=dtype)
File "/home/tlchen/anaconda3/envs/meta-analysis/lib/python3.6/site-packages/torch_geometric/read/txt_array.py", line 13, in read_txt_array
return parse_txt_array(src, sep, start, end, dtype, device)
File "/home/tlchen/anaconda3/envs/meta-analysis/lib/python3.6/site-packages/torch_geometric/read/txt_array.py", line 6, in parse_txt_array
src = torch.tensor(src, dtype=dtype).squeeze()
ValueError: expected sequence of length 2 at dim 1 (got 4)

To Reproduce

Steps to reproduce the behavior:

  1. Use the API as follows:
    dataset = TUDatasetExt(path, name, pre_transform=pre_transform, use_node_attr=True, processed_filename="data_%s.pt" % feat_str, aug=aug, aug_ratio=aug_ratio)

Expected behavior

Loading should succeed without error. Instead, the error above is raised; it looks like a file-reading error.

rusty1s commented 4 years ago

I see. Thanks for reporting. It looks like temporal graphs currently cannot be processed by our TUDataset wrapper, since node labels may have an unequal number of features per row, e.g.:

0, 0
0, 0, 20, 1
0, 0
0, 0
0, 0, 27, 1
0, 0, 6, 1
0, 0, 60, 1

@chrsmrrs Do you have any idea on how to represent this in PyTorch Geometric?
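
The ValueError in the traceback is consistent with this: building a tensor from nested lists requires every row to have the same length. A minimal sketch for checking row widths in a node label file before loading it is shown here. The helper name `row_width_counts` and the inline sample are illustrative assumptions, not part of PyTorch Geometric.

```python
from collections import Counter


def row_width_counts(text, sep=","):
    """Count how many non-empty rows have each number of columns.

    Hypothetical helper for illustration; not a PyTorch Geometric function.
    """
    return Counter(
        len(line.split(sep)) for line in text.splitlines() if line.strip()
    )


# Sample rows copied from the excerpt above.
sample = (
    "0, 0\n"
    "0, 0, 20, 1\n"
    "0, 0\n"
    "0, 0\n"
    "0, 0, 27, 1\n"
    "0, 0, 6, 1\n"
    "0, 0, 60, 1\n"
)

counts = row_width_counts(sample)
# More than one distinct width means the rows are ragged and cannot be
# turned into a rectangular tensor directly.
is_ragged = len(counts) > 1
print(dict(counts), is_ragged)
```

If more than one width is reported, the file cannot be parsed into a single rectangular array without extra preprocessing.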

chrsmrrs commented 4 years ago

Unfortunately, the README for the temporal graphs does not document this sufficiently (we will improve this). Here is an explanation:

  - 1st column: time step 0.
  - 2nd column: label at time step 0 (i.e., no node is infected).
  - 3rd and 4th columns: if a node gets infected at time step t, the 3rd column is t and the 4th column is 1 (i.e., infected); otherwise both are blank.

One solution is to replace each blank with some dummy value, e.g. -1, and then preprocess the resulting matrix before feeding it into a GNN layer.
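
The padding idea can be sketched as follows. This is only an illustration under stated assumptions: the helper name `parse_padded_rows` is hypothetical, the fill value -1 is the dummy value suggested above, and the code uses plain Python lists rather than the library's own reader. The padded rows all have the same length, so they could then be passed to `torch.tensor`.

```python
from typing import List


def parse_padded_rows(text: str, sep: str = ",", fill: int = -1) -> List[List[int]]:
    """Parse ragged integer rows and right-pad short rows with `fill`.

    Hypothetical helper for illustration; not a PyTorch Geometric function.
    """
    rows = [
        [int(tok) for tok in line.split(sep)]
        for line in text.splitlines()
        if line.strip()  # skip empty lines
    ]
    # Pad every row to the width of the longest row.
    width = max((len(row) for row in rows), default=0)
    return [row + [fill] * (width - len(row)) for row in rows]


raw = "0, 0\n0, 0, 20, 1\n0, 0\n"
padded = parse_padded_rows(raw)
print(padded)
```

Downstream code still has to decide how to treat the -1 entries (for example by masking them) before the matrix reaches a GNN layer.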