Closed chaojiewang94 closed 2 years ago
Second this, I just noticed the 3 atom types in PROTEINS are only added when setting use_node_attr=True in the constructor (but then we get an additional feature which was not there before). However, this is not consistent with the behavior of other datasets like NCI1, where atom type is always present. This change of behavior can seriously impact the reproducibility of libraries using this dataset. Please fix it asap.
from torch_geometric.datasets import TUDataset
dataset = TUDataset('/tmp/ENZYMES', name='ENZYMES')
print(dataset)
print(dataset.num_features)
returns 3 for me. Can you remove the processed folder and try again?
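The cache reset suggested above can be sketched as follows. This is only a minimal helper, assuming the cached files live in a subfolder named processed inside the dataset directory; the function name clear_processed and the example path in the note below are hypothetical and not part of torch_geometric.

```python
import shutil
from pathlib import Path


def clear_processed(dataset_dir):
    """Delete the `processed` subfolder of `dataset_dir`, if it exists.

    Returns True when a folder was removed and False when there was
    nothing to remove. (Hypothetical helper, not a torch_geometric API.)
    """
    processed = Path(dataset_dir) / "processed"
    if processed.is_dir():
        shutil.rmtree(processed)
        return True
    return False
```

Calling it on a directory such as /tmp/TUDataset/PROTEINS (assumed layout) and then constructing TUDataset again should trigger the "Processing..." step from scratch.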
It's a problem affecting specifically 'PROTEINS' ('ENZYMES' is fine):
In [1]: from torch_geometric.datasets import TUDataset
In [2]: ds = TUDataset(root='/tmp/TUDataset/', name='PROTEINS')
Downloading https://www.chrsmrrs.com/graphkerneldatasets/PROTEINS.zip
Extracting /tmp/TUDataset/PROTEINS/PROTEINS.zip
Processing...
Done!
In [3]: ds.num_node_attributes
Out[3]: 43471
In [4]: ds = TUDataset(root='/tmp/TUDataset/', name='ENZYMES')
Downloading https://www.chrsmrrs.com/graphkerneldatasets/ENZYMES.zip
Extracting /tmp/TUDataset/ENZYMES/ENZYMES.zip
Processing...
Done!
In [5]: ds.num_node_attributes
Out[5]: 18
Ah, I see. Sorry, not sure why I tested on ENZYMES. Your PR indeed fixes this, thanks!
This problem has appeared again in the latest version.
from torch_geometric.datasets import TUDataset
dataset = TUDataset(root="datasets", name="PROTEINS", use_node_attr=False)
print(dataset)
print(dataset.num_node_attributes)
print(dataset.num_node_labels)
print(dataset.num_node_features)
The outputs are:
PROTEINS(1113)
43471
3
0
whereas the correct values should be:
dataset.num_node_attributes=1
dataset.num_node_labels=3
dataset.num_node_features=3
In addition, data.x is wrong for every graph:
In [ ]: dataset[0].x
Out[ ]: tensor([], size=(42, 0))
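The mismatch in the numbers above can be checked mechanically. The sketch below assumes the relationship implied by this thread: node labels are always part of the features, and node attributes are included only when use_node_attr=True. Both function names are hypothetical.

```python
def expected_num_node_features(num_node_labels, num_node_attributes, use_node_attr):
    """Feature width implied by the thread: labels always, attributes on request."""
    if use_node_attr:
        return num_node_labels + num_node_attributes
    return num_node_labels


def sizes_are_consistent(num_node_labels, num_node_attributes,
                         num_node_features, use_node_attr):
    """Return True when the reported feature width matches the expected one."""
    expected = expected_num_node_features(
        num_node_labels, num_node_attributes, use_node_attr
    )
    return num_node_features == expected


# Values reported above with use_node_attr=False (labels=3, attributes=43471, features=0).
print(sizes_are_consistent(3, 43471, 0, False))  # False
# Values the reporter expected (labels=3, attributes=1, features=3).
print(sizes_are_consistent(3, 1, 3, False))  # True
```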
I cannot reproduce this on the latest version. Can you remove the processed_dir and try again?
I'm sorry. This seems to be a local problem in my own environment: I tried the same code in someone else's environment and there was no problem. I haven't found the cause, though; maybe some packages are at the wrong version and conflict with pyg.
🐛 Describe the bug
The main cause is at line 136 of tu_dataset.py. Strangely, for PROTEINS the value of num_edge_attributes is larger than the feature dimension of self.data.x, so the resulting self.data.x has shape [num_nodes, 0].
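The effect described here can be reproduced without any dataset. The following plain-Python sketch assumes that the constructor drops a number of leading attribute columns from x by slicing when use_node_attr=False, and that the attribute columns precede the label columns; the helper name drop_leading_columns and the sample values are hypothetical.

```python
def drop_leading_columns(rows, num_to_drop):
    """Mimic `x[:, num_to_drop:]` on a list-of-lists feature matrix."""
    return [row[num_to_drop:] for row in rows]


# 42 nodes, each with 1 attribute column followed by 3 one-hot label columns
# (assumed layout, matching the expected sizes quoted in this thread).
x = [[0.5, 1, 0, 0] for _ in range(42)]

# Correct count: dropping 1 attribute column leaves the 3 label columns.
intact = drop_leading_columns(x, 1)
print(len(intact), len(intact[0]))  # 42 3

# Reported count: slicing past the matrix width leaves no columns at all.
broken = drop_leading_columns(x, 43471)
print(len(broken), len(broken[0]))  # 42 0
```

Slicing beyond the last column does not raise an error; it silently yields an empty result, which is consistent with the size=(42, 0) tensor shown in the report.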