snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
MIT License

Isn't there any way to use ogb datasets by tensorflow, instead of torch? #251

Closed Matin-Macktoobian closed 2 years ago

Matin-Macktoobian commented 2 years ago

I am working on a large machine learning project that relies on various tensorflow features, so I cannot switch to torch when using an ogb dataset for a new graph-based module. I thought your library-agnostic loader provided a way to use tensorflow, as one reads here:

We also prepare library-agnostic dataset loaders that can be used with any other deep learning libraries such as Tensorflow and MxNet.

However, I still get the following error

Traceback (most recent call last):
  File "C:/Users/Matinking/PycharmProjects/RL/", line 11, in <module>
    from ogb.nodeproppred import NodePropPredDataset
  File "C:\Users\Matinking\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\", line 2, in <module>
    from .dataset import NodePropPredDataset
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\", line 5, in <module>
    from import read_csv_graph_raw, read_csv_heterograph_raw,\
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\io\", line 1, in <module>
    from .save_dataset import DatasetSaver
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\io\", line 1, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'

when I run, for example,

import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.optimizers import Adam

from import DisjointLoader
from spektral.models import GeneralGNN

from ogb.nodeproppred import NodePropPredDataset

dataset = NodePropPredDataset("ogbn-proteins")
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx["train"], split_idx["valid"], split_idx["test"]


batch_size = 16
learning_rate = 0.0001
epochs = 100

loader_tr = DisjointLoader(train_idx, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(test_idx, batch_size=batch_size)

model = GeneralGNN(dataset.labels, activation="softmax")

optimizer = Adam(learning_rate)
loss_fn = CategoricalCrossentropy()

history =, steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)

plt.legend(["Loss", "Categorical Accuracy"])

Could you please guide me on how to use ogb datasets with tensorflow instead of torch?

Thanks, Matin

weihua916 commented 2 years ago

You need to install torch, but otherwise you can use ogb with tensorflow
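As a rough sketch of what "using ogb with tensorflow" can look like: the library-agnostic loaders return each graph as a dictionary of NumPy arrays together with a label array, so the arrays can be handed to any framework's tensor constructor. The helper name `convert_graph` below is hypothetical and not part of ogb, and the toy dictionary only imitates the loader's output so that the snippet runs without ogb or tensorflow installed.

```python
def convert_graph(graph, label, to_tensor):
    """Apply `to_tensor` to every array-valued entry of an ogb-style graph dict.

    Entries that are None (a missing feature matrix) or plain ints
    (such as num_nodes) are passed through unchanged.
    """
    converted = {}
    for key, value in graph.items():
        if value is None or isinstance(value, int):
            converted[key] = value
        else:
            converted[key] = to_tensor(value)
    return converted, to_tensor(label)


# Toy stand-in for one (graph, label) pair; real entries are NumPy arrays.
toy_graph = {
    "edge_index": [[0, 1], [1, 2]],
    "edge_feat": None,
    "node_feat": [[0.5], [1.0], [1.5]],
    "num_nodes": 3,
}
toy_label = [[1], [0], [1]]

# `tuple` plays the role of a tensor constructor in this demo.
tensors, y = convert_graph(toy_graph, toy_label, to_tensor=tuple)
```

With the real packages installed, the same helper would be called on `graph, label = dataset[0]` with `tf.convert_to_tensor` as `to_tensor`.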

Matin-Macktoobian commented 2 years ago

Thanks. I installed torch via pip, but now I get the following file-related error:

Using backend: pytorch
Traceback (most recent call last):
  File "~/PycharmProjects/RL/", line 13, in <module>
    dataset = NodePropPredDataset("ogbn-proteins")
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\", line 63, in __init__
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\", line 70, in pre_process
    loaded_dict = torch.load(pre_processed_file_path)
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\", line 777, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

weihua916 commented 2 years ago

I see. Perhaps you will need to delete the downloaded folder and download/preprocess from scratch.
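One plausible reading of the `EOFError` above, offered as an assumption rather than a diagnosis: pickle raises exactly this message when the file it is asked to read is empty, which can happen if an earlier preprocessing run was interrupted while writing its cache. The file name below is a made-up stand-in, not the real cache file.

```python
import os
import pickle
import tempfile

# Create an empty file, standing in for a cache file whose write was interrupted.
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, "pre_processed_stub")
    open(empty_path, "wb").close()

    try:
        with open(empty_path, "rb") as f:
            pickle.load(f)
        message = None
    except EOFError as err:
        # Unpickling an empty file fails before any object is read.
        message = str(err)

print(message)
```

If this is what happened, removing the cached files forces a fresh download and preprocessing run.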

Matin-Macktoobian commented 2 years ago

Can you please explain which downloaded folder you mean? If you mean the torch folder, I deleted it, but the result is the same error I reported.

weihua916 commented 2 years ago

I meant dataset/ogbn_proteins, the dataset folder downloaded by the ogb package.
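A minimal sketch of that cleanup step, assuming the folder sits at `dataset/ogbn_proteins` relative to the working directory as named above. The function name `reset_dataset_folder` is hypothetical, and the demo runs against a temporary directory so nothing real is deleted.

```python
import shutil
import tempfile
from pathlib import Path


def reset_dataset_folder(folder="dataset/ogbn_proteins"):
    """Delete a cached dataset folder; return True if something was removed."""
    path = Path(folder)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


# Demo on a throwaway directory tree that mimics the layout.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "dataset" / "ogbn_proteins"
    demo.mkdir(parents=True)
    removed_first = reset_dataset_folder(demo)   # folder exists, so it is removed
    removed_again = reset_dataset_folder(demo)   # nothing left to remove
    still_there = demo.exists()
```

After the folder is gone, constructing the dataset again should trigger a new download and preprocessing pass.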

Matin-Macktoobian commented 2 years ago

I just did, but the error persists. I then switched datasets to check whether the problem is specific to ogbn_proteins. This time I used ogbg-molhiv for a graph-level task and got the following error

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\", line 1664, in <module>
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\", line 1658, in main
    globals =['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\_pydev_imps\", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "~/PycharmProjects/RL/", line 27, in <module>
    model = GeneralGNN(dataset.labels, activation="softmax")
  File "~AppData\Local\Programs\Python\Python37\lib\site-packages\spektral\models\", line 158, in __init__
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\spektral\models\", line 216, in __init__
    self.mlp.add(Dense(hidden if i < layers - 1 else output))
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\layers\", line 1166, in __init__
    self.units = int(units) if not isinstance(units, int) else units
TypeError: only size-1 arrays can be converted to Python scalars

raised by the code below.

import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.optimizers import Adam

from import DisjointLoader
from spektral.models import GeneralGNN

from ogb.graphproppred import GraphPropPredDataset

dataset = GraphPropPredDataset(name="ogbg-molhiv")

split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx["train"], split_idx["valid"], split_idx["test"]


batch_size = 16
learning_rate = 0.0001
epochs = 100

loader_tr = DisjointLoader(train_idx, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(test_idx, batch_size=batch_size)

model = GeneralGNN(dataset.labels, activation="softmax")

optimizer = Adam(learning_rate)
loss_fn = CategoricalCrossentropy()

history =, steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)

plt.legend(["Loss", "Categorical Accuracy"])
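A possible cause of the `TypeError` in the traceback, stated as an assumption: the first argument of `GeneralGNN` ends up in `Dense(...)`, which calls `int(units)`, and `dataset.labels` is a whole label array rather than a single number. Below is a sketch of deriving an integer output size from the labels instead. `count_classes` is a hypothetical helper that assumes a single task with no missing labels, and the toy labels stand in for the real array.

```python
def count_classes(labels):
    """Count distinct label values in an (N, 1)-shaped nested sequence."""
    return len({int(row[0]) for row in labels})


# Toy stand-in for a binary, single-task label array.
toy_labels = [[0], [1], [0], [0], [1]]

# A plain int, which is what a Dense layer's `units` argument expects.
n_out = count_classes(toy_labels)
```

The result would then be passed in place of `dataset.labels`, for example `GeneralGNN(n_out, activation="softmax")`.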

weihua916 commented 2 years ago

Interesting. I tried those datasets and they worked fine on my end. I was initially thinking that your dataset file was corrupted, but now I do not have a clue.

>>> from ogb.nodeproppred import NodePropPredDataset
>>> dataset = NodePropPredDataset("ogbn-proteins")
Downloaded 0.21 GB: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 216/216 [00:02<00:00, 104.00it/s]
Extracting dataset/
Loading necessary files...
This might take a while.
Processing graphs...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:08<00:00,  8.18s/it]
>>> dataset
>>> from ogb.graphproppred import GraphPropPredDataset
>>> dataset = GraphPropPredDataset(name="ogbg-molhiv")
Downloaded 0.00 GB: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 87.36it/s]
Extracting dataset/
Loading necessary files...
This might take a while.
Processing graphs...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 41127/41127 [00:00<00:00, 49756.94it/s]
>>> dataset

weihua916 commented 2 years ago

Below is my environment:

rusty1s commented 2 years ago

There is also a recent effort to add OGB datasets to TensorFlow Datasets, see here, but it's far from finished.

weihua916 commented 2 years ago

Closing this for now. Let us know if it still does not work!