snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

Isn't there any way to use ogb datasets by tensorflow, instead of torch? #251

Closed Matin-Macktoobian closed 2 years ago

Matin-Macktoobian commented 2 years ago

I am working on a large machine learning project that relies on various TensorFlow features, so I cannot switch to torch when using an ogb dataset for a new graph-based module. I expected your library-agnostic loaders to offer a way to work with TensorFlow, as one reads here:

We also prepare library-agnostic dataset loaders that can be used with any other deep learning libraries such as Tensorflow and MxNet.

However, I still get the following error

Traceback (most recent call last):
  File "C:/Users/Matinking/PycharmProjects/RL/GNN_spektral_OGB.py", line 11, in <module>
    from ogb.nodeproppred import NodePropPredDataset
  File "C:\Users\Matinking\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\__init__.py", line 2, in <module>
    from .dataset import NodePropPredDataset
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\dataset.py", line 5, in <module>
    from ogb.io.read_graph_raw import read_csv_graph_raw, read_csv_heterograph_raw,\
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\io\__init__.py", line 1, in <module>
    from .save_dataset import DatasetSaver
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\io\save_dataset.py", line 1, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'

when I run, for example,

import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.optimizers import Adam

from spektral.data import DisjointLoader
from spektral.models import GeneralGNN

from ogb.nodeproppred import NodePropPredDataset

dataset = NodePropPredDataset("ogbn-proteins")
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx["train"], split_idx["valid"], split_idx["test"]

np.random.seed(0)

batch_size = 16
learning_rate = 0.0001
epochs = 100

loader_tr = DisjointLoader(train_idx, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(test_idx, batch_size=batch_size)

model = GeneralGNN(dataset.labels, activation="softmax")

optimizer = Adam(learning_rate)
loss_fn = CategoricalCrossentropy()
model.compile(loss=loss_fn,
              optimizer=optimizer,
              metrics=categorical_accuracy)

history = model.fit(loader_tr.load(), steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)

plt.plot(history.history['loss'])
plt.plot(history.history['categorical_accuracy'])
plt.xlabel('epoch')
plt.legend(["Loss", "Categorical Accuracy"])
plt.show()

Could you please guide me on how to use ogb datasets with tensorflow instead of torch?

Thanks, Matin

weihua916 commented 2 years ago

You need to install torch, but otherwise you can use ogb with tensorflow
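
A minimal sketch of what "library-agnostic" means in practice: the loaders return NumPy data, with each graph given as a dict holding `edge_index`, `edge_feat`, `node_feat`, and `num_nodes`. The helper below (`ogb_graph_to_coo`, a hypothetical name that is not part of ogb) rearranges such a dict into row-major sorted COO indices, values, and a dense shape. The toy graph is an assumption standing in for a real `graph, label = dataset[0]`, so the snippet runs with NumPy alone.

```python
import numpy as np


def ogb_graph_to_coo(graph):
    """Rearrange an OGB-style graph dict into COO adjacency pieces.

    `graph` is assumed to follow the library-agnostic layout:
    edge_index of shape (2, num_edges), plus node_feat, edge_feat, num_nodes.
    """
    edge_index = np.asarray(graph["edge_index"], dtype=np.int64)
    num_nodes = int(graph["num_nodes"])

    # One (row, col) pair per edge.
    indices = edge_index.T

    # Sort row-major (by row, then by column); sparse tensor types
    # commonly expect this ordering.
    order = np.lexsort((indices[:, 1], indices[:, 0]))
    indices = indices[order]

    # Keep edge features aligned with the reordered edges.
    edge_feat = graph.get("edge_feat")
    if edge_feat is not None:
        edge_feat = np.asarray(edge_feat)[order]

    return {
        "indices": indices,
        "values": np.ones(len(indices), dtype=np.float32),
        "dense_shape": (num_nodes, num_nodes),
        "node_feat": graph.get("node_feat"),
        "edge_feat": edge_feat,
    }


# Toy stand-in for a graph dict returned by an ogb dataset.
toy_graph = {
    "edge_index": np.array([[2, 0, 1, 1], [1, 1, 0, 2]]),
    "edge_feat": None,
    "node_feat": np.eye(3, dtype=np.float32),
    "num_nodes": 3,
}
coo = ogb_graph_to_coo(toy_graph)
```

With TensorFlow installed, `tf.sparse.SparseTensor(coo["indices"], coo["values"], coo["dense_shape"])` would turn these pieces into a sparse adjacency matrix, and `coo["node_feat"]` can be passed to `tf.convert_to_tensor`.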

Matin-Macktoobian commented 2 years ago

Thanks. I installed torch via pip, but now I get the following file-related error:

Using backend: pytorch
Traceback (most recent call last):
  File "~/PycharmProjects/RL/GNN_spektral_OGB.py", line 13, in <module>
    dataset = NodePropPredDataset("ogbn-proteins")
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\dataset.py", line 63, in __init__
    self.pre_process()
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\ogb\nodeproppred\dataset.py", line 70, in pre_process
    loaded_dict = torch.load(pre_processed_file_path)
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\serialization.py", line 777, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

weihua916 commented 2 years ago

I see. Perhaps, you will need to delete the downloaded folder and download/preprocess from scratch.

Matin-Macktoobian commented 2 years ago

Could you please explain which downloaded folder you mean? If you mean the torch one, I deleted it, but I still get the same error I reported.

weihua916 commented 2 years ago

I meant dataset/ogbn_proteins, the dataset folder downloaded by the ogb package.
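
For context, `EOFError: Ran out of input` is what pickle raises when the file it is asked to read has no content, which is consistent with a preprocessed file that was left empty or truncated. The sketch below first lists zero-byte files under a dataset folder and then removes the folder so that it is downloaded and preprocessed again on the next run. Both function names are hypothetical helpers, and the path `dataset/ogbn_proteins` is assumed to be relative to the directory the script is run from.

```python
import shutil
from pathlib import Path


def find_empty_files(dataset_dir):
    """Return all zero-byte files below dataset_dir, sorted by path."""
    root = Path(dataset_dir)
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.stat().st_size == 0
    )


def reset_dataset_dir(dataset_dir):
    """Delete dataset_dir recursively; return True if something was removed."""
    root = Path(dataset_dir)
    if root.exists():
        shutil.rmtree(root)
        return True
    return False
```

A typical use would be `find_empty_files("dataset/ogbn_proteins")` to check for a damaged file, followed by `reset_dataset_dir("dataset/ogbn_proteins")` and a fresh `NodePropPredDataset("ogbn-proteins")` call.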

Matin-Macktoobian commented 2 years ago

I just did, but the error persists as before. I then changed the dataset to check whether the problem is specific to ogbn_proteins. This time, I used ogbg-molhiv for a graph-level task, and I got the following error:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1664, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.5\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "~/PycharmProjects/RL/GNN_spektral_OGB.py", line 27, in <module>
    model = GeneralGNN(dataset.labels, activation="softmax")
  File "~AppData\Local\Programs\Python\Python37\lib\site-packages\spektral\models\general_gnn.py", line 158, in __init__
    activation,
  File "~\AppData\Local\Programs\Python\Python37\lib\site-packages\spektral\models\general_gnn.py", line 216, in __init__
    self.mlp.add(Dense(hidden if i < layers - 1 else output))
  File "~\AppData\Roaming\Python\Python37\site-packages\tensorflow\python\keras\layers\core.py", line 1166, in __init__
    self.units = int(units) if not isinstance(units, int) else units
TypeError: only size-1 arrays can be converted to Python scalars

raised by the code below.

import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.optimizers import Adam

from spektral.data import DisjointLoader
from spektral.models import GeneralGNN

from ogb.graphproppred import GraphPropPredDataset

dataset = GraphPropPredDataset(name="ogbg-molhiv")

split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx["train"], split_idx["valid"], split_idx["test"]

np.random.seed(0)

batch_size = 16
learning_rate = 0.0001
epochs = 100

loader_tr = DisjointLoader(train_idx, batch_size=batch_size, epochs=epochs)
loader_te = DisjointLoader(test_idx, batch_size=batch_size)

model = GeneralGNN(dataset.labels, activation="softmax")

optimizer = Adam(learning_rate)
loss_fn = CategoricalCrossentropy()
model.compile(loss=loss_fn,
              optimizer=optimizer,
              metrics=categorical_accuracy)

history = model.fit(loader_tr.load(), steps_per_epoch=loader_te.steps_per_epoch, epochs=epochs)

plt.plot(history.history['loss'])
plt.plot(history.history['categorical_accuracy'])
plt.xlabel('epoch')
plt.legend(["Loss", "Categorical Accuracy"])
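
A note on the traceback above: it ends in `int(units)` inside a Keras `Dense` layer, and the frame before it shows the first argument of `GeneralGNN` being used as that unit count. Passing `dataset.labels`, an array with many entries, therefore fails where a single integer is expected. The sketch below derives integer sizes from a label array instead. `label_dimensions` is a hypothetical helper, and the toy labels (one binary column, as ogbg-molhiv is assumed to have) stand in for `dataset.labels`.

```python
import numpy as np


def label_dimensions(labels):
    """Return (num_tasks, num_classes) for a label array.

    num_tasks is the number of label columns; num_classes is the number
    of distinct non-NaN values observed across all columns.
    """
    labels = np.asarray(labels, dtype=np.float64)
    if labels.ndim == 1:
        labels = labels.reshape(-1, 1)
    num_tasks = int(labels.shape[1])

    # Ignore NaN entries, which can mark missing labels.
    observed = labels[~np.isnan(labels)]
    num_classes = int(np.unique(observed).size)
    return num_tasks, num_classes


# Toy stand-in for dataset.labels: four graphs, one binary task.
toy_labels = np.array([[0], [1], [0], [1]])
num_tasks, num_classes = label_dimensions(toy_labels)
```

Either integer, depending on how the output layer and loss are set up, is the kind of value the model constructor expects in place of the label array itself.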

weihua916 commented 2 years ago

Interesting. I tried those datasets and they worked fine on my end. I initially thought that your dataset file was corrupted, but now I do not have a clue.

>>> from ogb.nodeproppred import NodePropPredDataset
>>> dataset = NodePropPredDataset("ogbn-proteins")
Downloading http://snap.stanford.edu/ogb/data/nodeproppred/proteins.zip
Downloaded 0.21 GB: 100%|██████████| 216/216 [00:02<00:00, 104.00it/s]
Extracting dataset/proteins.zip
Loading necessary files...
This might take a while.
Processing graphs...
100%|██████████| 1/1 [00:08<00:00,  8.18s/it]
Saving...
>>> dataset
NodePropPredDataset(1)
>>> from ogb.graphproppred import GraphPropPredDataset
>>>
>>> dataset = GraphPropPredDataset(name="ogbg-molhiv")
Downloading http://snap.stanford.edu/ogb/data/graphproppred/csv_mol_download/hiv.zip
Downloaded 0.00 GB: 100%|██████████| 3/3 [00:00<00:00, 87.36it/s]
Extracting dataset/hiv.zip
Loading necessary files...
This might take a while.
Processing graphs...
100%|██████████| 41127/41127 [00:00<00:00, 49756.94it/s]
Saving...
>>> dataset
GraphPropPredDataset(41127)

weihua916 commented 2 years ago

Below is my environment:

torch==1.9.0
ogb==1.3.1
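
To compare a local setup against the versions listed above, a small sketch like the following can print what is installed. `installed_versions` is a hypothetical helper; it relies on `importlib.metadata`, which is part of the standard library from Python 3.8 onward, so this is an assumption that does not hold for the Python 3.7 install seen in the tracebacks.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_versions(packages=("torch", "ogb")):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions())
```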

rusty1s commented 2 years ago

There is also a recent effort to add OGB datasets to TensorFlow Datasets (see here), but it's far from finished.

weihua916 commented 2 years ago

Closing this for now. Let us know if it still does not work!