Closed sooheon closed 4 years ago
This does not occur with the dgl variant.
Interesting. I tested the following locally, and it worked fine. Could you clarify what you mean by "This does not occur with the dgl variant."?
from ogb.graphproppred import DglGraphPropPredDataset
d_name = 'ogbg-molchembl'
dataset = DglGraphPropPredDataset(name = d_name)
Ah, I mean the Dgl variant works, and it's the pure Python dataset that fails to save the pickle file. I think the further processing into the Dgl data structure reduces the size of the pickle enough.
Edited the root comment to reflect that it's GraphPropPredDataset that fails.
You are right, thanks for noticing this. I have resolved the issue in the master branch by using protocol = 4 in torch.save().
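A minimal sketch of what passing the protocol explicitly looks like. It uses the standard-library pickle module rather than torch so that it runs without extra dependencies; torch.save exposes the analogous keyword pickle_protocol. The helper name save_with_protocol and the toy payload are hypothetical and are not part of ogb or torch.

```python
import io
import pickle


def save_with_protocol(obj, f, protocol=4):
    """Hypothetical helper: pickle obj into the file-like f with an explicit protocol."""
    pickle.dump(obj, f, protocol=protocol)


# Toy payload standing in for the pre-processed dataset (illustrative only).
payload = {"edge_index": list(range(10))}

buf = io.BytesIO()
save_with_protocol(payload, buf)
data = buf.getvalue()

# Pickles written with protocol 2 or later begin with the PROTO opcode (0x80)
# followed by one byte holding the protocol number.
written_protocol = data[1]

restored = pickle.loads(data)
```

Checking the second byte of the output is a quick way to confirm which protocol a saved file actually used.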
How to repro:
from ogb.graphproppred import GraphPropPredDataset
ds = GraphPropPredDataset('ogbg-molchembl', root='/tmp/ogb_datasets')
This fails at the torch.save step of the pre_process method, because pickle "cannot serialize a string larger than 4 gb".
What I've done:
I've tried setting torch.serialization.DEFAULT_PROTOCOL = 4 (which according to this adds support for large objects) before calling the above, but this did not help. I think the protocol should instead be passed as an argument to torch.save.
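One plausible reason the reassignment had no effect is that Python binds keyword defaults when a function is defined, not when it is called. The sketch below assumes the protocol is declared as a keyword default taken from a module-level constant. The save function here is a hypothetical stand-in, not the real torch.save.

```python
# Stand-in for a module-level default such as a serialization protocol constant.
DEFAULT_PROTOCOL = 2


def save(obj, pickle_protocol=DEFAULT_PROTOCOL):
    """Hypothetical stand-in: returns the protocol it would use."""
    return pickle_protocol


# Reassigning the constant after the function is defined does not
# change the default that was already bound to the parameter.
DEFAULT_PROTOCOL = 4

used_default = save("x")                       # still the value bound at definition
used_explicit = save("x", pickle_protocol=4)   # explicit argument takes effect
```

Under that assumption, only the explicit argument changes the protocol, which is consistent with the suggestion to pass it to torch.save directly.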