I believe it is a matter of memory capacity. We have not tested exactly how much CPU memory you would need, though. The saved data itself should be smaller than 100GB, but it may require more CPU memory to process.
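If you want to gauge your headroom before kicking off the preprocessing, here is a quick check of total and available CPU memory (a minimal sketch; `psutil` is a third-party package I am assuming here, not an OGB dependency):

```python
import psutil  # third-party package, assumed here; install with `pip install psutil`

# Report total and currently available CPU memory before starting preprocessing.
mem = psutil.virtual_memory()
print(f"Total:     {mem.total / 2**30:.1f} GiB")
print(f"Available: {mem.available / 2**30:.1f} GiB")
```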
Thanks a lot! I thought it might be a memory issue; your answer has convinced me.
But is there any other way to process ogbn-papers100M by myself? I'm using ogbn-products now, but I want to use a larger dataset for my experiments.
Increasing the CPU memory would be the best way; there is no easy workaround.
I'll find another way, thank you!
I think it might be caused by multiprocessing: when `copy.copy` is used rather than `copy.deepcopy()`, there can be a memory leak.
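To illustrate the shallow-versus-deep distinction being pointed at (a minimal standalone sketch, not the actual code path inside OGB):

```python
import copy

# A shallow copy shares nested objects with the original;
# a deep copy duplicates them recursively.
original = {"edges": [[0, 1], [1, 2]]}
shallow = copy.copy(original)
deep = copy.deepcopy(original)

shallow["edges"].append([2, 3])            # also mutates original["edges"]
assert shallow["edges"] is original["edges"]
assert deep["edges"] is not original["edges"]
print(len(original["edges"]))              # 3 -- the shallow copy aliases the data
print(len(deep["edges"]))                  # 2 -- the deep copy is independent
```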
@skdbsxir I am also facing the same issue. Following the link below, I tried processing it on a machine with larger DRAM and saving the result to disk.
https://discuss.dgl.ai/t/paper100m-download-failed/3287
But I'm facing an OOM issue even after that.
Was curious if you figured out some other way? Also, is it possible to trim down the dataset?
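One thing that might reduce RAM pressure once the processed files are on disk: memory-mapping the large arrays instead of loading them outright. A hedged sketch, assuming the big arrays end up as plain `.npy` files (the file name below is hypothetical; the actual layout depends on the OGB version):

```python
import numpy as np

# Hypothetical path -- adjust to wherever the processed array actually lives.
feat = np.load("dataset/ogbn_papers100M/processed/node_feat.npy", mmap_mode="r")

# With mmap_mode="r", slices are read from disk lazily,
# so only the rows you actually touch are paged into RAM.
batch = feat[:1024]
print(batch.shape, batch.dtype)
```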
@UTKRISHTPATESARIA Sorry for my late response.
I tried various approaches (including the DGL link you gave), but I couldn't process the dataset.
So I just decided not to use ogbn-papers100M and to use ogbn-products only :cry:
Thank you.
Hello.
I just finished downloading the ogbn-papers100M dataset with `dataset = PygNodePropPredDataset(name='ogbn-papers100M')`, but there were some problems while processing the files. I typed `dmesg | grep -E -i -B100 'killed process'` and found it was OOM. Looking through other issues, I found the answer in https://github.com/snap-stanford/ogb/issues/229. After deleting the `processed` folder in the dataset directory, I tried again with `dataset = NodePropPredDataset(name='ogbn-papers100M')`, but I got the same problem (also OOM).
Below is my CPU info.
Below is my RAM info.
And below is my Python package info.
I also checked https://github.com/snap-stanford/ogb/issues/46, but I have more than 100GB of CPU memory, as shown above. Could this still be a memory capacity issue?
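For reference, the two loading paths described above, consolidated into one sketch (both triggered preprocessing and were killed by the OOM killer on my machine):

```python
from ogb.nodeproppred import NodePropPredDataset, PygNodePropPredDataset

# PyG-based loader: preprocessing was killed (OOM).
dataset = PygNodePropPredDataset(name="ogbn-papers100M")

# Library-agnostic loader, tried after deleting the `processed` folder:
# same result, also OOM during preprocessing.
dataset = NodePropPredDataset(name="ogbn-papers100M")
```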