initzhang / DUCATI_SIGMOD

Accepted paper of SIGMOD 2023, DUCATI: A Dual-Cache Training System for Graph Neural Networks on Giant Graphs with the GPU

When I run preprocess/PA.py, I get "ValueError: NULL pointer access" #6

Closed lh123cha closed 9 months ago

lh123cha commented 9 months ago

I have already installed dc_dgl successfully, but I encounter this error:

[Screenshot: nullptraccess]

lh123cha commented 9 months ago

Also, how can I get a coo.txt file like uk_coo.txt from gnnlab? I can only get samgraph/uk-2006-05/coo.bin and samgraph/twitter/coo.bin from gnnlab.

[Screenshot 2023-12-11 204532]

How can I get the coo.txt file used in preprocess/UU_UK_TW.py?

initzhang commented 9 months ago

Hi @lh123cha , thanks for your interest in our work!

I have already installed dc_dgl successfully, but I encounter this error:

It seems that the problem is in OGB loading, could you try the following code and check the results?

from ogb.nodeproppred import DglNodePropPredDataset
dataset = DglNodePropPredDataset(name='ogbn-papers100M')

Also, how can I get a coo.txt file like uk_coo.txt from gnnlab? I can only get samgraph/uk-2006-05/coo.bin and samgraph/twitter/coo.bin from gnnlab.

You can modify the webgraph utility in gnnlab to convert the bin file to a coo text file. Specifically, you can change the line of code so that it prints to stdout or saves to a file instead.
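If patching the utility is inconvenient, a rough alternative is to convert an existing coo.bin in Python. The sketch below is not taken from gnnlab or DUCATI. It assumes the file is a flat sequence of (src, dst) pairs packed as two little-endian 32-bit unsigned integers, and that the text file should hold one space-separated pair per line. Both the binary layout and the separator are assumptions, so please verify them against the tool that wrote the file and against what preprocess/UU_UK_TW.py reads. The function name `coo_bin_to_txt` and its parameters are hypothetical.

```python
import struct


def coo_bin_to_txt(bin_path, txt_path, fmt="<II", sep=" "):
    """Write one 'src<sep>dst' line per edge and return the edge count.

    ASSUMPTION: the binary file is a flat sequence of (src, dst) records,
    each packed according to `fmt` (default: two little-endian uint32).
    ASSUMPTION: the consumer expects `sep`-separated pairs, one per line.
    """
    record_size = struct.calcsize(fmt)
    records_per_chunk = 1 << 20  # read in chunks to keep memory use bounded
    n_edges = 0
    with open(bin_path, "rb") as src, open(txt_path, "w") as dst:
        while True:
            chunk = src.read(record_size * records_per_chunk)
            if not chunk:
                break
            if len(chunk) % record_size:
                # Trailing bytes suggest the assumed layout is wrong.
                raise ValueError("file size is not a multiple of one edge record")
            for u, v in struct.iter_unpack(fmt, chunk):
                dst.write(f"{u}{sep}{v}\n")
                n_edges += 1
    return n_edges
```

It could be called as `coo_bin_to_txt("coo.bin", "uk_coo.txt")`. If the edge count it returns does not match the known number of edges of the dataset, the assumed record format is probably wrong.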

lh123cha commented 9 months ago

Thank you for the answer! But when I run run_allocate.py on the uk dataset:

CUDA_VISIBLE_DEVICES=0 python run_allocate.py --dataset uk --fanouts 15,15,15 --fake-dim 100 --total-budget 1

the following error occurs:

[Screenshot 2023-12-13 121318]

It seems that np.polyfit() fails on the uk dataset, but the same script runs successfully on the twitter dataset. Is there something wrong with the uk dataset?

initzhang commented 9 months ago

I haven't encountered this problem before, but according to the error message, it seems that the program fails to fit the curve due to large variance in the system running time. Maybe you can try the following:

(1) Avoid sharing the GPU/machine with other users, because contention on the GPU/PCIe leads to unstable system running time, which could cause np.polyfit() to fail.
(2) Set pre_batches and pre_epochs to larger values (such as 1000 and 10); this can mitigate the instability of the system running time.

lh123cha commented 9 months ago

Thanks for the answer! I have solved the problem. The cause was that total_budget was smaller than nfeat_budget, so nfeat_stats had length one, and a curve cannot be fitted with only one point. So I set --total-budget to a larger number such as 10.

initzhang commented 9 months ago

Indeed, this is a corner case not covered by the current Allocator lol. I will consider adding a sanity check for this part, maybe later. Thank you for reporting the case!
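For illustration, such a sanity check could look roughly like the sketch below. It is not the code in the repository. The function name `check_fit_inputs` is hypothetical, and the polynomial degree is left as a parameter because the degree used by the Allocator is not stated in this thread. The idea is only that a polynomial of degree d has d + 1 coefficients, so fewer than d + 1 sample points cannot determine it.

```python
def check_fit_inputs(xs, ys, degree=1):
    """Raise a readable error before curve fitting if there are too few points.

    Hypothetical helper: intended to be called right before something like
    np.polyfit(xs, ys, degree). The default degree of 1 is an assumption.
    """
    if len(xs) != len(ys):
        raise ValueError(
            f"xs and ys must have the same length, got {len(xs)} and {len(ys)}"
        )
    needed = degree + 1  # a degree-d polynomial has d + 1 coefficients
    if len(xs) < needed:
        raise ValueError(
            f"need at least {needed} sample points to fit a degree-{degree} "
            f"polynomial, got {len(xs)}; try a larger --total-budget"
        )
```

With a check like this, the run with `--total-budget 1` would stop with a message pointing at the budget instead of failing inside the fitting routine.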

initzhang commented 9 months ago

Hi, I have updated the script to fix the problem and I will close this issue. Please feel free to reopen it or open a new issue if you have any further problems. Thanks.