initzhang / DUCATI_SIGMOD

Accepted paper of SIGMOD 2023, DUCATI: A Dual-Cache Training System for Graph Neural Networks on Giant Graphs with the GPU

How to resolve 'TypeError: 'NoneType' object is not subscriptable'? #3

Closed by BearBiscuit05 11 months ago

BearBiscuit05 commented 12 months ago

Hello, following the instructions in the README, I successfully configured the environment. However, when I tried to run PA with the commands below, the second script failed with a TypeError. Is there a problem in the code, or is there an issue with the parameters I entered?

$ CUDA_VISIBLE_DEVICES=0 python run_allocate.py --dataset ogbn-papers100M --fanouts 10,25 --fake-dim 128
2023-09-23 14:24:59,560 Namespace(adj_budget=0, adj_slope=1, batches=1000, bs=8000, dataset='ogbn-papers100M', fake_dim=128, fanouts='10,25', nfeat_budget=0, nfeat_slope=1, pre_batches=100, pre_epochs=2, runs=4, total_budget=1)
2023-09-23 14:25:00,598 loading raw dataset of ogbn-papers100M
2023-09-23 14:25:38,354 finish loading raw dataset, time elapsed: 37.76s
2023-09-23 14:26:09,762 finish preprocessing, time elapsed: 31.41s
2023-09-23 14:28:15,314 finish generating random features with dim=128, time elapsed: 124.61s
2023-09-23 14:28:16,243 Graph(num_nodes=111059956, num_edges=1615685872,
      ndata_schemes={}
      edata_schemes={})
2023-09-23 14:28:16,454 get 1000 seeds, 0.06GB on cuda:0
2023-09-23 14:28:16,455 start profiling and calculating slope
2023-09-23 14:28:42,581 finish calculating slope: adj(2.55) nfeat(13.45), time elapsed: 26.13s
2023-09-23 14:28:42,581 total cache budget: 1GB
2023-09-23 14:28:42,581 total adj size: 12.865GB, total nfeat size: 53.785GB
2023-09-23 14:28:44,675 finish constructing density and size array
2023-09-23 14:28:57,285 find the separate point 4565543
2023-09-23 14:28:57,325 nfeat entries: 1770741, adj entries: 2794802
2023-09-23 14:28:57,325 nfeat size: 0.858 GB, adj size: 0.142 GB
2023-09-23 14:28:57,684 dual cache allocation done, time_elapsed: 15.10s
2023-09-23 14:28:58,128 current allocation plan: 0.142GB adj cache & 0.858GB nfeat cache

Then

$ CUDA_VISIBLE_DEVICES=0 python run_ducati.py

2023-09-23 14:42:06,607 Namespace(adj_budget=0, batches=1024, bs=8000, dataset='ogbn-papers100M', dropout=0.5, fake_dim=128, fanouts='10,25', lr=0.003, nfeat_budget=0, num_hidden=256, pre_batches=100, pre_epochs=2, runs=10)
2023-09-23 14:42:07,654 loading raw dataset of ogbn-papers100M
2023-09-23 14:42:45,337 finish loading raw dataset, time elapsed: 37.68s
2023-09-23 14:43:16,913 finish preprocessing, time elapsed: 31.58s
2023-09-23 14:45:22,367 finish generating random features with dim=128, time elapsed: 124.49s
2023-09-23 14:45:23,257 Graph(num_nodes=111059956, num_edges=1615685872,
      ndata_schemes={}
      edata_schemes={})
2023-09-23 14:45:23,485 get 1024 seeds, 0.06GB on cuda:0
gpu_flag None
gpu_map None
all_cache [None, None]
2023-09-23 14:45:23,882 buffer size: 0.185 GB
Traceback (most recent call last):
  File "run_ducati.py", line 109, in <module>
    entry(args, graph, all_data, seeds_list, counts)
  File "run_ducati.py", line 63, in entry
    run_one_list(seeds_list)
  File "run_ducati.py", line 49, in run_one_list
    cur_nfeat = nfeat_loader.load(input_nodes, nfeat_buf) # fetch nfeat
  File "/home/bear/workspace/DUCATI_SIGMOD/NfeatLoader.py", line 9, in load
    gpu_mask = self.gpu_flag[idx]
TypeError: 'NoneType' object is not subscriptable

initzhang commented 12 months ago

Hi, thank you for your interest in our work!

You need to pass the cache allocation plan reported by run_allocate.py, along with the other parameters, to the run_ducati.py script:

CUDA_VISIBLE_DEVICES=0 python run_ducati.py --dataset ogbn-papers100M --fanouts 10,25 --fake-dim 128 --adj-budget 0.142 --nfeat-budget 0.858
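
The two budgets in that command come from the last line of the run_allocate.py log ("current allocation plan: 0.142GB adj cache & 0.858GB nfeat cache"). As a convenience, the values can be pulled out of the log and turned into the run_ducati.py command automatically. The following is a minimal sketch, not part of the DUCATI repository: the helper names `parse_plan` and `build_ducati_command` are hypothetical, and the regular expression assumes the log line keeps exactly the wording seen in this thread.

```python
import re
import shlex

# Final log line printed by run_allocate.py in the run shown in this thread.
PLAN_LINE = ("2023-09-23 14:28:58,128 current allocation plan: "
             "0.142GB adj cache & 0.858GB nfeat cache")

# Assumption: the wording of the log line matches the one above.
PLAN_RE = re.compile(
    r"current allocation plan: ([\d.]+)GB adj cache & ([\d.]+)GB nfeat cache"
)


def parse_plan(line):
    """Return (adj_budget, nfeat_budget) as strings, or None if no match."""
    match = PLAN_RE.search(line)
    return (match.group(1), match.group(2)) if match else None


def build_ducati_command(plan, dataset, fanouts, fake_dim):
    """Build the run_ducati.py command using the flags quoted in this thread."""
    adj_budget, nfeat_budget = plan
    args = [
        "python", "run_ducati.py",
        "--dataset", dataset,
        "--fanouts", fanouts,
        "--fake-dim", str(fake_dim),
        "--adj-budget", adj_budget,
        "--nfeat-budget", nfeat_budget,
    ]
    # shlex.quote leaves these simple tokens unchanged; it only guards
    # against values that would need shell quoting.
    return " ".join(shlex.quote(a) for a in args)


plan = parse_plan(PLAN_LINE)
command = build_ducati_command(plan, "ogbn-papers100M", "10,25", 128)
print(command)
```

Prefix the printed command with `CUDA_VISIBLE_DEVICES=0` to pin it to the first GPU, as in the original commands. The dataset, fanouts, and fake dimension should match the values given to run_allocate.py, since the plan was computed for that configuration.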

BearBiscuit05 commented 11 months ago

Hi, with your help I ran the code successfully, thank you very much. After studying the source code, I would like to ask: did you obtain the results by replacing the original neighbor sampling function in DGL with the CSRRowWiseSamplingUniformWithCache function? Are there any other optimizations?

initzhang commented 11 months ago

Hi, there are two major differences between DUCATI and previous cache works: (1) we propose the adj cache design, which is what enables sampling with a cache as you mentioned; (2) we develop a dual-cache algorithm to solve the cache contention problem.

The first part is implemented in both DUCATI and the customized DGL; the second part is implemented only in DUCATI’s code. If you simply substitute the neighbor sampling function, the code will not work. You can find the details of the contributions in our paper.
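
For readers trying to relate this to the run_allocate.py log above: the log builds a "density and size array", finds a single "separate point" (4565543), and reports entry counts that sum to it (1770741 nfeat + 2794802 adj). That is consistent with ranking adj and nfeat entries together and cutting the ranking where the budget runs out. The sketch below illustrates that general idea only. It is a simplified greedy allocation under stated assumptions, not DUCATI's actual algorithm: the function name `allocate`, the `(benefit, size)` entry format, and the toy numbers are all hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def allocate(adj_entries, nfeat_entries, budget):
    """Split one cache budget between two kinds of entries.

    Each entry is a (benefit, size) pair with size > 0. Entries of both
    kinds are ranked together by density = benefit / size, and the
    ranking is cut at the last prefix whose total size fits the budget.
    """
    tagged = [("adj", b, s) for b, s in adj_entries]
    tagged += [("nfeat", b, s) for b, s in nfeat_entries]
    # Highest density first.
    tagged.sort(key=lambda e: e[1] / e[2], reverse=True)

    # Running total of sizes along the ranking (strictly increasing).
    cumulative = list(accumulate(size for _, _, size in tagged))
    # Number of leading entries whose cumulative size is <= budget.
    split = bisect_right(cumulative, budget)

    chosen = tagged[:split]
    return {
        "split": split,
        "adj_entries": sum(1 for kind, _, _ in chosen if kind == "adj"),
        "nfeat_entries": sum(1 for kind, _, _ in chosen if kind == "nfeat"),
        "adj_size": sum(s for kind, _, s in chosen if kind == "adj"),
        "nfeat_size": sum(s for kind, _, s in chosen if kind == "nfeat"),
    }


# Toy data: densities are 3 and 1 for adj, 2 and 0.5 for nfeat.
result = allocate(
    adj_entries=[(6, 2), (3, 3)],
    nfeat_entries=[(8, 4), (2, 4)],
    budget=7,
)
print(result)
```

With a budget of 7, the two densest entries (one adj of size 2, one nfeat of size 4) fit and the third does not, so the split point is 2. How the benefit of an entry is measured, and how the profiled slopes in the log enter that measure, is described in the paper rather than here.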

BearBiscuit05 commented 11 months ago

Thank you very much for your response, I understand now.