jeongminpark417 / GIDS


[perf] result discussion on gen5 TLC and any performance tune suggestion? #26

Open gaowayne opened 3 weeks ago

gaowayne commented 3 weeks ago

@jeongminpark417 could you please shed some light on whether there is any way to tune this?

I ran the block benchmark with 100% reads. With a 4K page size I get 12GB/s of bandwidth; with an 8K page size BaM reaches 13.8GB/s (see the output below).

When I run the full GIDS benchmark with a 4K page size, the effective SSD bandwidth is 11.8GB/s.

I expect an 8K page size would get more bandwidth out of the SSD, but GIDS crashed when I tried it. Do you think an 8K cache line (CL) size is a good idea for the end-to-end result?

@jeongminpark417 in the result below there are two cache-ratio dumps and two SSD-access dumps. How should I read them, and which phase does each one belong to?

I calculated the SSD read bandwidth from the last dump. Does this make sense?

print ctrl reset 0: ------------------------------------

SSDAccesses: 13800877

Kernel Time: 4477.36 Total Access: 14616789
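
For reference, the 11.8GB/s figure can be reproduced from these counters with a short calculation. This is only a sketch under assumptions that are not confirmed in this thread: that Kernel Time is reported in milliseconds, that each SSD access transfers exactly one page of CLSize bytes, and that the "GB/s" printed by the benchmarks means 2^30 bytes per second. The function names are made up for illustration. The block-bench numbers from the log are included as a sanity check of the unit convention, assuming its Elapsed Time is in microseconds.

```python
# Sketch only. Assumptions (not confirmed by the GIDS/BaM authors here):
#   - "Kernel Time" is in milliseconds
#   - one SSD access == one page of CLSize bytes
#   - block-bench "Elapsed Time" is in microseconds
#   - the reported "GB/s" is bytes / 2**30 per second (GiB/s)

GIB = 2 ** 30


def bandwidth_gib_s(total_bytes, seconds):
    """Bytes moved divided by elapsed time, expressed in GiB/s."""
    return total_bytes / seconds / GIB


def gids_ssd_bandwidth(ssd_accesses, page_size_bytes, kernel_time_ms):
    """Effective SSD read bandwidth from the GIDS counters."""
    total_bytes = ssd_accesses * page_size_bytes
    return bandwidth_gib_s(total_bytes, kernel_time_ms / 1000.0)


# Last dump from the GIDS run (4K pages).
gids_bw = gids_ssd_bandwidth(13800877, 4096, 4477.36)

# nvm-block-bench run (8K pages): data size in bytes, elapsed time in us.
bench_bw = bandwidth_gib_s(21474836480, 1.45283e6 / 1e6)

print(f"GIDS effective SSD bandwidth: {gids_bw:.2f} GiB/s")
print(f"block-bench bandwidth:        {bench_bw:.2f} GiB/s")
```

Under these assumptions the block-bench value comes out at the 13.77 the tool itself prints, which suggests the unit convention is right, and the GIDS value lands close to the 11.8GB/s quoted above.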

Why is #READ IOs always 0?

root@salab-hpedl380g11-01:~/wayne/gids/debug/GIDS/bam/build# ./bin/nvm-block-bench --threads=262144 --blk_size=64 --reqs=10 --pages=262144 --queue_depth=1024  --page_size=8192 --num_blks=2097152 --gpu=0 --n_ctrls=1 --num_queues=128 --random=true --access_type=0
SQs: 255        CQs: 255        n_qps: 128
n_ranges_bits: 6
n_ranges_mask: 63
pages_dma: 0x7efeba010000       220020410000
HEREN
Finish Making Page Cache
finished creating cache
0000:8A:00.0
atlaunch kernel
Elapsed Time: 1.45283e+06       Number of Ops: 2621440  Data Size (bytes): 21474836480
Ops/sec: 1.80437e+06    Effective Bandwidth(GB/S): 13.7663
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# ./test1.sh
GIDS DataLoader Setting
GIDS:  True
CPU Feature Buffer:  True
Window Buffering:  True
Storage Access Accumulator:  True
Dataset: IGB
Setting the graph structure
dgl.heterograph init starting
{'created': ['csc'], 'not created': []}
dgl.heterograph init successful
SSD are not assigned
ssd list:  None
SSD index: 0
SQs: 255        CQs: 255        n_qps: 128
Ctrl sizes: 1
n pages: 1048576
page size: 4096
num elements: 563200000000
n_ranges_bits: 6
n_ranges_mask: 63
pages_dma: 0x7feff6010000       220020410000
HEREN
Cond1
100000 8 1 100000
Finish Making Page Cache
Number of required storage accesses:  854.0499999999993
  0%|                                                                                                                                                                | 0/1 [00:00<?, ?it/s]
warp up done
GIDS time:  51.127482891082764
WB time:  0.10593509674072266
print stats: 
print array reset: #READ IOs: 0 #Accesses:8162537120    #Misses:7910445216      Miss Rate:0.969116      #Hits: 252091904        Hit Rate:0.030884       CLSize:4096     Debug Cnt: 0
*********************************

print ctrl reset 0: ------------------------------------
#SSDAccesses:   247201413

Kernel Time:     44626.3
Total Access:    145861516
Performance for 100 iteration after 1000 iteration
GIDS time:  5.08076548576355
WB time:  0.010187864303588867
print stats: 
print array reset: #READ IOs: 0 #Accesses:466964480     #Misses:441628064       Miss Rate:0.945742      #Hits: 25336416 Hit Rate:0.0542577      CLSize:4096     Debug Cnt: 0
*********************************

print ctrl reset 0: ------------------------------------
#SSDAccesses:   13800877

Kernel Time:     4477.36
Total Access:    14616789
transfer time:  0.10730791091918945
train time:  0.7767562866210938
e2e time:  5.9761741161346436
  0%|                                                                                                                                                                | 0/1 [01:12<?, ?it/s]
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation#

gaowayne commented 2 weeks ago

@jeongminpark417 buddy, could you please comment on this? :)

jeongminpark417 commented 2 weeks ago

Currently, we only support a page size that matches the feature size of the dataset. So if you want to test an 8K page size, you can change the dimension of the feature data to 2048 instead of 1024 (but the accuracy will not be correct).
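
The relationship between feature dimension and page size can be sketched as below. This assumes features are stored as 4-byte float32 values, which is consistent with the 1024-dimension / 4096-byte and 2048-dimension / 8192-byte pairs mentioned here but is not stated explicitly in the thread. The helper names are hypothetical and are not part of the GIDS API.

```python
# Sketch only: page size implied by a feature dimension.
# Assumption: each feature element is a 4-byte float32.

BYTES_PER_ELEMENT = 4  # assumed float32


def page_size_for_feature_dim(feature_dim, bytes_per_element=BYTES_PER_ELEMENT):
    """Bytes needed to hold one node's feature vector in a single page."""
    return feature_dim * bytes_per_element


def feature_dim_for_page_size(page_size_bytes, bytes_per_element=BYTES_PER_ELEMENT):
    """Feature dimension that exactly fills a page of the given size."""
    if page_size_bytes % bytes_per_element != 0:
        raise ValueError("page size is not a multiple of the element size")
    return page_size_bytes // bytes_per_element


for dim in (1024, 2048):
    print(dim, "->", page_size_for_feature_dim(dim), "bytes")
```

With these assumptions, a dimension of 1024 matches the "page size: 4096" line in the log, and 2048 gives the 8192-byte page used in the block benchmark.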

The readIO counter has been removed from the GIDS code, but you can calculate it from the SSD accesses and the total number of accesses.