quiver-team/quiver-feature
High-performance RDMA-based distributed feature collection component for training GNN models on EXTREMELY large graphs
Apache License 2.0
TLB test results #17
Closed (ZenoTan closed this 2 years ago)
ZenoTan commented 2 years ago
256GB feature
Warp-based: 13.3GB/s
Warp-based + sort: 20.2GB/s
Warp-based + sort + multi-kernel: 21.1GB/s
Block-based: 12.4GB/s
Block-based + sort: 17.9GB/s
Block-based + sort + multi-kernel: 22.9GB/s
Upper bound (cudaMemcpy): 26.3GB/s
64GB feature
Warp-based: 19.9GB/s
Warp-based + sort: 22.7GB/s
Warp-based + sort + multi-kernel: 22.0GB/s
Block-based: 16.3GB/s
Block-based + sort: 19.4GB/s
Block-based + sort + multi-kernel: 22.2GB/s
16GB feature
Warp-based: 20.2GB/s
Warp-based + sort: 22.1GB/s
Warp-based + sort + multi-kernel: 22.4GB/s
Block-based: 16.8GB/s
Block-based + sort: 19.5GB/s
Block-based + sort + multi-kernel: 22.3GB/s
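To compare the variants more easily, the figures above can be turned into ratios. The following is a minimal Python sketch, not part of the original benchmark: the names `results`, `speedup_over_baseline`, and `fraction_of_upper_bound` are hypothetical, and the cudaMemcpy upper bound is applied only to the 256GB case because that is the only place it is listed.

```python
# Throughput figures in GB/s, copied from the results listed above.
results = {
    "256GB": {
        "Warp-based": 13.3,
        "Warp-based + sort": 20.2,
        "Warp-based + sort + multi-kernel": 21.1,
        "Block-based": 12.4,
        "Block-based + sort": 17.9,
        "Block-based + sort + multi-kernel": 22.9,
    },
    "64GB": {
        "Warp-based": 19.9,
        "Warp-based + sort": 22.7,
        "Warp-based + sort + multi-kernel": 22.0,
        "Block-based": 16.3,
        "Block-based + sort": 19.4,
        "Block-based + sort + multi-kernel": 22.2,
    },
    "16GB": {
        "Warp-based": 20.2,
        "Warp-based + sort": 22.1,
        "Warp-based + sort + multi-kernel": 22.4,
        "Block-based": 16.8,
        "Block-based + sort": 19.5,
        "Block-based + sort + multi-kernel": 22.3,
    },
}

# cudaMemcpy upper bound in GB/s; reported under the 256GB results only.
UPPER_BOUND_256GB = 26.3


def speedup_over_baseline(size):
    """Ratio of each variant to its plain baseline (Warp-based or Block-based)."""
    rows = results[size]
    ratios = {}
    for name, gbps in rows.items():
        baseline_name = name.split(" + ")[0]  # "Warp-based" or "Block-based"
        ratios[name] = gbps / rows[baseline_name]
    return ratios


def fraction_of_upper_bound():
    """Fraction of the cudaMemcpy upper bound reached by each 256GB variant."""
    return {
        name: gbps / UPPER_BOUND_256GB
        for name, gbps in results["256GB"].items()
    }


if __name__ == "__main__":
    for size in results:
        for name, ratio in speedup_over_baseline(size).items():
            print(f"{size:>6}  {name:<36} {ratio:.2f}x")
    for name, frac in fraction_of_upper_bound().items():
        print(f" 256GB  {name:<36} {frac:.0%} of upper bound")
```

For the 256GB case this gives roughly a 1.5x gain from sorting on the warp-based variant (20.2 / 13.3), and the fastest variant (block-based + sort + multi-kernel) reaches about 87% of the cudaMemcpy upper bound (22.9 / 26.3).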