pytorch-labs / tritonbench

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

[GPU_UTILS] H100 specs #78

Closed antferdom closed 3 days ago

antferdom commented 3 days ago
NVIDIA Hopper performance specs (without sparsity):

| Technical Specifications | H100 SXM | H200 SXM |
| --- | --- | --- |
| BFLOAT16 | 989.5 TFLOPS | 989.5 TFLOPS |
| FP16 | 989.5 TFLOPS | 989.5 TFLOPS |
| FP8 | 1979 TFLOPS | 1979 TFLOPS |
| INT8 | 1979 TFLOPS | 1979 TFLOPS |
| GPU Memory | 80 GB | 144 GB |
| GPU Memory Bandwidth | 3.35 TB/s | 4.8 TB/s |

However, this does not match `tritonbench/utils/gpu_utils.py.NV_H100`. Is this because that dictionary is meant to represent the H100 NVL rather than the SXM? Even so, shouldn't we use the non-sparsity specs? See the xformers device limits.
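
For concreteness, a minimal sketch of what a non-sparsity H100 SXM entry could look like, using the numbers from the table above; the dict name, keys, and structure here are illustrative assumptions, not the actual contents of `gpu_utils.py`:

```python
# Illustrative only: hypothetical device-limit entry built from the
# non-sparsity H100 SXM numbers above (TFLOPS, GB, TB/s).
NV_H100_SXM_NO_SPARSITY = {
    "bf16_tflops": 989.5,
    "fp16_tflops": 989.5,
    "fp8_tflops": 1979.0,
    "int8_tflops": 1979.0,
    "memory_gb": 80,
    "memory_bandwidth_tbps": 3.35,
}
```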

xuzhao9 commented 3 days ago

I think this is because in our CI infra (which uses GCP spot instances), the GPU is an H100 PCIe. cc @adamomainz, should we use the same device limits as xformers?
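
For context, a rough sketch of how the benchmark could pick limits per card variant; `torch.cuda.get_device_name` is real PyTorch, but the selection logic and dict names below are assumptions for illustration only:

```python
import torch

# Sketch only: choose a device-limit table based on the detected card variant.
# The dict arguments are hypothetical, not the actual gpu_utils.py entries.
def pick_device_limits(nv_h100_sxm: dict, nv_h100_pcie: dict) -> dict:
    name = torch.cuda.get_device_name()  # e.g. "NVIDIA H100 PCIe" on the CI spot instances
    return nv_h100_pcie if "PCIe" in name else nv_h100_sxm
```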

antferdom commented 3 days ago

Thanks for clarifying, @xuzhao9. I tend to always use the xformers approach.

adamomainz commented 3 days ago

> I think this is because in our CI infra (which uses GCP spot instances), the GPU is an H100 PCIe. cc @adamomainz, should we use the same device limits as xformers?

+1, I checked it and updated the reference recently; see https://github.com/pytorch-labs/tritonbench/blob/66816daabd3647f256802100eec0ed0790eae409/tritonbench/utils/gpu_utils.py#L17

Happy to use the same device limits as xformers too.

adamomainz commented 3 days ago

I will update the GPU utils now, but I'm hesitant to change the fp32 case in the way xformers describes, since we do not assume the TF32 switch is on. See aten_matmul in the gemm operator, for example, which does not use TF32 when precision is set to fp32.
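
The TF32 switch being referred to is presumably PyTorch's `torch.backends.cuda.matmul.allow_tf32` flag; a minimal sketch of the distinction, assuming the flag is left at its default (False in recent PyTorch releases):

```python
import torch

# With this flag False, fp32 matmuls run as true fp32 rather than TF32,
# so the TF32 peak-TFLOPS number is not the right roofline for the fp32 case.
torch.backends.cuda.matmul.allow_tf32 = False

a = torch.randn(4096, 4096, device="cuda", dtype=torch.float32)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.float32)
c = a @ b  # compare against the fp32 device limit, not the TF32 one
```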

adamomainz commented 3 days ago

@xuzhao9, please take a look at https://github.com/pytorch-labs/tritonbench/pull/80, which should close this issue.

adamomainz commented 3 days ago

The discrepancy has been fixed in https://github.com/pytorch-labs/tritonbench/pull/80; please reopen if you still find an issue.