Closed: antferdom closed this 3 days ago
I think this is because in our CI infra (uses GCP spot instances), the GPU is H100 PCI-E. cc @adamomainz should we use the same device limit as xformers?
Thanks for clarifying @xuzhao9, I tend to always use the xformers approach.
+1 checked it and updated the reference recently see https://github.com/pytorch-labs/tritonbench/blob/66816daabd3647f256802100eec0ed0790eae409/tritonbench/utils/gpu_utils.py#L17
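For context, the file linked above (like xformers' device limits) records peak TFLOPS per device and dtype in a plain dictionary. A minimal sketch of that pattern follows; the names and numbers are illustrative placeholders, not the actual tritonbench or vendor values:

```python
# Hypothetical sketch of a per-device peak-TFLOPS table, in the style of
# tritonbench/utils/gpu_utils.py and xformers' device limits.
# All numbers are illustrative placeholders, not authoritative specs.
DEVICE_TFLOPS_LIMITS = {
    "NVIDIA H100 SXM": {"fp32": 67.0, "tf32": 494.7, "fp16": 989.4, "bf16": 989.4},
    "NVIDIA H100 PCIe": {"fp32": 51.0, "tf32": 378.0, "fp16": 756.4, "bf16": 756.4},
}

def peak_tflops(device_name: str, dtype: str) -> float:
    """Look up the dense (non-sparsity) peak TFLOPS for a device/dtype pair."""
    try:
        return DEVICE_TFLOPS_LIMITS[device_name][dtype]
    except KeyError:
        raise ValueError(f"no limit recorded for {device_name!r} / {dtype!r}")
```

Keying on the exact device name is what makes the PCIe-vs-SXM distinction from the CI report matter: the two cards get different rooflines.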
Happy to use the same device limit as xformers too.
I will update the gpu utils now, but I am hesitant to change the fp32 case in the way xformers describes, since we do not assume the tf32 switch is enabled. See aten_matmul in gemm, for example, which does not use tf32 when precision is set to fp32.
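The concern above can be made concrete: whether the fp32 roofline should use the plain FP32 peak or the TF32 tensor-core peak depends on whether the benchmark assumes PyTorch's `torch.backends.cuda.matmul.allow_tf32` switch is on. A hedged sketch of that branching, with hypothetical names and illustrative numbers:

```python
# Illustrative placeholder numbers; "allow_tf32" mirrors PyTorch's
# torch.backends.cuda.matmul.allow_tf32 switch. tritonbench does not
# assume it is set, so aten_matmul in gemm runs true fp32, not tf32.
FP32_TFLOPS = 67.0   # plain fp32 peak (placeholder)
TF32_TFLOPS = 494.7  # tf32 tensor-core peak (placeholder)

def fp32_limit(allow_tf32: bool) -> float:
    """Pick the fp32 roofline depending on whether tf32 matmul is assumed."""
    return TF32_TFLOPS if allow_tf32 else FP32_TFLOPS
```

Under this framing, adopting xformers' fp32 limit verbatim would implicitly bake in `allow_tf32=True`, which is the mismatch being flagged here.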
@xuzhao9 please take a look at https://github.com/pytorch-labs/tritonbench/pull/80 which should close this issue
Discrepancy has been fixed in https://github.com/pytorch-labs/tritonbench/pull/80 please reopen if you still find an issue
but this does not match tritonbench/utils/gpu_utils.py.NV_H100. Is this because the dictionary is meant to represent the H100 NVL rather than the SXM? Either way, shouldn't we use the non-sparsity specs? See xformers device limits.