What's the issue, what's expected?:
I ran SuperBench on a server with an NVIDIA L40 and got the error message "Unsupported architecture" from the gemm-flops benchmark. The L40 and L4 are CUDA-capable NVIDIA GPUs with compute capability 8.9, as listed at https://developer.nvidia.com/cuda-gpus
How to reproduce it?:
sb run -f local.ini -c gemm-flops.yaml
where gemm-flops.yaml is default.yaml with enable: ['gemm-flops'] and proc_num: 1
I think compute capability 8.9 should be added to the __kernel_map of CudaGemmFlopsBenchmark in superbench/benchmarks/micro_benchmarks/cuda_gemm_flops_performance.py, with the same entries as 8.6 (AD10x GPUs resemble that group in having a limited FP64 TFLOP rate). In addition, third_party/Makefile has two ARCHS lists for the CUDA Toolkit >= 11.8 case, containing 86 and 90, and both should be extended with 89.
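To illustrate the idea, here is a minimal sketch of a capability-to-kernel lookup that fails with "Unsupported architecture" for an unknown capability and starts working once 8.9 is registered with the same entries as 8.6. Everything in it is an assumption for illustration: the names `kernels_for`, `with_alias`, and `DEMO_KERNEL_MAP`, the tuple keys, and the placeholder kernel names are hypothetical and are not taken from the actual SuperBench code.

```python
# Hypothetical sketch: this is NOT the real CudaGemmFlopsBenchmark code.
# Keys are (major, minor) compute capabilities; kernel names are placeholders.
DEMO_KERNEL_MAP = {
    (8, 6): ['demo_fp32_kernel', 'demo_fp16_kernel'],
    (9, 0): ['demo_fp64_kernel', 'demo_fp32_kernel'],
}


def kernels_for(capability, kernel_map):
    """Return the kernel list for a capability, or fail like the reported error."""
    try:
        return kernel_map[capability]
    except KeyError:
        raise ValueError(
            'Unsupported architecture: compute capability {}.{}'.format(*capability)
        ) from None


def with_alias(kernel_map, new_capability, like):
    """Return a copy of kernel_map where new_capability reuses the kernels of `like`."""
    updated = dict(kernel_map)
    updated[new_capability] = list(kernel_map[like])
    return updated


# Registering 8.9 with the same kernels as 8.6, as proposed above.
PATCHED_KERNEL_MAP = with_alias(DEMO_KERNEL_MAP, (8, 9), like=(8, 6))
```

With the unpatched map, `kernels_for((8, 9), DEMO_KERNEL_MAP)` raises `ValueError`. With `PATCHED_KERNEL_MAP`, it returns the same list as for `(8, 6)`.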
Thanks for reporting the issue. We have created a PR (#634) to support compute capability 8.9. Please check whether it works for you and let us know if you have more questions!