microsoft/superbenchmark: A validation and profiling tool for AI infrastructure
https://aka.ms/superbench
MIT License · 251 stars · 55 forks
# V0.10.0 Release Plan #559

**Closed**: cp5555 closed this issue 8 months ago

**cp5555** commented 1 year ago
## Release Manager

@cp5555

## Endgame
- [x] Code freeze: Dec. 8th, 2023
- [x] Bug bash date: Dec. 11th, 2023
- [x] Release date: Dec. 25th, 2023
## Main Features

### SuperBench Improvement
- [x] Support monitoring for AMD GPUs (#518 and #601)
- [x] Support ROCm 5.7 and ROCm 6.0 dockerfiles (#587, #598, and #602)
- [x] Add MSCCL support for NVIDIA GPUs (#584)
- [x] Fix NUMA domain swap issue in the NDv4 topology file (#592)
- [x] Add NDv5 topology file (#597)
- [x] Pin NCCL and NCCL-tests to 2.18.3 to fix a hang issue with CUDA 12.2 (#599)
### Micro-benchmark Improvement
- [x] Add HPL random generator to gemm-flops with ROCm (#578)
- [x] Add DirectXGPURenderFPS benchmark to measure the FPS of rendering simple frames (#549)
- [x] Add HWDecoderFPS benchmark to measure the FPS of the hardware decoder (#560)
- [x] Update Docker image for H100 support (#577)
- [x] Update MLC to version 3.10 in the CUDA/ROCm dockerfiles (#562)
- [x] Fix bugs in the GPU Burn test (#567)
- [x] Support INT8 in the cublaslt function benchmark (#574)
- [x] Add hipBLASLt function benchmark (#576)
- [x] Support cpu-gpu and gpu-cpu directions in ib-validation (#581)
- [x] Support graph mode in NCCL/RCCL benchmarks for latency metrics (#583)
- [x] Support a C++ implementation in the distributed inference benchmark (#586 and #596)
- [x] Add O2 option for the gpu_copy ROCm build (#589)
- [x] Support different hipBLASLt data types in dist_inference (#590 and #603)
- [x] Support in-place operations in NCCL/RCCL benchmarks (#591)
- [x] Support a data type option in NCCL/RCCL benchmarks (#595)
- [x] Improve P2P performance with fine-grained GPU memory in the GPU-copy test for AMD GPUs (#593)
- [x] Update hipBLASLt GEMM metric unit to tflops (#604)
- [x] Support FP8 in the hipBLASLt benchmark (#605)
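For context on the graph-mode, in-place, and data-type items above: the NCCL/RCCL micro-benchmarks are built on the upstream nccl-tests binaries, whose command-line flags expose these options. A hedged sketch of a direct invocation (flag spellings follow upstream nccl-tests and may differ in SuperBench's own wrapper and config):

```shell
# Hedged sketch, not SuperBench's exact command line.
# -b/-e/-f: sweep message sizes from 8 bytes to 8 GiB, doubling each step
# -g: number of GPUs in this process
# -d: data type for the collective (e.g. float, half)
# -G: replay iterations through a captured CUDA graph (graph mode)
all_reduce_perf -b 8 -e 8G -f 2 -g 8 -d half -G 10
```

Graph mode matters for the latency metrics in #583 because replaying a captured graph removes per-iteration kernel-launch overhead from the measurement.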
### Model Benchmark Improvement
- [x] Change torch.distributed.launch to torchrun (#556)
- [x] Support Megatron-LM/Megatron-DeepSpeed GPT pretraining benchmark (#582 and #600)
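The launcher change in #556 follows PyTorch's own deprecation path for `torch.distributed.launch`. A minimal sketch of the equivalent commands (the script name is illustrative, not from this repository):

```shell
# Old launcher, deprecated in recent PyTorch releases; it passes a
# --local-rank argument into the training script:
python -m torch.distributed.launch --nproc_per_node=8 train.py

# torchrun replacement; it exposes the rank via the LOCAL_RANK
# environment variable instead of a script argument:
torchrun --nproc_per_node=8 train.py
```

Scripts migrating between the two typically swap the `--local-rank` argparse argument for a read of `os.environ["LOCAL_RANK"]`.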
### Result Analysis

- [x] Support baseline generation from multiple nodes (#575)
## Backlog

### Micro-benchmark Improvement
- Support cuDNN Backend API in cudnn-function
### Model Benchmark Improvement
- Support VGG, LSTM, and GPT-2 small in the TensorRT inference backend
- Support VGG, LSTM, and GPT-2 small in the ORT inference backend
- Support more TensorRT parameters (related to #366)