Closed by yukirora
[x] microbenchmark
  [x] Bug fix for GPU Burn test (#567)
  [x] Support INT8 in cuBLASLt function (#574)
  [x] Support cpu-gpu and gpu-cpu in ib-validation (#581)
  [x] Support graph mode in NCCL/RCCL benchmarks for latency metrics (#583)
  [x] Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)
  [x] dist-inference cpp (#586)
  [x] Add MSCCL support (#584)
  [x] Support in-place for NCCL/RCCL benchmark (#591)
[x] Model Benchmark Improvement
  [x] Change torch.distributed.launch to torchrun (#556)
  [x] Support Megatron-LM/Megatron-DeepSpeed GPT pretrain benchmark (#582)
[x] SuperBench improvement
  [x] Update Docker image for H100 support (#577)
[x] microbenchmark improvement
  [x] Add HPL random generator to gemm-flops with ROCm (#578)
  [x] Update MLC version to 3.10 for CUDA/ROCm dockerfile (#562)
  [x] Add hipBLASLt function benchmark (#576)
  [x] Support cpu-gpu and gpu-cpu in ib-validation (#581)
  [x] Support graph mode in NCCL/RCCL benchmarks for latency metrics (#583)
  [x] Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)
  [x] dist-inference cpp (#586)
  [x] Support in-place for NCCL/RCCL benchmark (#591)
[x] Support Monitoring for AMD GPUs (#580)
[x] Support baseline generation from multiple nodes (#575)
Test Cases
single-node test
A100 and H100 related
[x] microbenchmark
[x] Model Benchmark Improvement
[x] SuperBench improvement
MI200 and MI300X
[x] microbenchmark improvement
[x] Model Benchmark Improvement
[x] SuperBench improvement
Result analysis