microsoft / superbenchmark

A validation and profiling tool for AI infrastructure
https://aka.ms/superbench
MIT License

V0.10.0 Test Plan #585

Closed by yukirora 8 months ago

yukirora commented 9 months ago

Test Cases

single-node test

| Machine Type | #Node | #GPU | GPU Type | Accelerated Computing Toolkit | Status |
|---|---|---|---|---|---|
| NDv5 SXM | 1 | 8 | H100 | CUDA 12.2 | done |
| AMD MI200 | 1 | 16 | AMD MI200 | ROCm 5.7 | done |
| AMD MI300x | 1 | 8 | AMD MI300x | ROCm 6.0 | done |

A100 and H100 related

  1. [x] Microbenchmark improvement

    • [x] Bug fix for GPU Burn test (#567)
    • [x] Support INT8 in cublaslt function (#574)
    • [x] Support cpu-gpu and gpu-cpu in ib-validation (#581)
    • [x] Support graph mode in NCCL/RCCL benchmarks for latency metrics (#583)
    • [x] Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)
    • [x] Add C++ implementation of the dist-inference benchmark (#586)
    • [x] Add MSCCL support (#584)
    • [x] Support in-place for NCCL/RCCL benchmark (#591)
  2. [x] Model Benchmark Improvement

    • [x] Change torch.distributed.launch to torchrun (#556)
    • [x] Support Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582)
  3. [x] SuperBench improvement

    • [x] Update Docker image for H100 support (#577)
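
Note on the torchrun item above (#556): torchrun passes each worker's local rank through the `LOCAL_RANK` environment variable, whereas torch.distributed.launch by default passed it as a `--local_rank` command-line argument. A minimal sketch of an entry point that tolerates both launchers could look like the following. `resolve_local_rank` is a hypothetical helper written for illustration, not a SuperBench function, and falling back to rank 0 is an assumption.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return the local rank from LOCAL_RANK (torchrun) or --local_rank (legacy launcher)."""
    environ = os.environ if environ is None else environ

    # The legacy launcher passed the rank as a CLI argument; accept both spellings
    # and ignore any other arguments meant for the benchmark itself.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('--local_rank', '--local-rank', type=int, default=None)
    args, _unknown = parser.parse_known_args(argv)

    # torchrun sets LOCAL_RANK for every worker process, so prefer it when present.
    if 'LOCAL_RANK' in environ:
        return int(environ['LOCAL_RANK'])
    if args.local_rank is not None:
        return args.local_rank
    # Assumption: a single-process run without any launcher is treated as rank 0.
    return 0
```

When testing a benchmark under both launchers, calling the helper once at start-up and logging the result is a quick way to confirm that every worker sees a distinct rank.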

MI200 and MI300x

  1. [x] microbenchmark improvement

    • [x] Add HPL random generator to gemm-flops with ROCm (#578)
    • [x] Update MLC version to 3.10 for CUDA/ROCm dockerfile (#562)
    • [x] Add hipBLASLt function benchmark (#576)
    • [x] Support cpu-gpu and gpu-cpu in ib-validation (#581)
    • [x] Support graph mode in NCCL/RCCL benchmarks for latency metrics (#583)
    • [x] Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)
    • [x] Add C++ implementation of the dist-inference benchmark (#586)
    • [x] Support in-place for NCCL/RCCL benchmark (#591)
  2. [x] Model Benchmark Improvement

    • [x] Change torch.distributed.launch to torchrun (#556)
    • [x] Support Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582)
  3. [x] SuperBench improvement

    • [x] Support Monitoring for AMD GPUs (#580)

Result analysis

  • [x] Support baseline generation from multiple nodes (#575)
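
As a rough illustration of what generating a baseline from several nodes can involve, the sketch below takes one metric dictionary per node and reduces each metric to its median across nodes. The median rule, the function name `build_baseline`, and the metric names are all assumptions made for illustration; the aggregation actually implemented in #575 may differ.

```python
from statistics import median


def build_baseline(node_results):
    """Reduce per-node metric dicts to a single baseline dict (median per metric)."""
    values_by_metric = {}
    for metrics in node_results.values():
        for name, value in metrics.items():
            values_by_metric.setdefault(name, []).append(value)
    # A metric missing on some nodes is aggregated over the nodes that report it.
    return {name: median(values) for name, values in values_by_metric.items()}


# Hypothetical per-node results; metric names are placeholders.
results = {
    'node0': {'example/bandwidth': 100.0, 'example/latency': 2.0},
    'node1': {'example/bandwidth': 104.0, 'example/latency': 2.2},
    'node2': {'example/bandwidth': 60.0},  # a degraded node with a missing metric
}
baseline = build_baseline(results)
```

The median is used here only because a single degraded node barely moves it; a mean or a percentile would slot into the same structure.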