V0.10.0 Test Plan

Test Cases

| Machine Type | #Node | #GPU | GPU Type | Accelerated Computing Toolkit | Status |
| --- | --- | --- | --- | --- | --- |
| NDv5 SXM | 1 | 8 | H100 | CUDA 12.2 | done |
| AMD MI200 | 1 | 16 | AMD MI200 | ROCm 5.7 | done |
| AMD MI300x | 1 | 8 | AMD MI300x | ROCm 6.0 | done |

- [x] Microbenchmark
  - [x] Bug fix for GPU Burn test (#567)
  - [x] Support INT8 in cuBLASLt function (#574)
  - [x] Support cpu-gpu and gpu-cpu in ib-validation (#581)
  - [x] Support graph mode in NCCL/RCCL benchmarks for latency metrics (#583)
  - [x] Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)
  - [x] dist-inference cpp (#586)
  - [x] Add MSCCL support (#584)
  - [x] Support in-place for NCCL/RCCL benchmark (#591)
- [x] Model Benchmark Improvement
  - [x] Change torch.distributed.launch to torchrun (#556)
  - [x] Support Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582)
- [x] SuperBench improvement
  - [x] Update Docker image for H100 support (#577)

- [x] Microbenchmark improvement
  - [x] Add HPL random generator to gemm-flops with ROCm (#578)
  - [x] Update MLC version to 3.10 for CUDA/ROCm dockerfile (#562)
  - [x] Add hipBLASLt function benchmark (#576)
  - [x] Support cpu-gpu and gpu-cpu in ib-validation (#581)
  - [x] Support graph mode in NCCL/RCCL benchmarks for latency metrics (#583)
  - [x] Benchmarks: Micro benchmark - Add one-to-all, all-to-one, all-to-all support to gpu_copy_bw_performance (#588)
  - [x] dist-inference cpp (#586)
  - [x] Support in-place for NCCL/RCCL benchmark (#591)
- [x] Model Benchmark Improvement
  - [x] Change torch.distributed.launch to torchrun (#556)
  - [x] Support Megatron-LM/Megatron-Deepspeed GPT pretrain benchmark (#582)
- [x] SuperBench improvement
  - [x] Support Monitoring for AMD GPUs (#580)

- [x] Support baseline generation from multiple nodes (#575)