mlcommons / cm4mlops

A collection of portable, reusable and cross-platform automation recipes (CM scripts) to make it easier to build and benchmark AI systems across diverse models, data sets, software and hardware
http://docs.mlcommons.org/cm4mlops/
Apache License 2.0

Test SDXL MLPerf inference on AMD GPU with ROCm for SCC'24 #300

Open gfursin opened 2 months ago

gfursin commented 2 months ago

https://docs.mlcommons.org/inference/benchmarks/text_to_image/reproducibility/scc24

gfursin commented 2 months ago

We need to provide a working configuration.

gfursin commented 2 months ago

Hi @arjunsuresh. Which AMD GPU and ROCm version did you use to test this workflow? I would like to give it a try. Thanks a lot!

arjunsuresh commented 2 months ago

Hi @gfursin, I'm not sure of the exact GPU name, as it was tested by the AMD team, but any AMD GPU that works with ROCm should be enough. We used ROCm 6.2; the driver needs to be installed manually. The rest of the dependencies should be picked up by CM.

We also have the SCC24 GitHub Action, where we can add "rocm" as well if we have a machine for it: https://github.com/mlcommons/cm4mlops/blob/main/.github/workflows/test-scc24-sdxl.yaml#L17

gfursin commented 1 month ago

I just tried to run the benchmark on an AMD MI300X with ROCm 6.2 and PyTorch 2.6. It resolved all dependencies but failed in loadgen. Please see https://github.com/mlcommons/cm4mlperf-inference/issues/48 for details.
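
For anyone reproducing this setup, it may help to first confirm that the installed PyTorch is a ROCm build and can see the GPU, since the ROCm driver is installed manually rather than by CM. The following is only a minimal sketch: the helper names `summarize_backend` and `probe_torch` are hypothetical and not part of CM or MLPerf, and it relies on `torch.version.hip` being set on ROCm builds of PyTorch and on the `torch.cuda` API being reused for AMD GPUs. It does not diagnose the loadgen failure itself.

```python
from typing import Optional


def summarize_backend(hip_version: Optional[str],
                      gpu_available: bool,
                      device_name: Optional[str]) -> str:
    """Turn raw probe values into a one-line status string (hypothetical helper)."""
    if hip_version is None:
        return "PyTorch build has no ROCm/HIP support"
    if not gpu_available:
        return f"ROCm build (HIP {hip_version}) but no GPU is visible; check the driver"
    return f"ROCm build (HIP {hip_version}), GPU 0: {device_name}"


def probe_torch() -> str:
    """Query the local PyTorch install, if there is one (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip_version = getattr(torch.version, "hip", None)
    # ROCm builds expose AMD GPUs through the torch.cuda namespace.
    gpu_available = torch.cuda.is_available()
    device_name = torch.cuda.get_device_name(0) if gpu_available else None
    return summarize_backend(hip_version, gpu_available, device_name)


if __name__ == "__main__":
    print(probe_torch())
```

If this reports a ROCm build with a visible GPU, the PyTorch side of the stack is probably fine and the problem is more likely elsewhere in the workflow.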