huggingface / optimum-benchmark

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
Apache License 2.0

Training benchmarks reproduction #190

Closed: staghado closed this issue 2 months ago

staghado commented 5 months ago

The training benchmark link no longer works: https://huggingface.co/blog/huggingface-and-optimum-amd

How can one test training throughput on AMD these days? Also, can you provide details about the experiments in the figure below: what context length was used, is this a LoRA, how can you have ddp=2 with 1x MI250, ...

[Figure: training benchmark results (Screenshot 2024-04-30 at 16 37 35)]
IlyasMoutawwakil commented 5 months ago

optimum-benchmark is in constant flux. You can find the configs that were used in https://github.com/huggingface/optimum-benchmark/tree/0.0.1/examples/training-llamas, and the same goes for inference. There are many good examples, but maintaining them at the pace at which everything in the ecosystem develops is time-consuming, so we removed them for the time being.
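
For orientation, here is a rough sketch of launching a similar training benchmark with a recent release's Python API; the model name, step counts, and dataset shapes below are illustrative placeholders, not the values from the removed examples, and exact class names and defaults may differ across versions:

```python
# Rough sketch of a training benchmark with a recent optimum-benchmark release;
# the model, step counts, and shapes are placeholders, not the values from the
# removed training-llamas examples.
from optimum_benchmark import (
    Benchmark,
    BenchmarkConfig,
    ProcessConfig,
    PyTorchConfig,
    TrainingConfig,
)
from optimum_benchmark.logging_utils import setup_logging

setup_logging(level="INFO")

benchmark_config = BenchmarkConfig(
    name="llama-training",
    launcher=ProcessConfig(),  # isolate the benchmark in a spawned process
    scenario=TrainingConfig(
        max_steps=100,
        warmup_steps=20,
        dataset_shapes={"dataset_size": 1000, "sequence_length": 256},
        training_arguments={"per_device_train_batch_size": 4},
    ),
    backend=PyTorchConfig(
        model="meta-llama/Llama-2-7b-hf",  # placeholder model
        device="cuda",  # ROCm devices are also addressed as "cuda" in PyTorch
        torch_dtype="bfloat16",
    ),
)

benchmark_report = Benchmark.launch(benchmark_config)
benchmark_report.log()  # throughput, memory and latency sections
```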

staghado commented 5 months ago

Thanks for the prompt response 😄 I totally understand the need for quick development. Did you try any large-scale training on AMD? I don't know if that's the goal of optimum-benchmark, but it would still be good to know. I'm asking because I'm looking for a suitable codebase to benchmark some training on AMD (not LoRA).

IlyasMoutawwakil commented 4 months ago

@staghado Sorry for the late response, I haven't been working on optimum-benchmark lately; you can check the new work in https://huggingface.co/blog/huggingface-amd-mi300. The goal of optimum-benchmark is to let you easily get metrics like training throughput, memory consumption, and whether a training run is possible at all, quickly and without needing to set up the data + training pipeline. You can also compare different configs and find the one that your machine can handle, or the one that best matches the topology of your machines (like which tp/dp degree to use).
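
For example, a hedged sketch of comparing data-parallel degrees with the torchrun launcher (the model and training arguments are placeholders):

```python
# Sketch: compare training throughput at different data-parallel degrees
# using the torchrun launcher (model and arguments are placeholders).
from optimum_benchmark import Benchmark, BenchmarkConfig, PyTorchConfig, TorchrunConfig, TrainingConfig

for nproc in (1, 2):
    # nproc_per_node=2 covers the "ddp=2 with 1x MI250" case: a single MI250
    # exposes its two GCDs as two separate devices.
    benchmark_config = BenchmarkConfig(
        name=f"llama-training-ddp{nproc}",
        launcher=TorchrunConfig(nproc_per_node=nproc),
        scenario=TrainingConfig(
            max_steps=100,
            warmup_steps=20,
            training_arguments={"per_device_train_batch_size": 4},
        ),
        backend=PyTorchConfig(model="meta-llama/Llama-2-7b-hf", device="cuda"),
    )
    report = Benchmark.launch(benchmark_config)
    report.save_json(f"report-ddp{nproc}.json")  # compare throughput/memory across runs
```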