optimum-benchmark is in constant change; you can find the configs that were used in https://github.com/huggingface/optimum-benchmark/tree/0.0.1/examples/training-llamas
Same thing for inference: there are many good examples, but maintaining them at the pace the ecosystem moves is time consuming, so we removed them for the time being.
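If it helps, those examples are hydra configs, so running one should look roughly like the sketch below (assuming the 0.0.1 tag and a working PyTorch/ROCm install; the config name is a placeholder, pick one of the actual yaml files in the folder):

```bash
# Rough sketch, not an exact recipe: check out the 0.0.1 tag and launch one of
# the training-llamas configs with the hydra-based CLI.
git clone --branch 0.0.1 https://github.com/huggingface/optimum-benchmark.git
cd optimum-benchmark
pip install -e .

# Replace <config-name> with one of the yaml files in examples/training-llamas
optimum-benchmark --config-dir examples/training-llamas --config-name <config-name>
```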
(By the way, those training examples use peft, which means LoRA-style fine-tuning.)

Thanks for the prompt response 😄 I totally understand the need for quick development. Did you try any large-scale training on AMD? I don't know if that's the goal of optimum-benchmark, but it would still be cool to know. I'm asking because I'm looking for a suitable codebase to benchmark some training on AMD (not LoRA).
@staghado sorry for the late response, I haven't been working on optimum-benchmark lately; you can check the new work in https://huggingface.co/blog/huggingface-amd-mi300
The goal of optimum-benchmark is to let you easily get metrics like training throughput and memory consumption, and to check whether a given training setup is possible at all, quickly and without having to set up the data + training pipeline. You can also compare different configurations and find the one that your machine can handle, or that best matches the topology of your machines (e.g. which tp/dp degree to use).
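Since everything goes through hydra, comparing configurations mostly means overriding fields from the command line (or sweeping them with hydra's `--multirun`); something along these lines, where the override keys are only illustrative and the exact field names are those in the example yamls:

```bash
# Illustrative sketch: sweep a setting (here a batch size, as a placeholder key)
# across several values with hydra's --multirun and compare the reported
# throughput/memory of each run. Not the exact schema.
optimum-benchmark --config-dir examples/training-llamas \
  --config-name <config-name> \
  --multirun \
  benchmark.training_arguments.per_device_train_batch_size=1,2,4
```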
The training benchmark link no longer works: https://huggingface.co/blog/huggingface-and-optimum-amd
How can one test training throughput on AMD these days? Also, can you provide details about the experiments in the figure below: what context length, is it LoRA, how can you have ddp=2 with 1x MI250, ...