huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

[Feature request] Performance and accuracy benchmarks #61

Open brianyu-nexusflowai opened 7 months ago

brianyu-nexusflowai commented 7 months ago

Hi Huggingface Nanotron team!

Could I request some tooling for benchmarking how fast nanotron is compared to other LM training frameworks, e.g. FSDP, DeepSpeed, and Megatron-LM? It would be great to have performance metrics under different training workloads, e.g. Llama 2 7B/13B/34B/70B x sequence length 2048/4096/8192 x global batch size 128/4096. The metrics I'm interested in include seconds per step, peak GPU memory usage, and communication time.
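As a rough sketch, the workload grid described above could be enumerated as follows. The `Workload` class and its field names are hypothetical and only illustrate the sweep; they are not part of nanotron.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Workload:
    """One benchmark configuration (hypothetical helper, not a nanotron API)."""
    model_size_b: int        # Llama 2 parameter count, in billions
    seq_len: int             # sequence length in tokens
    global_batch_size: int   # sequences per optimizer step

    @property
    def tokens_per_step(self) -> int:
        # Tokens processed per optimizer step across all ranks.
        return self.seq_len * self.global_batch_size


# Values taken from the request above.
MODEL_SIZES_B = (7, 13, 34, 70)
SEQ_LENS = (2048, 4096, 8192)
GLOBAL_BATCH_SIZES = (128, 4096)


def workload_grid() -> list[Workload]:
    """Return the full cross product of the requested settings."""
    return [
        Workload(m, s, b)
        for m, s, b in product(MODEL_SIZES_B, SEQ_LENS, GLOBAL_BATCH_SIZES)
    ]


if __name__ == "__main__":
    for w in workload_grid():
        print(w, w.tokens_per_step)
```

Each configuration would then be run once per framework, recording seconds per step, peak GPU memory usage, and communication time.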

Additionally, could I request some end-to-end tests that finetune an LM on a dataset and evaluate its downstream performance on a difficult task? An example is finetuning Llama 2 7B on the Open-Platypus dataset and evaluating it on the Open LLM Leaderboard benchmarks. Ideally these e2e tests would also be packaged as a script that could be run as a sanity check on any new nanotron Docker setup to reproduce the performance.

I know this is a lot to ask, especially when I'm not personally in a position to contribute. Thank you so much!

Cheers, Brian

NouamaneTazi commented 7 months ago

Hello @brianyu-nexusflowai! Thanks for your interest. We can try to run some of these benchmarks for you. How would you measure communication time?

brianyu-nexusflowai commented 7 months ago

Hi Nouamane!

Thanks for the response. Maybe something similar to DeepSpeed's metrics for the time taken by all_gather/all_reduce. I'm not sure which communication primitives nanotron uses, but it would be great to have some measure of how long these operations take!
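A minimal sketch of how per-operation communication time could be accumulated, using only the Python standard library. `CommTimer` and its `sync` hook are hypothetical names for illustration, not nanotron or DeepSpeed APIs. On GPU, collectives are typically launched asynchronously, so the hook should block until queued work has finished (for example `torch.cuda.synchronize`) before the clock is read.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class CommTimer:
    """Accumulates wall-clock time and call counts per named operation."""

    def __init__(self, sync=None):
        # `sync` is an optional callable that blocks until pending device
        # work is done (assumption: e.g. torch.cuda.synchronize on GPU).
        self.sync = sync
        self.total_s = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def track(self, name):
        if self.sync is not None:
            self.sync()
        start = time.perf_counter()
        try:
            yield
        finally:
            if self.sync is not None:
                self.sync()
            self.total_s[name] += time.perf_counter() - start
            self.calls[name] += 1

    def summary(self):
        # Per-operation call count, total seconds, and mean seconds per call.
        return {
            name: {
                "calls": self.calls[name],
                "total_s": self.total_s[name],
                "mean_s": self.total_s[name] / self.calls[name],
            }
            for name in self.total_s
        }


timer = CommTimer()
for _ in range(3):
    with timer.track("all_reduce"):
        time.sleep(0.01)  # stand-in for a real collective call
print(timer.summary())
```

Synchronizing around every collective serializes communication and compute, so the numbers from a timer like this are an upper bound on exposed communication time rather than a measure of overlap.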

Cheers, Brian