mlcommons / cm4mlops

A collection of portable, reusable and cross-platform automation recipes (CM scripts) with a human-friendly interface and minimal dependencies to make it easier to build, run, benchmark and optimize AI, ML and other applications and systems across diverse and continuously changing models, data sets, software and hardware (cloud/edge)
http://docs.mlcommons.org/cm4mlops/
Apache License 2.0

Specifying batch size on llama2-70B cm automation #190

Open · rajesh-s opened this issue 3 weeks ago

rajesh-s commented 3 weeks ago

I could not find information, either in the documentation or in the CM scripts, on the batch size used for the results reported in the MLCommons database.

  1. The default batch size in the implementation seems to be 1. Is the CM automation specifying a different value?
  2. What knobs does the user have to view the configuration used in a particular submission, to ensure alignment when profiling new systems?
  3. If I use the automation scripts as described on the documentation page, on the same hardware used in the submissions, should I see nearly the same performance?

arjunsuresh commented 3 weeks ago

@rajesh-s most of the inference submissions are done using the Nvidia implementation. In CM we have tried to match the typical batch sizes used in the Nvidia submissions, but we haven't tested all of the systems. In the CM run command you can specify `--batch_size=` to use a custom batch size for the Nvidia implementation.
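
For illustration, a run with a custom batch size might look like the sketch below. Only the `--batch_size=` flag is confirmed in this thread; the remaining tags and flags follow the usual pattern from the MLPerf inference documentation and are assumptions that may need adjusting for your model variant and setup.

```bash
# Sketch of a CM run for the Nvidia implementation with an explicit batch size.
# Only --batch_size= is confirmed in this thread; the other flags follow the
# documented MLPerf inference pattern and may differ for your environment.
cm run script --tags=run-mlperf,inference,_find-performance \
   --model=llama2-70b-99 \
   --implementation=nvidia \
   --framework=tensorrt \
   --category=datacenter \
   --scenario=Offline \
   --execution_mode=test \
   --device=cuda \
   --quiet \
   --batch_size=32
```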

For the reference implementation, I'm not sure whether different batch sizes work, as many things are hardwired and no one has made a submission using it.

rajesh-s commented 2 weeks ago

It would help if the batch sizes were listed, at least for the submissions; I could not find them in the results.

The CM run command seems to default to a batch size of 1, as I indicated above, which would be good to note in the documentation. Results vary significantly with batch size, so it may be imperative to document the values used.

anandhu-eng commented 1 week ago

Hi @rajesh-s, sorry for the late reply. I have noted the required addition.

@arjunsuresh, would it be apt to include this in a collapsible section, or should we present it as a tip, since there is a chance users will ignore the collapsible option?

anandhu-eng commented 1 week ago

Hi @rajesh-s, we have added the changes in our fork, but they are yet to be merged into the official MLCommons inference repo. You can find the changes here.