Closed jaywonchung closed 1 year ago
@jaywonchung You can check the models folder for details. See the scaling_config section of each model file for how many workers, how many GPUs per worker, and which GPU types were used for each model; the max batch size is in the model files as well. The performance numbers are updated in real time with each query made by users. You can deploy Aviary yourself to reproduce the results; the configuration in this repository is exactly what we use for the website.
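For illustration, the kind of information described above (workers, GPUs per worker, GPU type, max batch size) could be represented and read as in the following minimal Python sketch. The field names here are assumptions for illustration, not the exact Aviary schema; check the actual model files for the real keys.

```python
# Hypothetical sketch of the scaling_config fields described above.
# Field names (num_workers, num_gpus_per_worker, etc.) are illustrative
# assumptions, not the exact Aviary model-file schema.
model_config = {
    "scaling_config": {
        "num_workers": 2,          # number of workers serving the model
        "num_gpus_per_worker": 1,  # GPUs allocated to each worker
        "gpu_type": "A10",         # accelerator type used
    },
    "max_batch_size": 6,           # largest batch the server will form
}

def total_gpus(config: dict) -> int:
    """Total GPUs for a deployment = workers x GPUs per worker."""
    sc = config["scaling_config"]
    return sc["num_workers"] * sc["num_gpus_per_worker"]

print(total_gpus(model_config))  # -> 2
```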
For the Llama-based models (which we are not sharing due to the need for delta weights), we used two A10s, DeepSpeed tensor parallelism, and a batch size of 6.
That's really nice. Thank you for your detailed answer!
Thanks for putting the leaderboard up. Could you comment on how the performance numbers on the leaderboard at https://aviary.anyscale.com/ were generated?
For instance, what GPU was used? For larger models, was distributed inference used? Were they run with batch size 1? Can we run the same benchmark using the code in this repository?
Thanks a lot.