Open debraj135 opened 9 months ago
The performance of various models depends on different factors, including model size, compute configuration (GPU model and count), and model, system, or algorithm optimizations. An API provider may use different strategies to optimize different models.
I was wondering how to interpret this. I would expect llama2 70b to have lower throughput than llama2 7b.
Is the configuration different between the table for llama2 70b and the table for llama2 7b?