Open danielholanda opened 1 year ago
The latency and throughput reported for Groq are inconsistent with each other. The latency is labeled in milliseconds (ms), but the printed value appears to be in seconds.
benchit models/transformers/bert.py --device groq

Models discovered during profiling:

bert.py:
    model (executed 1x)
        Model Type:   Pytorch (torch.nn.Module)
        Class:        BertModel (<class 'transformers.models.bert.modeling_bert.BertModel'>)
        Location:     /home/czhang/Documents/mlagility/models/transformers/bert.py, line 18
        Parameters:   109,482,240 (208.8 MB)
        Hash:         d59172a2
        Status:       Successfully benchmarked on GroqChip1 (groq v2.5.2)
                      Mean Latency:   0.002   milliseconds (ms)
                      Throughput:     459.3   inferences per second (IPS)
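Note that, given the throughput of 459.3 IPS, the latency should be roughly 2.2 ms (1 / 459.3 ≈ 0.00218 s), assuming one inference is processed at a time. The printed 0.002 matches that figure only if it is read as seconds.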
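The arithmetic behind that estimate can be checked with a short sketch. This is not mlagility code: the helper name `latency_ms_from_throughput` is hypothetical, and it assumes inferences run one at a time, so that latency is simply the reciprocal of throughput.

```python
def latency_ms_from_throughput(ips: float) -> float:
    """Expected per-inference latency in milliseconds.

    Assumes one inference at a time, so latency is 1 / throughput.
    Hypothetical helper for illustration, not part of mlagility.
    """
    return 1000.0 / ips


# Values copied from the benchit output in this issue
reported_latency = 0.002   # printed with the label "milliseconds (ms)"
throughput_ips = 459.3     # inferences per second

expected_ms = latency_ms_from_throughput(throughput_ips)
expected_s = expected_ms / 1000.0

print(f"Expected latency: {expected_ms:.3f} ms ({expected_s:.3f} s)")

# The reported number matches the expected latency in seconds, not milliseconds
matches_seconds = abs(reported_latency - expected_s) < 5e-4
matches_ms = abs(reported_latency - expected_ms) < 5e-4
print(f"Matches seconds: {matches_seconds}, matches ms: {matches_ms}")
```

If the reported value really is in seconds, multiplying it by 1000 before printing (or changing the unit label) would make the two metrics consistent.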
@jeremyfowers @ramkrishna2910
Related to https://github.com/groq/mlagility/issues/153