Mismatch between Groq's latency and throughput

danielholanda commented 1 year ago

Description

The latency and throughput reported for Groq do not agree. The latency is labeled in milliseconds (ms), but the printed value appears to be in seconds.
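A minimal sketch of the kind of fix this implies, assuming the Groq benchmark returns its mean latency in seconds while the report label says ms. The helper name `format_mean_latency` is hypothetical and is not part of benchit/mlagility; it only illustrates converting seconds to milliseconds before printing next to the "milliseconds (ms)" label.

```python
def format_mean_latency(latency_seconds: float) -> str:
    """Hypothetical helper: the measured latency is assumed to be in seconds,
    so scale it to milliseconds before printing it with the ms label."""
    latency_ms = latency_seconds * 1000.0
    return f"Mean Latency:   {latency_ms:.3f}   milliseconds (ms)"


# Using the value printed in the report below (0.002, assumed to be seconds).
print(format_mean_latency(0.002))
```

With that conversion the reported 0.002 would be displayed as 2.000 ms instead of 0.002 ms.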

Reproducing

benchit models/transformers/bert.py --device groq
Models discovered during profiling:

bert.py:
        model (executed 1x)
                Model Type:     Pytorch (torch.nn.Module)
                Class:          BertModel (<class 'transformers.models.bert.modeling_bert.BertModel'>)
                Location:       /home/czhang/Documents/mlagility/models/transformers/bert.py, line 18
                Parameters:     109,482,240 (208.8 MB)
                Hash:           d59172a2
                Status:         Successfully benchmarked on GroqChip1 (groq v2.5.2)
                                Mean Latency:   0.002   milliseconds (ms)
                                Throughput:     459.3   inferences per second (IPS)

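Note that, given a throughput of 459.3 IPS and assuming one inference at a time, the latency should be about 2.2 ms (1000 / 459.3 ≈ 2.18 ms). That matches the printed 0.002 if the value is in seconds.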
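The arithmetic behind that check, as a small sketch. It assumes batch size 1 with no overlapping inferences, so latency is simply the inverse of throughput; `expected_latency_ms` is a hypothetical name used only for illustration.

```python
def expected_latency_ms(throughput_ips: float, batch_size: int = 1) -> float:
    """Latency implied by throughput, assuming one batch in flight at a time."""
    return 1000.0 * batch_size / throughput_ips


implied_ms = expected_latency_ms(459.3)
print(f"{implied_ms:.2f} ms")  # 2.18 ms

# Expressed in seconds and rounded to three decimals, this is exactly
# the 0.002 that the report printed under the "ms" label.
print(round(implied_ms / 1000.0, 3))  # 0.002
```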

danielholanda commented 1 year ago

@jeremyfowers @ramkrishna2910

jeremyfowers commented 1 year ago

groq / mlagility

Mismatch between Groq's latency and throughput #287