Closed nayakajay closed 2 years ago
Hi @nayakajay, thanks for the question!
The key for the outer dictionary is the model (along with its batch size) and the number of GPUs. The key for the inner dictionary is the second model in the case of co-location; a "null" inner key means the model runs in isolation, so in the example the throughput of the ResNet-18 model with a batch size of 16 in isolation is about 4.79 iterations/second. The co-located models are also identified by a model name and a number of GPUs (Gavel assumes that models can only be co-located with models using the same number of GPUs).
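As a rough illustration of the nesting described above, here is a minimal Python sketch that looks up an isolated throughput. The JSON fragment is hypothetical: the exact key strings (a stringified `(model, num_gpus)` tuple) and the absence of any further nesting are assumptions, and the real file in the repository may be organized differently. `isolated_throughput` is an illustrative helper, not a Gavel function.

```python
import json

# Hypothetical fragment modeled on the description in this thread.
# ASSUMPTION: keys are stringified (model, num_gpus) tuples and "null"
# marks the isolated (no co-location) entry.
FRAGMENT = """
{
  "('ResNet-18 (batch size 16)', 1)": {
    "null": 4.79
  }
}
"""


def isolated_throughput(table, model, num_gpus):
    """Return the throughput (iterations/second) of a model running alone."""
    outer_key = str((model, num_gpus))  # e.g. "('ResNet-18 (batch size 16)', 1)"
    return table[outer_key]["null"]


table = json.loads(FRAGMENT)
print(isolated_throughput(table, "ResNet-18 (batch size 16)", 1))
```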
You can collect these files by benchmarking a couple hundred iterations of each desired model and measuring the average time per iteration (from which you can compute the throughput in iterations/second).
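The measurement described above could be sketched roughly as follows. This is not the script used to produce the files in the repository; `measure_throughput`, `step_fn`, the warm-up phase, and the default iteration counts are all assumptions made for illustration. For real GPU training you would also need to synchronize the device before reading the clock (for example with `torch.cuda.synchronize()` in PyTorch), since kernel launches are asynchronous.

```python
import time


def measure_throughput(step_fn, num_iterations=200, warmup_iterations=10):
    """Time `num_iterations` calls of `step_fn` and return iterations/second.

    `step_fn` is a hypothetical callable that runs one training iteration.
    """
    # Warm-up iterations are excluded so one-time setup costs do not skew the average.
    for _ in range(warmup_iterations):
        step_fn()

    start = time.perf_counter()
    for _ in range(num_iterations):
        step_fn()
    elapsed = time.perf_counter() - start

    avg_time_per_iteration = elapsed / num_iterations
    return 1.0 / avg_time_per_iteration


def dummy_step():
    # Stand-in workload; replace with one real training iteration.
    sum(i * i for i in range(10_000))


throughput = measure_throughput(dummy_step)
print(f"{throughput:.2f} iterations/second")
```

The resulting number would go under the "null" key for the benchmarked model and GPU count.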
Let me know if you have any other questions!
Thanks @deepakn94, for the response.
To confirm, in the example, 2.54 is the throughput of ('ResNet-18 (batch size 32)', 1) when co-located with ('ResNet-18 (batch size 16)', 1), and 3.12 is the throughput of ('ResNet-18 (batch size 16)', 1) when co-located with ('ResNet-18 (batch size 32)', 1)? Or is it the other way around?
The other way around: 2.54 is the throughput of (ResNet-18 (batch size 16), 1) and 3.12 is the throughput of (ResNet-18 (batch size 32), 1).
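To make the co-location reading concrete, here is a small Python sketch using the two numbers from this thread. It assumes that the co-located entry is a two-element list whose first value belongs to the outer-key model and whose second value belongs to the inner-key model. That ordering and the key strings are assumptions, so check them against the actual file. `colocated_throughputs` is an illustrative helper, not part of Gavel.

```python
import json

# Hypothetical fragment; values 4.79, 2.54 and 3.12 are taken from this thread.
# ASSUMPTION: the pair is ordered [outer-key model, inner-key model].
FRAGMENT = """
{
  "('ResNet-18 (batch size 16)', 1)": {
    "null": 4.79,
    "('ResNet-18 (batch size 32)', 1)": [2.54, 3.12]
  }
}
"""


def colocated_throughputs(table, job, other_job):
    """Map each of two co-located jobs to its throughput (iterations/second)."""
    pair = table[str(job)][str(other_job)]
    return {job: pair[0], other_job: pair[1]}


table = json.loads(FRAGMENT)
job_a = ("ResNet-18 (batch size 16)", 1)
job_b = ("ResNet-18 (batch size 32)", 1)
result = colocated_throughputs(table, job_a, job_b)
for job, throughput in result.items():
    print(job, throughput)
```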
Thanks @deepakn94. Closing this issue now.
I wanted to understand a bit about the structure of the xxx-throughputs.json files present in the repository. For example, in simulation_throughputs.json, what does a key such as ('ResNet-18 (batch size 32)', 1) mean, and what does the null key represent? It would be great if you could also provide details on how you collected/generated these files so that they can be reproduced for a GPU not present in the repository (say, Turing).