What does this PR do?
This PR adds log output for compile time in the evaluation/prediction loop for all inference test cases. To get the compile time, pass the --throughput_warmup_steps flag (same logic as in text_generation/run_generation.py). Reported throughput numbers also increase, because the warmup steps are excluded from the throughput calculation.
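For reviewers, here is a minimal sketch of the idea, not the code in this PR. It assumes that the time spent in the first throughput_warmup_steps steps is what gets reported as compile time, and that throughput is computed only over the remaining steps. The function name run_with_warmup, the clock parameter, and the result keys are hypothetical and chosen for illustration only.

```python
import time
from typing import Callable, Iterable, Optional


def run_with_warmup(
    batches: Iterable[list],
    step_fn: Callable[[list], object],
    throughput_warmup_steps: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Run step_fn over batches; report warmup (compile) time and throughput separately.

    Hypothetical helper: names and return keys are illustrative assumptions.
    """
    start = clock()
    # With no warmup steps, the measured window starts immediately.
    warmup_end: Optional[float] = start if throughput_warmup_steps <= 0 else None
    samples_after_warmup = 0

    for step, batch in enumerate(batches):
        step_fn(batch)
        if step + 1 == throughput_warmup_steps:
            # The last warmup step just finished: everything so far counts as compile time.
            warmup_end = clock()
        elif step + 1 > throughput_warmup_steps:
            # Only samples processed after warmup count toward throughput.
            samples_after_warmup += len(batch)

    end = clock()
    if warmup_end is None:
        # Fewer steps than warmup steps: the whole run was warmup.
        warmup_end = end

    measured = end - warmup_end
    throughput = samples_after_warmup / measured if measured > 0 else 0.0
    return {
        "compile_time_s": warmup_end - start,
        "throughput_samples_per_s": throughput,
    }
```

Because the slow first steps fall outside the measured window, the same run reports a higher throughput than it would if the total time were divided by the total number of samples.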