Moved the call to randomly_sample_sonnet_lines_prompt outside the load-request send loop so the loop can issue requests to the server faster. Previously, generating the next prompt inside the loop added an artificial delay that lowered the measured benchmark throughput (requests/sec).
Also changed the tokenizer to be instantiated once, outside the prompt-generation loop, to speed up the overall test.
After this change I've seen up to a 2x improvement in achieved server throughput on some small workloads, which allows a more accurate measurement of the server's true throughput.
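A minimal sketch of the before/after structure of the send loop. This is illustrative only: make_prompt is a hypothetical stand-in for randomly_sample_sonnet_lines_prompt, with a sleep modeling per-prompt generation cost (sampling lines, tokenizing), and the loop appends to a list rather than sending real HTTP requests.

```python
import time

def make_prompt(i, delay=0.001):
    # Hypothetical stand-in for randomly_sample_sonnet_lines_prompt;
    # the sleep models per-prompt generation cost (line sampling,
    # tokenizer work). In the real fix, the tokenizer is also created
    # once, outside this function, instead of per prompt.
    time.sleep(delay)
    return f"prompt-{i}"

def run_benchmark(n, pregenerate):
    sent = []
    if pregenerate:
        # After this change: build every prompt up front, so the timed
        # send loop only issues requests and offers maximum load.
        prompts = [make_prompt(i) for i in range(n)]
        start = time.perf_counter()
        for p in prompts:
            sent.append(p)  # stand-in for sending the request
        elapsed = time.perf_counter() - start
    else:
        # Before: prompt construction sits inside the send loop and
        # artificially throttles the load offered to the server.
        start = time.perf_counter()
        for i in range(n):
            sent.append(make_prompt(i))
        elapsed = time.perf_counter() - start
    return sent, elapsed
```

With the generation cost hoisted out of the timed loop, the same set of prompts is sent but the loop itself completes far faster, so the measured throughput reflects the server rather than the client's prompt-building overhead.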