Closed krishung5 closed 2 months ago
Can you add a summary of the changes / motivation to the description? Was there an observed X% improvement by doing this? Is it only applicable to certain scenarios? Was struct.pack slower than json.dumps? etc.
I believe the performance numbers are shared as a part of DLIS-6498
@rmccorm4 Updated the description with the summary. For more detailed numbers, please see the ticket. Thanks!
This PR enhances performance by replacing the use of struct.pack with list operations. For more detailed numbers, please refer to the comments in Jira ticket DLIS-6498. The improvement was observed in various scenarios:
Identity model (one input, one output):
Sending a small tensor: observed improvement of approximately 6.11% in throughput
Sending a larger tensor (1MB): observed improvement of approximately 2.26% in throughput
The number of inputs can also impact performance, since more inputs require more work for both struct.pack and list operations:
Python model (three inputs, three outputs)
In addition to the experiments mentioned above, simply running the Python code without Triton also indicates that the new code achieves better throughput, as shown below:
Observed improvement of approximately 45.91% in execution time with three inputs.
test.py
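The contents of test.py are not reproduced here. As a rough illustration of the kind of standalone comparison described above, the following is a minimal sketch and not the PR's actual code. It assumes a hypothetical length-prefixed byte format, and the function names (serialize_with_struct, serialize_with_list, percent_improvement, run_benchmark) are invented for this example. It compares building the payload with struct.pack against collecting the pieces in a list and joining once, and shows how a percentage improvement in execution time could be computed.

```python
import struct
import timeit


def serialize_with_struct(items):
    # Hypothetical baseline: struct.pack builds each 4-byte little-endian
    # length prefix, and the payload grows by bytes concatenation.
    out = b""
    for item in items:
        out += struct.pack("<I", len(item)) + item
    return out


def serialize_with_list(items):
    # Hypothetical alternative: append every piece to a list and join once.
    # int.to_bytes yields the same 4-byte little-endian prefix.
    parts = []
    for item in items:
        parts.append(len(item).to_bytes(4, "little"))
        parts.append(item)
    return b"".join(parts)


def percent_improvement(old_seconds, new_seconds):
    # Relative reduction in execution time, in percent.
    return (old_seconds - new_seconds) / old_seconds * 100.0


def run_benchmark(items, number=10000):
    # Time both variants on the same input and report the relative change.
    t_struct = timeit.timeit(lambda: serialize_with_struct(items), number=number)
    t_list = timeit.timeit(lambda: serialize_with_list(items), number=number)
    return t_struct, t_list, percent_improvement(t_struct, t_list)


if __name__ == "__main__":
    # Three inputs, mirroring the three-input scenario mentioned above.
    inputs = [b"input0", b"input1", b"input2"]
    assert serialize_with_struct(inputs) == serialize_with_list(inputs)
    t_struct, t_list, change = run_benchmark(inputs)
    print(f"struct.pack: {t_struct:.4f}s  list: {t_list:.4f}s  change: {change:.2f}%")
```

Timings from a micro-benchmark like this depend heavily on the Python version, input sizes, and machine, so the numbers it prints should not be expected to match the figures reported in the ticket.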