mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks
https://mlcommons.org/en/groups/inference
Apache License 2.0

Performance improvement- GPT-J and BERT Offline scenario #1722

Open anandhu-eng opened 4 months ago

anandhu-eng commented 4 months ago

The current implementations of GPT-J and BERT carry out prediction sequentially, one query at a time. Could their Offline-scenario performance be improved by processing queries in parallel with threads rather than sequentially?

GPT-J ref: https://github.com/mlcommons/inference/blob/fa4fe53e53379dee27a216695a2b710d122154c7/language/gpt-j/backend.py#L72

BERT ref: https://github.com/mlcommons/inference/blob/fa4fe53e53379dee27a216695a2b710d122154c7/language/bert/pytorch_SUT.py#L68
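A minimal sketch of the idea, using Python's standard `concurrent.futures.ThreadPoolExecutor`. The `predict` function here is a hypothetical stand-in for the per-query model call in the linked SUTs; it is not the actual MLPerf code. Threads can help for PyTorch workloads because many tensor operations release the GIL, though the real benefit would need to be measured against batching:

```python
from concurrent.futures import ThreadPoolExecutor

def predict(sample):
    # Hypothetical per-sample model call; the real SUTs run the model
    # inside issue_queries in backend.py / pytorch_SUT.py.
    return sample * 2

def run_sequential(samples):
    # Current behavior: one prediction at a time.
    return [predict(s) for s in samples]

def run_threaded(samples, max_workers=4):
    # Proposed behavior: dispatch predictions to a thread pool.
    # executor.map preserves input order, so results still line up
    # with the issued queries.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(predict, samples))
```

Since `executor.map` keeps results in input order, the threaded variant returns the same list as the sequential one, which matters for matching responses back to LoadGen query IDs.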

mrmhodak commented 4 months ago

@anandhu-eng : Please open PR