anandhu-eng opened this issue 4 months ago
The current implementations of GPT-J and BERT run prediction sequentially, one query at a time. Could their performance be improved by processing queries in parallel with threads instead of sequentially?
GPT-J ref: https://github.com/mlcommons/inference/blob/fa4fe53e53379dee27a216695a2b710d122154c7/language/gpt-j/backend.py#L72
BERT ref: https://github.com/mlcommons/inference/blob/fa4fe53e53379dee27a216695a2b710d122154c7/language/bert/pytorch_SUT.py#L68
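As a rough illustration of the idea, the sequential per-query loop could be replaced with a thread pool. This is only a sketch, not the actual SUT code: `predict_one` is a hypothetical stand-in for the model call made inside the linked `issue_queries` implementations. Since PyTorch releases the GIL during tensor operations, threads can overlap real inference work even in CPython.

```python
from concurrent.futures import ThreadPoolExecutor

def predict_one(sample):
    # Hypothetical placeholder for a single model inference call
    # (the real SUTs invoke the PyTorch model here).
    return sample * 2

def predict_parallel(samples, num_threads=4):
    """Run per-sample predictions concurrently instead of one by one."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order, so results match the
        # output of the original sequential loop.
        return list(pool.map(predict_one, samples))

print(predict_parallel([1, 2, 3]))  # → [2, 4, 6]
```

Whether this helps in practice depends on the scenario: for batched GPU inference the device may already be saturated, so any speedup would need to be confirmed against the MLPerf latency/throughput constraints.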
@anandhu-eng: Please open a PR.