Closed VibhuJawa closed 2 years ago
This PR optimizes embedding creation.
Benchmarks
131 s now vs 634 s previously on mainline
131 s now vs 175 s with Sentence Transformers (due to faster RAPIDS tokenization)
The core improvement here is that we now clip the trailing zero padding at the end of the input to BERT, so the model no longer spends compute on positions that carry no tokens.
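For reviewers who want a concrete picture of the clipping idea, here is a minimal sketch. It is not the code in this PR: the function name `clip_trailing_padding`, the pad id of 0, and the sample token ids are all assumptions for illustration, and it uses plain Python lists rather than GPU tensors so it runs without dependencies. It assumes the batch is right-padded, so each attention-mask row is a run of 1s followed by 0s.

```python
def clip_trailing_padding(input_ids, attention_mask):
    """Drop trailing columns that are padding in every row of a right-padded batch.

    Hypothetical helper for illustration; not taken from this PR.
    """
    # With right padding, the number of 1s in a mask row is that row's real length.
    max_len = max((sum(row) for row in attention_mask), default=0)
    # Keep at least one column so the model never receives an empty input.
    max_len = max(max_len, 1)
    clipped_ids = [row[:max_len] for row in input_ids]
    clipped_mask = [row[:max_len] for row in attention_mask]
    return clipped_ids, clipped_mask


# Hypothetical batch padded to a fixed length of 8; 0 is assumed to be the pad id.
input_ids = [
    [101, 7592, 2088, 102, 0, 0, 0, 0],
    [101, 2023, 2003, 1037, 3231, 102, 0, 0],
]
attention_mask = [[1 if tok != 0 else 0 for tok in row] for row in input_ids]

ids, mask = clip_trailing_padding(input_ids, attention_mask)
print(len(ids[0]))  # 6: the two columns that were padding in every row are gone
```

The saving grows with the gap between the fixed padded length and the longest real sequence in each batch, since self-attention cost scales with the square of the sequence length.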
Todo: