neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

Add bucketing to DeepSparseSentenceTransformer #1334

Closed: mgoin closed this pull request 1 year ago

mgoin commented 1 year ago

Tighten up performance for DeepSparseSentenceTransformer by implementing bucketing, alongside a dynamic model for small sequence lengths and batching. This also adds a benchmarking script that shows the speedup compared to SentenceTransformers; the command and results are below.
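As a rough illustration only (not this PR's actual code), here is a minimal sketch of the bucketing idea under these assumptions: one engine is compiled per fixed sequence length ("bucket"), each input is routed to the smallest bucket that fits, and very short inputs fall back to a dynamic-shape model. The bucket sizes, threshold, and helper names are illustrative.

```python
from typing import Optional

from transformers import AutoTokenizer

# Illustrative values, not the ones used by this PR.
BUCKETS = [64, 128, 256, 512]   # fixed sequence lengths, one compiled engine each
DYNAMIC_THRESHOLD = 32          # shorter inputs go to the dynamic-shape model


def pick_bucket(num_tokens: int) -> Optional[int]:
    """Return the smallest bucket that fits, or None to use the dynamic model."""
    if num_tokens <= DYNAMIC_THRESHOLD:
        return None
    for bucket in BUCKETS:
        if num_tokens <= bucket:
            return bucket
    return BUCKETS[-1]  # longer inputs get truncated to the largest bucket


tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")


def group_by_bucket(sentences):
    """Group sentences so each batch is padded only to its bucket length."""
    groups = {}
    for sentence in sentences:
        num_tokens = len(tokenizer(sentence)["input_ids"])
        groups.setdefault(pick_bucket(num_tokens), []).append(sentence)
    return groups  # each group would then run through its matching engine
```

Grouping before padding is what avoids paying the cost of the longest sequence on every input, which is where the speedup over a single static-length model comes from.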

python benchmark_encoding.py --base_model BAAI/bge-small-en-v1.5 --sparse_model zeroshot/bge-small-en-v1.5-quant

[Standard SentenceTransformer] Encoded 100 sentences of length 700 in 10.42 seconds.
[DeepSparse] Encoded 100 sentences of length 700 in 4.04 seconds.
[DeepSparse Optimized] Encoded 100 sentences of length 700 in 1.82 seconds.
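For reference, a timing loop like the one below is roughly what such a benchmark measures. This is a hedged sketch, not the contents of benchmark_encoding.py; the DeepSparseSentenceTransformer import path is assumed from the deepsparse sentence-transformers integration, and the synthetic sentences only stand in for the workload above.

```python
import time

from sentence_transformers import SentenceTransformer
from deepsparse.sentence_transformers import DeepSparseSentenceTransformer  # assumed import path

# Synthetic stand-in for the "100 sentences of length 700" workload above.
sentences = ["deep learning inference on cpus " * 22] * 100  # ~700 characters each

models = {
    "Standard SentenceTransformer": SentenceTransformer("BAAI/bge-small-en-v1.5"),
    "DeepSparse Optimized": DeepSparseSentenceTransformer("zeroshot/bge-small-en-v1.5-quant"),
}

for name, model in models.items():
    start = time.perf_counter()
    model.encode(sentences)
    elapsed = time.perf_counter() - start
    print(f"[{name}] Encoded {len(sentences)} sentences in {elapsed:.2f} seconds.")
```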