quic / efficient-transformers

This library lets users port pretrained models and checkpoints from the HuggingFace (HF) hub (developed using the HF transformers library) into inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators.
https://quic.github.io/efficient-transformers/

Fix issue when the number of prompts is less than the full batch size (FBS) #130

Closed quic-rishinr closed 1 week ago

quic-rishinr commented 1 week ago

Resolves an issue with continuous-batching models when the number of prompts is less than or equal to the full batch size.
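The PR description does not show the change itself, so the following is only a minimal sketch of the general situation, not the code in this PR. It assumes a continuous-batching loop keeps `full_batch_size` decode slots and must leave some idle when fewer prompts arrive. The names `assign_initial_slots` and `active_slots` are hypothetical and are not part of the efficient-transformers API.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


def assign_initial_slots(
    prompts: List[str], full_batch_size: int
) -> Tuple[Dict[int, Optional[str]], List[str]]:
    """Hypothetical helper: fill batch slots with prompts, leaving extras idle.

    Returns the slot assignment and the prompts still waiting for a free slot.
    """
    if full_batch_size <= 0:
        raise ValueError("full_batch_size must be positive")

    queue: Deque[str] = deque(prompts)
    slots: Dict[int, Optional[str]] = {}
    for slot in range(full_batch_size):
        # Take a prompt only while some remain; otherwise mark the slot idle
        # instead of indexing past the end of the prompt list.
        slots[slot] = queue.popleft() if queue else None
    return slots, list(queue)


def active_slots(slots: Dict[int, Optional[str]]) -> List[int]:
    """Return the slot indices that actually hold a prompt."""
    return [slot for slot, prompt in slots.items() if prompt is not None]


if __name__ == "__main__":
    # Two prompts, four slots: slots 2 and 3 stay idle.
    slots, waiting = assign_initial_slots(["p0", "p1"], full_batch_size=4)
    print(slots, waiting, active_slots(slots))
```

The point of the sketch is that the loop iterates over slots rather than prompts, so the cases "fewer prompts than slots", "exactly as many", and "more prompts than slots" all go through the same path.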