Transformers 4.35 only supports speculative decoding for batch size == 1. To use speculative decoding with batch sizes greater than 1, please use this branch: https://github.com/huggingface/transformers/pull/26875
To do so, install Transformers as follows:
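The exact install command is not preserved in the original. A common way to install Transformers directly from a GitHub pull request (here, #26875) is via the PR's generic head ref:

```shell
# Install Transformers from the head of PR #26875 (batched speculative decoding).
# `refs/pull/26875/head` is GitHub's generic ref for a PR's branch.
pip install --upgrade git+https://github.com/huggingface/transformers.git@refs/pull/26875/head
```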
and then you can run:
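The runnable snippet itself is not preserved here. Below is a minimal sketch of batched speculative decoding with the Transformers automatic-speech-recognition pipeline, where the assistant (student) model drafts tokens and the teacher model verifies them via `generate_kwargs={"assistant_model": ...}`. The model names, the `build_speculative_pipe` helper, and the audio file paths are all illustrative, not taken from the original:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Above ~4, the assistant's drafting overhead can outweigh its speedup
# (see the note below and Table 22 of the paper).
MAX_RECOMMENDED_BATCH_SIZE = 4


def build_speculative_pipe(
    teacher_id: str = "openai/whisper-large-v2",           # illustrative teacher
    assistant_id: str = "distil-whisper/distil-large-v2",  # illustrative assistant
    batch_size: int = 4,
):
    """Build an ASR pipeline where the assistant drafts tokens and the
    teacher verifies them (speculative decoding)."""
    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if torch.cuda.is_available() else torch.float32

    teacher = AutoModelForSpeechSeq2Seq.from_pretrained(
        teacher_id, torch_dtype=dtype, low_cpu_mem_usage=True
    ).to(device)
    assistant = AutoModelForSpeechSeq2Seq.from_pretrained(
        assistant_id, torch_dtype=dtype, low_cpu_mem_usage=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(teacher_id)

    return pipeline(
        "automatic-speech-recognition",
        model=teacher,
        tokenizer=processor.tokenizer,
        feature_extractor=processor.feature_extractor,
        generate_kwargs={"assistant_model": assistant},
        torch_dtype=dtype,
        device=device,
        batch_size=batch_size,
    )


if __name__ == "__main__":
    pipe = build_speculative_pipe()
    # Pass a list of files to exercise batched decoding.
    outputs = pipe(["sample1.mp3", "sample2.mp3"])
    for out in outputs:
        print(out["text"])
```

The helper returns a standard pipeline object, so it can be called on a single file or on a list of files for batched inference.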
The PR will be merged into Transformers soon.
Note: Given the "speculative" nature of assistant decoding (a.k.a. speculative decoding), it is not recommended to use batch sizes greater than 4, as this can actually make the transcription pipeline slower than using the teacher model alone. See Table 22 of the paper.