Before submitting a new issue...
[X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
🚀 The feature, motivation and pitch
Add support for predicted outputs, as described in the OpenAI latency optimization guide: https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs
This reminds me of: https://github.com/FasterDecoding/REST https://arxiv.org/html/2311.08252v2
Alternatives
No response
Additional context
I could try implementing this on top of n-gram speculation.
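A rough sketch of the idea, assuming the user-supplied prediction is available as a list of token IDs: look up the last few generated tokens in the prediction and propose the tokens that follow the match as a draft, which the target model would then verify as in ordinary speculative decoding. The function name `propose_from_prediction` and its parameters are hypothetical and are not part of any existing API.

```python
from typing import List, Sequence


def propose_from_prediction(
    generated: Sequence[int],
    prediction: Sequence[int],
    ngram_size: int = 3,
    num_draft_tokens: int = 5,
) -> List[int]:
    """Propose draft tokens by locating the tail of `generated` in `prediction`.

    Hypothetical helper for illustration only: returns up to
    `num_draft_tokens` tokens that follow the first occurrence of the
    last `ngram_size` generated tokens inside the predicted output.
    """
    if ngram_size <= 0 or len(generated) < ngram_size:
        return []

    tail = list(generated[-ngram_size:])

    # Scan the prediction left to right for the n-gram we just generated.
    for start in range(len(prediction) - ngram_size + 1):
        if list(prediction[start:start + ngram_size]) == tail:
            follow = start + ngram_size
            # May be empty if the match sits at the very end of the prediction.
            return list(prediction[follow:follow + num_draft_tokens])

    # No match: fall back to normal decoding for this step.
    return []


if __name__ == "__main__":
    prediction = [1, 2, 3, 4, 5, 6, 7, 8]
    generated = [9, 2, 3, 4]
    print(propose_from_prediction(generated, prediction, ngram_size=3, num_draft_tokens=2))
```

When the output diverges from the prediction, the lookup simply returns no draft until the generated text lines up with the prediction again, so a wrong prediction should cost little beyond the failed lookups.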