yale-sys / prompt-cache

Modular and structured prompt caching for low-latency LLM inference
MIT License

Fix the streaming output #3

Closed. sarda-nikhil closed this issue 8 months ago.

sarda-nikhil commented 9 months ago

Currently, the streaming output skips the last few tokens, leaving the output looking truncated.

ingim commented 9 months ago

I'll take a look! That might be because of the `max_new_tokens` setting.
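
As a hedged illustration of how this kind of truncation can happen (not the actual prompt-cache code): if the streaming loop buffers tokens and only flushes full buffers, then exits as soon as `max_new_tokens` is reached, the partially filled final buffer is never emitted and the tail of the output disappears. The names `stream_buggy`, `stream_fixed`, and `flush_every` below are hypothetical and only serve to show the pattern.

```python
# Hypothetical sketch, not the repository's implementation.

def stream_buggy(generate_step, max_new_tokens, flush_every=4):
    """Streams generated text but drops the trailing partial buffer."""
    buffer = []
    for i in range(max_new_tokens):
        token = generate_step(i)           # produce the next token's text
        buffer.append(token)
        if len(buffer) >= flush_every:     # only emit when the buffer is full
            yield "".join(buffer)
            buffer.clear()
    # BUG: the loop ends at max_new_tokens without flushing, so the last
    # (max_new_tokens % flush_every) tokens are silently dropped.


def stream_fixed(generate_step, max_new_tokens, flush_every=4):
    """Same loop, but flushes whatever remains once generation stops."""
    buffer = []
    for i in range(max_new_tokens):
        token = generate_step(i)
        buffer.append(token)
        if len(buffer) >= flush_every:
            yield "".join(buffer)
            buffer.clear()
    if buffer:                             # emit the leftover tokens
        yield "".join(buffer)


if __name__ == "__main__":
    tokens = ["Hello", ",", " world", "!", " bye"]
    step = lambda i: tokens[i]
    print("buggy:", "".join(stream_buggy(step, len(tokens))))  # drops " bye"
    print("fixed:", "".join(stream_fixed(step, len(tokens))))  # full output
```

If the cause is instead `max_new_tokens` cutting generation short, the symptom would be the same (output that looks truncated), and the fix would be to raise the limit or stop on the end-of-sequence token rather than the hard cap; the flush-on-exit pattern above only covers the buffering case.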