Open seonglae opened 10 months ago
Hi there 👋 I definitely think the addition of an equivalent `TextStreamer` class to the library would be great! If someone in the community would like to contribute this, it should be as simple as rewriting this file in JavaScript.

The current approach to text streaming (which was actually added before the Python library added `TextStreamer`) is to pass a `callback_function` to the generate/pipeline call. For example:
```js
const pipe = await pipeline(
  'text-generation',
  model,
  { quantized: true },
);
pipe(prompt, { callback_function: (beams) => { console.log(beams); } });
```
Here's an example of streaming + decoding:
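The original example is not preserved in this thread, but a minimal sketch looks like the following. It assumes the transformers.js v2 behavior where beam objects expose `output_token_ids`; `makeStreamHandler` is a hypothetical helper name, not a library API. Because the callback receives the full sequence so far, the helper re-decodes everything each step and emits only the new suffix:

```javascript
// Hypothetical helper: wraps a decode function and a sink into a
// callback_function that emits only the newly generated text delta.
function makeStreamHandler(decode, onDelta) {
  let printed = ''; // text already emitted so far
  return (beams) => {
    // Decode the whole sequence, then forward only the new suffix.
    const text = decode(beams[0].output_token_ids);
    if (text.startsWith(printed)) {
      onDelta(text.slice(printed.length));
      printed = text;
    }
  };
}

// Usage with a real pipeline (requires a model download, so shown as comments):
// const pipe = await pipeline('text-generation', model);
// const handler = makeStreamHandler(
//   (ids) => pipe.tokenizer.decode(ids, { skip_special_tokens: true }),
//   (delta) => process.stdout.write(delta),
// );
// await pipe(prompt, { callback_function: handler });
```

Re-decoding the full sequence each step (rather than decoding one token at a time) lets multi-byte or merged tokens settle into correct text before the delta is computed.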
@xenova How can I define the `callback_function` so that text generation stops at special words (like the OpenAI API's `stop` parameter)? I also found the transformers.js code you showed, but I'm confused about what to do next.
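One client-side workaround, since (as far as the v2 docs show) there is no built-in `stop` parameter on the pipeline call: truncate the decoded text at the first stop word before displaying it. `truncateAtStop` below is a hypothetical helper, not a library API, and note that this only trims the output — it does not abort generation itself:

```javascript
// Hypothetical helper: cut `text` at the earliest occurrence of any stop word.
function truncateAtStop(text, stopWords) {
  let cut = text.length;
  for (const stop of stopWords) {
    const i = text.indexOf(stop);
    if (i !== -1 && i < cut) cut = i; // earliest stop word wins
  }
  return text.slice(0, cut);
}

// Inside a callback_function you would decode the beam, truncate, and render:
// const text = pipe.tokenizer.decode(beams[0].output_token_ids, { skip_special_tokens: true });
// render(truncateAtStop(text, ['\nUser:', '</s>']));
```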
@xenova is this issue still open for contribution?
Streamer https://huggingface.co/docs/transformers/generation_strategies#streaming
Reason for request
Currently, iterating with `max_new_tokens: 1` takes much longer than a single generation. Text generation takes time even for lightweight models, and token streaming is a key feature for user experience. In my case, task-specific text generation could be a key feature of low-cost AI app development using transformers.js.

Additional context
I'm not sure whether the `TextStreamer` class needs to be compatible with Python transformers. I wrote a use-case proposal with `TextStreamer extends TransformStream`. `AsyncIterable`, `AsyncGenerator`, and the Stream API might be usable.

Suggested streaming code
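A minimal sketch of the `TextStreamer extends TransformStream` idea, under the assumption that generation would enqueue token-id arrays as they are produced. The class shape and constructor signature here are illustrative, not the library's actual API; `TransformStream` is the standard WHATWG Streams class (global in browsers and Node 18+):

```javascript
// Proposed shape: a TransformStream that turns streamed token-id arrays
// into text deltas. The decode function is injected (e.g. a tokenizer's
// decode method) so the class itself stays dependency-free.
class TextStreamer extends TransformStream {
  constructor(decode) {
    let printed = ''; // text emitted so far
    super({
      transform(tokenIds, controller) {
        // Re-decode the whole sequence and forward only the new suffix,
        // so partially decoded multi-byte tokens settle correctly.
        const text = decode(tokenIds);
        if (text.startsWith(printed)) {
          controller.enqueue(text.slice(printed.length));
          printed = text;
        }
      },
    });
  }
}

// Usage sketch: generate() would write token ids into streamer.writable,
// and the app consumes text deltas from streamer.readable, e.g.
//   streamer.readable.pipeTo(new WritableStream({ write: render }));
```

Building on `TransformStream` would make the streamer pipe directly into `Response` bodies and other Web Streams consumers, which is essentially the composition the Vercel AI SDK relies on.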
This is Vercel's approach: https://github.com/vercel/ai/blob/main/packages/core/streams/ai-stream.ts https://github.com/vercel-labs/ai-chatbot/blob/main/app/api/chat/route.ts