xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0

[Feature request] streamer callback for text-generation task #394

Open seonglae opened 10 months ago

seonglae commented 10 months ago

Streamer https://huggingface.co/docs/transformers/generation_strategies#streaming

Reason for request: Currently, iterating with max_new_tokens: 1 takes much longer than a single generation call, and text generation takes time even for a light model. Token streaming is a key feature for user experience. In my case, task-specific streaming text generation could make transformers.js a low-cost foundation for AI app development.

Additional context: I'm not sure whether the TextStreamer class needs to be compatible with the Python transformers one. I wrote a use-case proposal with TextStreamer extending TransformStream; AsyncIterable, AsyncGenerator, and the Stream API might also be usable.

Suggested streaming code:

let aggregatedResponse = '';
const streamer = new TextStreamer()
const pipe: TextGenerationPipeline = await pipeline(
  'text-generation',
  model,
  { quantized: true }
)
pipe(prompt, { streamer })
const res = new Response(streamer.readable)

This is vercel's approach https://github.com/vercel/ai/blob/main/packages/core/streams/ai-stream.ts https://github.com/vercel-labs/ai-chatbot/blob/main/app/api/chat/route.ts

xenova commented 10 months ago

Hi there 👋 I definitely think the addition of an equivalent TextStreamer class to the library will be great! If someone in the community would like to contribute this, it should be as simple as rewriting this file in JavaScript.

The current approach to text streaming (which was actually added before the python library added TextStreamer) is to add a callback_function to the generate/pipeline function. For example:

const pipe = await pipeline(
  'text-generation',
  model,
  { quantized: true }
)
pipe(prompt, { callback_function: beams => { console.log(beams) }})

Here's an example of streaming + decoding:

https://github.com/xenova/transformers.js/blob/4e4148cb5ce7f4a9265f58b4eeb660c64bed0386/examples/demo-site/src/worker.js#L189-L202
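The core of that streaming + decoding pattern is diffing the decoded text between callbacks and emitting only the new part. A minimal, library-free sketch of that diffing step (the `extractNewText` helper is illustrative, not part of the transformers.js API):

```javascript
// Each time callback_function fires, the beam can be decoded into the full
// text generated so far. To stream, emit only what is new since the last
// callback. Hypothetical helper, not a transformers.js API.
function extractNewText(previousText, currentText) {
  // Normal case: the new text extends the old text.
  if (currentText.startsWith(previousText)) {
    return currentText.slice(previousText.length);
  }
  // Fallback (e.g. the decoder revised earlier characters): re-emit everything.
  return currentText;
}

// Usage sketch inside the callback (assumes the pipeline exposes its
// tokenizer, as in the linked worker.js example):
// let previous = '';
// pipe(prompt, {
//   callback_function: (beams) => {
//     const current = pipe.tokenizer.decode(beams[0].output_token_ids, {
//       skip_special_tokens: true,
//     });
//     postMessage(extractNewText(previous, current));
//     previous = current;
//   },
// });
```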

wujohns commented 4 months ago

@xenova How can I define the callback_function so that text generation stops at special words (like the OpenAI API's "stop" parameter)? I also found the code from transformers.js you showed, but I'm confused about what to do next.
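One client-side way to approximate an OpenAI-style "stop" parameter is to scan the decoded text for stop strings on each callback and truncate at the first match. A minimal sketch (the helper is hypothetical, not a transformers.js API; actually aborting generation early, rather than just discarding tokens, would need support from the library itself):

```javascript
// Hypothetical helper: cut the decoded text off at the earliest occurrence
// of any stop string, and report whether a stop string was hit.
function truncateAtStop(text, stopWords) {
  let cutoff = -1;
  for (const stop of stopWords) {
    const i = text.indexOf(stop);
    if (i !== -1 && (cutoff === -1 || i < cutoff)) {
      cutoff = i; // earliest stop string seen so far
    }
  }
  return cutoff === -1
    ? { text, stopped: false } // no stop string yet: keep streaming
    : { text: text.slice(0, cutoff), stopped: true }; // drop the stop string and everything after it
}
```

Inside callback_function one would decode the beam, run this check, and ignore further callbacks once `stopped` is true.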

02shanks commented 1 month ago

@xenova is this issue still open for contribution?