mlc-ai / web-llm

High-performance In-browser LLM Inference Engine
https://webllm.mlc.ai
Apache License 2.0
13.55k stars 872 forks

Usage Stats in Intermediate Steps #559

Open jdp8 opened 1 month ago

jdp8 commented 1 month ago

Hello, I saw that the runtimeStatsText() function may soon be deprecated and that usage metadata can now be accessed by setting stream_options: { include_usage: true } in the streaming request. However, I read that the usage is only available in the last chunk, rather than at any time as with runtimeStatsText().

I was wondering if it is possible to get this metadata in the intermediate steps when streaming. In other words, to get the usage metadata when the output chunks are being streamed.

Any assistance with this will be greatly appreciated. Thank you!
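
For readers following along, here is a minimal sketch of the behavior described above: in an OpenAI-style stream, the usage object arrives only on the final chunk. The `Chunk` and `Usage` types and the `mockStream` generator are hypothetical stand-ins written for this example so that it runs without a browser or a model. With WebLLM, the stream would instead come from `engine.chat.completions.create({ stream: true, stream_options: { include_usage: true }, messages })`.

```typescript
// Minimal local types mirroring the OpenAI-style streaming chunk shape.
// These are illustrative, not the library's own type definitions.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

interface Chunk {
  choices: { delta: { content?: string } }[];
  usage?: Usage;
}

// Hypothetical stand-in for a real stream: two text chunks, then a final
// chunk with no choices that carries the usage object.
async function* mockStream(): AsyncGenerator<Chunk> {
  yield { choices: [{ delta: { content: "Hello" } }] };
  yield { choices: [{ delta: { content: " world" } }] };
  yield {
    choices: [],
    usage: { prompt_tokens: 5, completion_tokens: 2, total_tokens: 7 },
  };
}

// Concatenate the streamed text and keep whichever usage object is seen.
// Under the OpenAI protocol this is only populated by the last chunk.
async function collectStream(
  stream: AsyncIterable<Chunk>,
): Promise<{ text: string; usage: Usage | undefined }> {
  let text = "";
  let usage: Usage | undefined;
  for await (const chunk of stream) {
    // The usage-only chunk has an empty choices array, hence the optional chain.
    text += chunk.choices[0]?.delta.content ?? "";
    if (chunk.usage) {
      usage = chunk.usage;
    }
  }
  return { text, usage };
}
```

Calling `collectStream(mockStream())` resolves once the stream ends, so `usage` stays undefined until the final chunk has been consumed.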

tqchen commented 1 month ago

Unfortunately, doing so would mean the output won't align with the OpenAI protocol, so we likely cannot support such a case. Note that async streaming (between the worker and the client) is still necessary for best performance.
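
One possible client-side workaround, which nobody in this thread proposed, is to approximate progress while streaming by counting chunks and measuring elapsed time. This is only a rough sketch: it assumes each non-empty chunk corresponds to roughly one decoded token, which the protocol does not guarantee, and the elapsed time includes prefill latency before the first chunk. The names `TextChunk`, `LiveStats`, and `streamWithLiveStats` are hypothetical, and the exact token counts still come only from the final usage chunk.

```typescript
// Illustrative chunk shape; only the text delta is needed here.
interface TextChunk {
  choices: { delta: { content?: string } }[];
}

// Approximate statistics reported after every non-empty chunk.
interface LiveStats {
  chunks: number;          // non-empty chunks seen so far (a proxy for tokens)
  elapsedMs: number;       // time since the stream was first awaited
  chunksPerSecond: number; // chunks / elapsed time; 0 if no time has passed
}

// Consume a stream, invoking onStats after each non-empty chunk.
// The clock is injectable so the function can be tested deterministically.
async function streamWithLiveStats(
  stream: AsyncIterable<TextChunk>,
  onStats: (stats: LiveStats) => void,
  now: () => number = () => Date.now(),
): Promise<string> {
  const start = now();
  let chunks = 0;
  let text = "";
  for await (const chunk of stream) {
    const piece = chunk.choices[0]?.delta.content ?? "";
    // Skip empty deltas and the usage-only final chunk (empty choices).
    if (piece.length === 0) {
      continue;
    }
    text += piece;
    chunks += 1;
    const elapsedMs = now() - start;
    onStats({
      chunks,
      elapsedMs,
      chunksPerSecond: elapsedMs > 0 ? (chunks * 1000) / elapsedMs : 0,
    });
  }
  return text;
}
```

Because the statistics are computed entirely on the client, the stream itself stays compatible with the OpenAI protocol. The numbers are estimates and should not be treated as equivalent to what runtimeStatsText() reported.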

jdp8 commented 1 month ago

I see, thank you. I saw that LangChain (Python) has support for this specific feature but only for OpenAI for now as mentioned here, referencing usage metadata in the intermediate steps. At least that's what I understood.

Just out of curiosity, will this support be added to WebLLM or is it something that has been discussed?

CharlieFRuan commented 1 month ago

Thanks for the inquiry! IIUC, you are inquiring about accessing stats in the middle of a streaming generation of the model.

I do not fully understand how the LangChain example in the link uses stats in the middle of a streaming generation. I think "intermediate" there refers to an event in LangChain's terminology, rather than to the middle of a generation?

Besides, WebLLM is integrated with LangChain.js, so it may be worth substituting WebLLM for the OpenAI endpoint in place and checking whether the behavior is the same API-wise: https://js.langchain.com/v0.2/docs/integrations/chat/web_llm/