Open jdp8 opened 1 month ago
Unfortunately, doing so would mean the output no longer aligns with the OpenAI protocol, so we likely cannot support such a case. Note that async streaming (between the worker and the client) is still necessary for best performance.
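One way to stay within the OpenAI protocol and still show progress mid-stream might be to keep a running estimate on the client side. The following is only a sketch of that workaround, not a WebLLM feature: `RunningStats`, `updateStats`, and `chunksPerSecond` are hypothetical names, and the sketch assumes each non-empty content chunk corresponds to roughly one token, which is an approximation. Exact counts would still come from the usage data in the final chunk.

```typescript
// Client-side running estimate while a stream is in progress.
// Assumption: one non-empty content chunk is roughly one generated token.
interface RunningStats {
  chunks: number;         // non-empty content chunks seen so far
  startMs: number | null; // timestamp of the first content chunk
  lastMs: number | null;  // timestamp of the most recent content chunk
}

const emptyStats: RunningStats = { chunks: 0, startMs: null, lastMs: null };

// Pure update step: returns new stats after seeing one chunk's content.
function updateStats(
  stats: RunningStats,
  content: string | undefined,
  nowMs: number,
): RunningStats {
  if (!content) return stats; // ignore empty or missing deltas
  return {
    chunks: stats.chunks + 1,
    startMs: stats.startMs ?? nowMs,
    lastMs: nowMs,
  };
}

// Approximate decode rate in chunks per second; null until it can be computed.
function chunksPerSecond(stats: RunningStats): number | null {
  if (stats.startMs === null || stats.lastMs === null || stats.chunks < 2) {
    return null;
  }
  const elapsedMs = stats.lastMs - stats.startMs;
  if (elapsedMs <= 0) return null;
  // (chunks - 1) intervals elapsed between the first and the latest chunk.
  return ((stats.chunks - 1) * 1000) / elapsedMs;
}
```

Inside a `for await` loop over the stream, this would be called as `stats = updateStats(stats, chunk.choices[0]?.delta.content, performance.now())`, and `chunksPerSecond(stats)` could be displayed at any point during generation.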
I see, thank you. I saw that LangChain (Python) supports this specific feature, but only for OpenAI for now, as mentioned here, where usage metadata is referenced in the intermediate steps. At least that's what I understood.
Just out of curiosity, will this support be added to WebLLM or is it something that has been discussed?
Thanks for the inquiry! IIUC, you are inquiring about accessing stats in the middle of a streaming generation of the model.
I do not exactly understand how the LangChain example in the link uses stats in the middle of a streaming generation. I think "intermediate" there refers to an event in LangChain's terminology, rather than to a point in the middle of a generation?
Besides, WebLLM is integrated with LangChain.js, so it may be worth swapping the OpenAI endpoint for WebLLM in place and checking whether the behavior is the same API-wise: https://js.langchain.com/v0.2/docs/integrations/chat/web_llm/
Hello, I saw that recently the runtimeStatsText() function might be deprecated and that now the usage metadata can be accessed with the streamOptions: { include_usage: True } in the stream request. However, I read that this can only be accessed in the last chunk, instead of at any time such as with runtimeStatsText().
I was wondering if it is possible to get this metadata in the intermediate steps when streaming. In other words, to get the usage metadata when the output chunks are being streamed.
Any assistance with this will be greatly appreciated. Thank you!
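For reference, a minimal sketch of the behaviour described above, where usage only arrives with the last chunk. The chunk shape follows the OpenAI streaming format (text in `choices[0].delta.content`, and, when `stream_options: { include_usage: true }` is set in the OpenAI API, a final chunk with an empty `choices` array and a populated `usage` field). The `StreamChunk`, `StreamResult`, `foldChunk`, and `readStream` names are hypothetical and trimmed to the fields used here; no WebLLM call is made, and I am assuming WebLLM's stream yields chunks of this shape.

```typescript
// Assumed, trimmed-down shape of an OpenAI-style streaming chunk.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

interface StreamChunk {
  choices: { delta: { content?: string } }[];
  usage?: Usage | null;
}

interface StreamResult {
  text: string;
  usage: Usage | null;
}

// Pure step: fold one chunk into the accumulated text and usage.
function foldChunk(acc: StreamResult, chunk: StreamChunk): StreamResult {
  // The final usage chunk has no choices, so content falls back to "".
  const piece = chunk.choices[0]?.delta.content ?? "";
  return { text: acc.text + piece, usage: chunk.usage ?? acc.usage };
}

// Consume a whole stream; usage stays null until the final chunk arrives.
async function readStream(
  stream: AsyncIterable<StreamChunk>,
): Promise<StreamResult> {
  let acc: StreamResult = { text: "", usage: null };
  for await (const chunk of stream) {
    acc = foldChunk(acc, chunk);
  }
  return acc;
}
```

With this shape, `usage` is `null` at every point before the last chunk, which is why the metadata cannot be read mid-stream the way runtimeStatsText() could be called at any time.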