Closed mohe22 closed 3 weeks ago
I'm hitting the same issue; please share a solution. One more question: the latest Novel version 0.5.0 isn't working for me. Which Node version is supported for the latest release?
I ended up doing:
countTokens Function: Counts tokens in a string by splitting on whitespace. This is only a rough approximation; swap in a tokenizer that matches your model for accurate counts.
truncateText Function: Truncates the text to fit within a specified token budget, keeping only the initial portion of the text.
sendREQ Function: Sends the user question to the chat model. If the combined length of the input and the expected response would exceed the model's token limit, it truncates the input first, then streams the response from the model.
```ts
const countTokens = (text: string): number => {
  // Rough placeholder: counts whitespace-separated words.
  // Use a tokenizer appropriate for your model for accurate counts.
  return text.split(/\s+/).length;
};

const truncateText = (text: string, maxTokens: number): string => {
  const words = text.split(/\s+/);
  return words.slice(0, maxTokens).join(" ");
};

async function* sendREQ(userQuestion: string, defaultSystemSetting: boolean) {
  const maxAllowedTokens = 8192; // The model's token limit
  const maxNewTokens = 500; // The number of tokens we want the model to generate

  let inputTokens = countTokens(userQuestion);
  if (inputTokens + maxNewTokens > maxAllowedTokens) {
    const maxInputTokens = maxAllowedTokens - maxNewTokens;
    userQuestion = truncateText(userQuestion, maxInputTokens);
    inputTokens = countTokens(userQuestion);
  }

  for await (const chunk of inference.chatCompletionStream({
    model: "meta-llama/Meta-Llama-3-8B-Instruct",
    // Send the (possibly truncated) question as the message content
    messages: [{ role: "user", content: userQuestion }],
    max_tokens: maxNewTokens,
  })) {
    const response = chunk.choices[0]?.delta?.content;
    if (response) {
      yield response;
    }
  }
}
```
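As a quick sanity check, the truncation budget logic can be exercised in isolation. This is a minimal sketch with made-up small limits for illustration only; the real code uses 8192/500:

```typescript
// Same helpers as above, reproduced so this snippet runs standalone.
const countTokens = (text: string): number => text.split(/\s+/).length;

const truncateText = (text: string, maxTokens: number): string =>
  text.split(/\s+/).slice(0, maxTokens).join(" ");

// Tiny illustrative limits (hypothetical; the model's real limit is 8192).
const maxAllowedTokens = 20;
const maxNewTokens = 5;

let input =
  "one two three four five six seven eight nine ten " +
  "eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen";

// 18 input tokens + 5 new tokens = 23 > 20, so the input is truncated
// down to 20 - 5 = 15 tokens, leaving room for the generation.
if (countTokens(input) + maxNewTokens > maxAllowedTokens) {
  input = truncateText(input, maxAllowedTokens - maxNewTokens);
}
console.log(countTokens(input)); // 15
```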
Provide environment information
System:
  OS: Linux 6.6 Kali GNU/Linux Rolling 2024.2
  CPU: (12) x64 11th Gen Intel(R) Core(TM) i5-11400H @ 2.70GHz
  Memory: 7.10 GB / 15.41 GB
  Container: Yes
  Shell: 5.9 - /usr/bin/zsh
Binaries:
  Node: 18.20.1 - /usr/bin/node
  npm: 10.8.2 - /usr/local/bin/npm
Describe the bug
When I use /ai to have the AI continue writing, the response appears, but after about one second the text is deleted. If I use ++ instead, there is no issue.
/api/generate
```ts
import { HfInference } from "@huggingface/inference";

const inference = new HfInference(process.env.AI);

export const POST = async (request: Request): Promise<Response> => {
  const { prompt } = await request.json();
  const maxAllowedTokens = 8192;
  const maxNewTokens = 500;

  const autoComplete = `You are an AI writing assistant that continues existing text based on context from prior text. Give more weight/priority to the later characters than the beginning ones. Limit your response to no more than 250 characters, but make sure to construct complete sentences. user:${prompt}`;

  const stream = new ReadableStream({
    async start(controller) {
      for await (const chunk of inference.chatCompletionStream({
        model: "meta-llama/Meta-Llama-3-8B-Instruct",
        messages: [{ role: "user", content: autoComplete }],
        max_tokens: maxNewTokens,
      })) {
        const response = chunk.choices[0]?.delta?.content;
        if (response) {
          controller.enqueue(new TextEncoder().encode(response));
        }
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
};
```
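The ReadableStream/TextEncoder plumbing the route relies on can be checked on its own. Below is a minimal standalone sketch of that mechanism (no Hugging Face client involved), with hard-coded example chunks standing in for the model's streamed deltas:

```typescript
// Enqueue encoded chunks into a ReadableStream, then read and decode them
// back into a single string, mirroring what a client of the route would do.
async function demo(): Promise<string> {
  const chunks = ["Hello, ", "stream!"]; // stand-ins for streamed deltas

  const stream = new ReadableStream<Uint8Array>({
    start(controller) {
      for (const c of chunks) {
        controller.enqueue(new TextEncoder().encode(c));
      }
      controller.close();
    },
  });

  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
  }
  return text;
}

demo().then(console.log); // prints "Hello, stream!"
```

ReadableStream, TextEncoder, and TextDecoder are all globals in Node 18+, so this runs without extra imports.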
video:
https://github.com/user-attachments/assets/b8c40083-93e3-4ba4-82d4-dfb4365578b2
Link to reproduction
https://github.com
To reproduce
.
Additional information