Closed mohe22 closed 3 weeks ago
I'm hitting the same issue; please share a solution. One more question: the latest Novel version 0.5.0 isn't working for me. Which Node version is supported for the latest release?
I ended up doing:
countTokens Function: Counts tokens in a string by splitting on whitespace. This is only a rough approximation; swap in a tokenizer that matches your model for accurate counts.
truncateText Function: Truncates the text to fit within a specified token budget, keeping only the initial portion of the text.
sendREQ Function: Sends the user question to the chat model. If the combined length of the input and the expected response would exceed the model's token limit, it truncates the input first, then streams the response from the model.
```ts
const countTokens = (text: string): number => {
  // Rough placeholder: counts whitespace-separated words.
  // Use a tokenizer appropriate for your model for accurate counts.
  return text.split(/\s+/).length;
};

const truncateText = (text: string, maxTokens: number): string => {
  const words = text.split(/\s+/);
  return words.slice(0, maxTokens).join(" ");
};

async function* sendREQ(userQuestion: string, defaultSystemSetting: boolean) {
  const maxAllowedTokens = 8192; // The model's token limit
  const maxNewTokens = 500; // The number of tokens we want the model to generate

  let inputTokens = countTokens(userQuestion);
  if (inputTokens + maxNewTokens > maxAllowedTokens) {
    const maxInputTokens = maxAllowedTokens - maxNewTokens;
    userQuestion = truncateText(userQuestion, maxInputTokens);
    inputTokens = countTokens(userQuestion);
  }

  for await (const chunk of inference.chatCompletionStream({
    model: "meta-llama/Meta-Llama-3-8B-Instruct",
    // Send the (possibly truncated) question as the message content
    messages: [{ role: "user", content: userQuestion }],
    max_tokens: maxNewTokens,
  })) {
    const response = chunk.choices[0]?.delta?.content;
    if (response) {
      yield response;
    }
  }
}
```
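As a quick sanity check, the truncation budget logic can be exercised in isolation. This is a minimal sketch with made-up small limits for illustration only; the real code uses 8192/500:

```typescript
// Same helpers as above, reproduced so this snippet runs standalone.
const countTokens = (text: string): number => text.split(/\s+/).length;

const truncateText = (text: string, maxTokens: number): string =>
  text.split(/\s+/).slice(0, maxTokens).join(" ");

// Tiny illustrative limits (hypothetical; the model's real limit is 8192).
const maxAllowedTokens = 20;
const maxNewTokens = 5;

let input =
  "one two three four five six seven eight nine ten " +
  "eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen";

// 18 input tokens + 5 new tokens = 23 > 20, so the input is truncated
// down to 20 - 5 = 15 tokens, leaving room for the generation.
if (countTokens(input) + maxNewTokens > maxAllowedTokens) {
  input = truncateText(input, maxAllowedTokens - maxNewTokens);
}
console.log(countTokens(input)); // 15
```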
Provide environment information
System:
  OS: Linux 6.6 Kali GNU/Linux Rolling 2024.2
  CPU: (12) x64 11th Gen Intel(R) Core(TM) i5-11400H @ 2.70GHz
  Memory: 7.10 GB / 15.41 GB
  Container: Yes
  Shell: 5.9 - /usr/bin/zsh
Binaries:
  Node: 18.20.1 - /usr/bin/node
  npm: 10.8.2 - /usr/local/bin/npm
Describe the bug
When I use /ai to have the AI continue writing, the response appears, but after about one second the text is deleted. If I use ++ instead, there is no issue.
/api/generate
```ts
import { HfInference } from "@huggingface/inference";

const inference = new HfInference(process.env.AI);

export const POST = async (request: Request): Promise<Response> => {
  const { prompt } = await request.json();
  const maxAllowedTokens = 8192;
  const maxNewTokens = 500;

  const autoComplete = `You are an AI writing assistant that continues existing text based on context from prior text. Give more weight/priority to the later characters than the beginning ones. Limit your response to no more than 250 characters, but make sure to construct complete sentences. user:${prompt}`;

  const stream = new ReadableStream({
    async start(controller) {
      for await (const chunk of inference.chatCompletionStream({
        model: "meta-llama/Meta-Llama-3-8B-Instruct",
        messages: [{ role: "user", content: autoComplete }],
        max_tokens: maxNewTokens,
      })) {
        const response = chunk.choices[0]?.delta?.content;
        if (response) {
          controller.enqueue(new TextEncoder().encode(response));
        }
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
};
```
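The ReadableStream/TextEncoder plumbing the route relies on can be checked on its own. Below is a minimal standalone sketch of that mechanism (no Hugging Face client involved), with hard-coded example chunks standing in for the model's streamed deltas:

```typescript
// Enqueue encoded chunks into a ReadableStream, then read and decode them
// back into a single string, mirroring what a client of the route would do.
async function demo(): Promise<string> {
  const chunks = ["Hello, ", "stream!"]; // stand-ins for streamed deltas

  const stream = new ReadableStream<Uint8Array>({
    start(controller) {
      for (const c of chunks) {
        controller.enqueue(new TextEncoder().encode(c));
      }
      controller.close();
    },
  });

  const reader = stream.getReader();
  const decoder = new TextDecoder();
  let text = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
  }
  return text;
}

demo().then(console.log); // prints "Hello, stream!"
```

ReadableStream, TextEncoder, and TextDecoder are all globals in Node 18+, so this runs without extra imports.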
video:
https://github.com/user-attachments/assets/b8c40083-93e3-4ba4-82d4-dfb4365578b2
Link to reproduction
https://github.com
To reproduce
.
Additional information