khoj-ai / khoj

Your AI second brain. Get answers to your questions, whether they be online or in your own notes. Use online AI models (e.g. gpt4) or private, local LLMs (e.g. llama3). Self-host locally or use our cloud instance. Access from Obsidian, Emacs, Desktop app, Web or WhatsApp.
https://khoj.dev
GNU Affero General Public License v3.0

[FIX] Khoj outputs Chinese characters as raw unicode in a codeblock and many words are misspelled #793

Status: Closed (ihorizons2022 closed this issue 3 months ago)

ihorizons2022 commented 3 months ago

Describe the bug

image

debanjum commented 3 months ago

Oh that's a strange output. Did you by any chance request Khoj to output json in a codeblock or something previously? Is it doing this consistently for you? Are you using Khoj cloud?

I initially thought the issue was with Khoj rendering Chinese characters as raw unicode strings in a code block, but Khoj is able to render Chinese characters in a multi-line markdown/json codeblock just fine. So I'm not quite sure of the specific issue, apart from some dependency on previous messages you've sent it (or any documents of yours it may have indexed).

ihorizons2022 commented 3 months ago

Yes, I use Khoj cloud. The issue happens sometimes, not every time. I do not know whether a model switch in the background may be causing the problem.

debanjum commented 3 months ago

Did you by any chance request Khoj to output json in a codeblock or something in your conversation with Khoj previously? I wonder if that's why it chose to output in a codeblock. The chat conversation screenshot you shared doesn't indicate anything about outputting json codeblocks (as far as I can tell), so it's a little strange that it outputs in that format.

ihorizons2022 commented 3 months ago

I found the reason: after several rounds of chat, Khoj switched the model from gpt-4 to 3.5-turbo. 3.5-turbo is not smart enough and outputs unicode instead. But the settings page still displays gpt-4, which is misleading. Could you please change the model display so that, once the free quota runs out, it shows the model actually in use?

debanjum commented 3 months ago

Can you clarify what you mean? Why do you think your conversation with Khoj is using gpt-3.5? Khoj shouldn't normally switch the model on you, unless you do it yourself

ihorizons2022 commented 3 months ago

Because when I start a new chat session, choose gpt-3.5 and use the same prompt, the output is in the same unicode style as what I posted above.

debanjum commented 3 months ago

I see, so when using gpt-3.5 the outputs aren't great for Chinese (+json)? That makes sense. Just stick to gpt-4 for that use-case I guess.

Do you see any actionable item on the Khoj codebase for this? Otherwise I'll close this issue

ihorizons2022 commented 3 months ago

But the problem is that even when I stick to gpt-4-turbo-preview, the result is in unicode style after several rounds of chat. So I want to know whether Khoj changes the model automatically when the quota or rate limit is reached.

debanjum commented 3 months ago

No, Khoj shouldn't change the model automatically when rate limits are hit. It should just tell you to upgrade or try again tomorrow.

Not sure why you're seeing unicode style outputs in codeblocks after several rounds of chat. If you can share the text of a complete conversation with this behavior, I can try to reproduce (and fix) it on my end.
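
Note for anyone landing here with the same symptom: the "raw unicode" text is usually the ASCII-escaped JSON form of the same characters, so nothing is lost, it is just hard to read. The sketch below is a minimal Python illustration using only the standard `json` module. It is not taken from the Khoj codebase, the `payload` value is made up for the example, and whether the model in this thread produced its output this way is not confirmed anywhere above.

```python
import json

# Hypothetical sample data, not from the conversation in this issue.
payload = {"greeting": "你好"}

# By default json.dumps escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are written as-is.
readable = json.dumps(payload, ensure_ascii=False)

print(escaped)   # {"greeting": "\u4f60\u597d"}
print(readable)  # {"greeting": "你好"}

# Both strings decode to the same data, so the escaped form is recoverable.
assert json.loads(escaped) == json.loads(readable) == payload
```

If a response arrives in the escaped form, running the JSON through `json.loads` and re-serializing it with `ensure_ascii=False` gives back the readable text.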