twinnydotdev / twinny

The most no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code - like GitHub Copilot but completely free and 100% private.
https://twinny.dev
MIT License

Twinny continues responding with "Sorry, I don't understand. Please try again" if it encounters even a single prompt it does not understand #120

Closed: nav9 closed this issue 8 months ago

nav9 commented 9 months ago

Describe the bug: Once Twinny encounters something it does not understand, it responds with "Sorry, I don't understand. Please try again" for all subsequent prompts. Also, the "fill in the middle" code completions don't work at all.

To Reproduce: Steps to reproduce the behavior:

  1. Highlight some code in the code editor and ask Twinny to explain the code, or type any prompt which makes Twinny respond with "Sorry, I don't understand. Please try again".
  2. Type any other prompt.

On starting a new chat, Twinny responds normally until it encounters something for which it responds with "Sorry, I don't understand. Please try again", and then all subsequent responses are "Sorry, I don't understand. Please try again".

Expected behavior: Twinny should ignore any ambiguous prompt and do what the user wanted it to do.

Screenshots: (attached: prime1, dontUnderstand)


Additional context: Models used with Ollama:

rjmacarthy commented 9 months ago

Hey, thanks for the report.

Please try changing the FIM model to a base model, e.g. deepseek-coder:6.7b-base-q5_K_M, and for your chat model you should probably update the prompt template so that it doesn't include llama tokens.
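For reference, pulling the base model looks like this, and the comments sketch what I mean by llama-style tokens (illustrative only; the exact syntax in twinny's template editor may differ):

```sh
# Pull the base (non-instruct) model for FIM completions.
ollama pull deepseek-coder:6.7b-base-q5_K_M

# "llama tokens" means the special markers that llama-family instruct templates
# wrap prompts in, e.g. something like:
#   <s>[INST] <<SYS>> system prompt <</SYS>> user prompt [/INST]
# A chat prompt template for a non-llama model generally shouldn't contain these.
```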

Also, enabling the debug option and opening the dev tools might provide some insights.

Please let me know if it helps.

Thanks

nav9 commented 8 months ago

Thanks. I tried again, but ran into some issues:

  1. Knowledge: Even though I have been through DeepLearning's course on generative AI, I don't know what you mean by "try to update the prompt template to not include llama tokens". I know what tokens are, and I can see a pencil icon for "Edit twinny templates", but I don't know which prompt template to update or how to figure out which llama tokens are used where. Please keep this in mind for other users too. Even in the readme, the simplest way to help someone get started would be to mention which two models they can use, instead of explaining technicalities they wouldn't understand.
  2. Generation stopping: I switched to deepseek-coder:6.7b-base-q5_K_M, and as you can see from the screenshot below, generation stops prematurely. This happened twice. It sometimes also ends with the earlier "Sorry, I don't understand. Please try again" issue.
  3. Alt \: At the for line (line 4), on pressing Alt \ multiple times, I noticed, just once, an autocomplete suggestion i in range(2,n-1):, but I couldn't figure out how to make it fill in the suggested code. I tried Enter, Alt Enter, Shift Enter and Ctrl Enter, but I eventually had to manually type the code that was suggested. Often, when I press Alt \, I see the CPU usage going to 100% on all cores, but no code suggestion is offered, and I frequently have to press Alt \ multiple times for any suggestion to show up. My processor is an AMD Ryzen 5 5600G with inbuilt Radeon Graphics × 6, and there's 32GB RAM. I don't have a separate GPU card. I'm working on Linux Mint 21.
  4. Debug: You mentioned looking at the debug logs. I don't see any output on vscode's debug console. Did you mean something else?

(Screenshot: stoppedGeneration)

AntonKrug commented 8 months ago

1) You mean tweaking the modelfile? https://github.com/ollama/ollama/blob/main/docs/modelfile.md
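For anyone unfamiliar, a Modelfile is just a small text file you register with Ollama; a minimal sketch, where the base model and parameter value are only examples:

```sh
# Minimal Modelfile sketch; the base model and parameter value are only examples.
cat > Modelfile <<'EOF'
FROM deepseek-coder:6.7b-base-q5_K_M
PARAMETER temperature 0.2
EOF

# Register it with Ollama under a custom name.
ollama create my-coder -f Modelfile
```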

nav9 commented 8 months ago
> You mean tweaking the modelfile? https://github.com/ollama/ollama/blob/main/docs/modelfile.md

No. I didn't even know a model file like that existed. One thing I hope y'all understand is that the average user is not aware of such technicalities. They just wish they could install Ollama and Twinny, and everything would "just work".

rjmacarthy commented 8 months ago

Hey there! Thanks so much for bringing this up and for all of the information.

At the moment, we're not quite there yet when it comes to supporting the level of usability and user-friendliness you're looking for with local language models. Given the variety of setups, models, APIs, and hardware out there, it's a bit of a challenge to ensure everything works smoothly for everyone.

I've been using codellama-code and codellama-instruct myself, and they've been working out pretty well. Just a heads-up, though: this software is provided free of charge and "as is", so it might not meet all your expectations perfectly, but we're constantly working to improve it.

We really appreciate your understanding and patience! If you have any more feedback or need further assistance, please shout out.

Edit: In version 3.7.0 I updated the Ollama options to use the OpenAI specification, which may help with the chat completion errors.
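If anyone wants to sanity-check that side of their setup, Ollama's OpenAI-compatible chat endpoint can be exercised directly; a rough sketch, where the port is Ollama's default and the model tag is only an example:

```sh
# Call Ollama's OpenAI-compatible chat endpoint directly (11434 is the default port;
# the model tag is only an example and must already be pulled locally).
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "codellama:7b-instruct",
        "messages": [{"role": "user", "content": "Explain what a Modelfile is."}]
      }'
```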

nav9 commented 8 months ago

@rjmacarthy , I understand. Of all the local copilots I tried, Twinny, Wingman-AI and Continue have, IMHO, been well designed and easy to use (Continue is quite buggy, so that leaves Twinny and Wingman-AI). There are at least 14 other extensions which either aren't easy to use or are poorly designed, so you guys have actually done a great job on the design and implementation. The reason I'm taking the time to report issues is specifically because I admire this project and intend to use it. I'll try the codellama models (I just noticed there's a model called phind-codellama too, specifically for code generation, and a smaller stable-code model that I assume might work faster on CPU-only systems). Thanks! :-)

nav9 commented 8 months ago

One other thing I noticed on my system:

So I believe it'll help to mention in the readme that if a person uses CPU-only, it's better to use one of the smaller models, as it'll give a quicker response. The clearest way to mention it would be something like:

Humble suggestion: The fingers need to stretch a lot to reach the Alt and \ keys on the keyboard. If the key combo for FIM could be changed to some other keys or made configurable, it'd be nice. Ctrl+Spacebar is one possibility.

I just tried stable-code in Wingman-AI and got an error that the model is not supported. So this makes Twinny the best local LLM option among VS Code extensions. Congratulations on the excellent design!

Another suggestion:
(Screenshot: twinnyFIM)
When Twinny generates a code suggestion as shown in the image, I didn't know what key to press to turn the suggested code into actual code. After a lot of trial and error, I found it is the Tab key. Please mention this in the readme file.

PS: I couldn't find any model named codellama-instruct on Ollama's page, but found out that a Modelfile needs to be created to use it. I'll try. Additionally, I'm using Twinny version 3.7.4, and the chat completion errors still show up sometimes. I believe it depends on the prompt I type and perhaps on other factors that affect the number of tokens to be generated or returned. You'd probably be able to replicate the issue if you disconnect your graphics card or use a laptop without a powerful graphics card.

rjmacarthy commented 8 months ago

Hey @nav9, thank you for your detailed response. Please feel free to submit a PR to update the readme if you think things like performance notes or anything else useful should be included; your input is very much appreciated. In terms of the keyboard shortcuts, they are the same as GitHub Copilot's defaults, and they can be changed in user preferences if required. Please feel free to open new issues as they arise so we can close this one. Many thanks.
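As a sketch, assuming twinny serves its completions through VS Code's built-in inline-suggestion commands, rebinding the trigger in keybindings.json would look roughly like this (Alt+\ is the default binding for editor.action.inlineSuggest.trigger, and Tab accepts a suggestion via editor.action.inlineSuggest.commit):

```jsonc
// keybindings.json sketch: trigger inline suggestions with Ctrl+Space instead of Alt+\.
// Note: Ctrl+Space is bound to the IntelliSense trigger by default, so this overrides it.
[
  {
    "key": "ctrl+space",
    "command": "editor.action.inlineSuggest.trigger",
    "when": "editorTextFocus"
  }
]
```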

Cephra commented 6 months ago

I have a similar problem now. No matter what I do or what model I use, I keep getting "Sorry, I don't understand. Please try again." It used to work before; it only started occurring when I launched VSCodium today.

Cephra commented 6 months ago

> I have a similar problem now. No matter what I do or what model I use, I keep getting "Sorry, I don't understand. Please try again." It used to work before; it only started occurring when I launched VSCodium today.

I just wanted to give an update on why it happened (maybe helpful for others too): I had set the wrong URL for the API. I had it set to /api/chat instead of /v1/chat/completions. Changing it to the latter fixed my issue.