Hey, thanks for the report.
Please try changing the FIM model to a base model, e.g. deepseek-coder:6.7b-base-q5_K_M,
and for your chat model, try updating the prompt template so it doesn't include llama tokens.
Also, enabling the debug option and opening the dev tools might provide some insight.
Please let me know if it helps.
Thanks
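As a quick way to check the base model outside the editor, something like the following minimal sketch could be used. This is only an illustration, assuming Ollama is running locally on its default port (11434) and the model has already been pulled; the model name and prompt are just examples, not part of Twinny itself.
```python
import requests

# Sanity check: ask Ollama's native generate endpoint for a raw completion
# from a base (non-chat) model, with no chat/llama template tokens at all.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-coder:6.7b-base-q5_K_M"  # example model, assumed already pulled

response = requests.post(
    OLLAMA_URL,
    json={
        "model": MODEL,
        "prompt": "def is_prime(n):\n    ",  # plain code prefix
        "stream": False,
        "options": {"num_predict": 64},      # keep the completion short
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["response"])
```
If this returns a sensible continuation of the code, the model side is fine and any remaining problem is more likely in the extension's prompt template or settings.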
Thanks. I tried again, but ran into some issues:
- I switched to deepseek-coder:6.7b-base-q5_K_M, and as you can see from the screenshot below, generation stops prematurely. This happened twice. It sometimes ends up with the previous issue of "Sorry I don't understand, please try again".
- At the for line (line 4), on pressing Alt+\ multiple times, I noticed, just once, an autocomplete suggestion i in range(2,n-1):, but I couldn't figure out how to make it fill in the suggested code. I tried Enter, Alt+Enter, Shift+Enter and Ctrl+Enter, but I eventually had to manually type the code that was suggested.
- Often, when I press Alt+\, I see the CPU usage go to 100% on all cores, but no code suggestion is offered, and I have to press Alt+\ multiple times for any suggestion to show up. My processor is an AMD Ryzen 5 5600G with inbuilt Radeon Graphics × 6, and there's 32GB of RAM. I don't have any separate GPU card. I'm working on Linux Mint 21.1.
- You mean tweaking the modelfile? https://github.com/ollama/ollama/blob/main/docs/modelfile.md
No. I didn't even know a model file like that existed. One thing I hope y'all understand is that the average user is not aware of such technicalities. They just wish they could install Ollama and Twinny, and everything would "just work".
Hey there! Thanks so much for bringing this up and for all of the information.
At the moment, we're not quite there yet when it comes to supporting the level of usability and user-friendliness you're looking for with local language models. Given the variety of setups, models, APIs, and hardware out there, it's a bit of a challenge to ensure everything works smoothly for everyone.
I've been using codellama-code and codellama-instruct myself, and they've been working out pretty well. Just a heads-up, though: this software is provided free of charge and "as is". So it might not meet all your expectations perfectly, but we're constantly working to improve it.
We really appreciate your understanding and patience! If you have any more feedback or need further assistance, please shout out.
Edit: In version 3.7.0 I updated the Ollama options to use the OpenAI specification, which may help with the chat completion errors.
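For reference, here is a minimal sketch of what a chat request against Ollama's OpenAI-compatible endpoint looks like. It assumes a local Ollama server and the openai Python package; the model name is only an example of something you might have pulled.
```python
from openai import OpenAI

# Ollama exposes an OpenAI-compatible API under /v1, so the standard OpenAI
# client can be pointed at the local server. The api_key is required by the
# client library but is ignored by Ollama.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

reply = client.chat.completions.create(
    model="codellama:7b-instruct",  # example model; use whatever is pulled locally
    messages=[
        {"role": "user", "content": "Write a Python function that checks if a number is prime."}
    ],
)
print(reply.choices[0].message.content)
```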
@rjmacarthy, I understand. Of all the local copilots I tried, Twinny, Wingman-AI and Continue have, IMHO, been well designed and easy to use (Continue is quite buggy, so that leaves Twinny and Wingman-AI). There are at least 14 other extensions which either aren't easy to use or are poorly designed, so you guys have actually done a great job on the design and implementation. The reason I'm taking the time to report issues is specifically because I admire this project and intend to use it. I'll try the codellama models (I just noticed there's a model called phind-codellama too, specifically for code generation, and a smaller stable-code model that I assume might work faster on CPU-only systems). Thanks! :-)
One other thing I noticed on my system: deepseek-coder takes 18 seconds to FIM once I press Alt+\, codellama takes 36 seconds to FIM for the same code, and stable-code takes just 2 or 3 seconds. So I believe it'll help to mention in the readme that if a person uses CPU-only, it's better to use one of the smaller models; it'll give a quicker response. The clearest way to mention it would be something like: "If you have xyz capability, use the abc model for FIM and the def model for chat; if you are CPU-only, use the abc-small model for FIM and the def-small model for chat." (A rough way to time the models is sketched below.)
Humble suggestion: the fingers need to stretch a lot to reach the Alt and \ keys on the keyboard. If the key combo for FIM could be changed to some other keys, or kept configurable, it'd be nice. Ctrl+Spacebar is one possibility.
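To reproduce the timing comparison above, a rough sketch along these lines could be run against the local Ollama server. The model tags are examples and would need to match whatever is pulled locally, and the first request to each model also includes model-load time, so the numbers are only indicative.
```python
import time
import requests

# Rough, unscientific timing of how long each model takes to return a short
# completion on this machine via Ollama's native generate endpoint.
MODELS = ["deepseek-coder:6.7b-base-q5_K_M", "codellama:7b-code", "stable-code:3b"]
PROMPT = "def fibonacci(n):\n    "

for model in MODELS:
    start = time.perf_counter()
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False,
              "options": {"num_predict": 32}},
        timeout=600,
    )
    r.raise_for_status()
    elapsed = time.perf_counter() - start
    print(f"{model}: {elapsed:.1f}s")
```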
I just tried stable-code in Wingman-AI and got an error that the model is not supported. So this makes Twinny the best local LLM option among VS Code extensions. Congratulations on the excellent design!
Another suggestion: when Twinny generates a code suggestion as shown in the image, I didn't know what key to press to convert the suggested code into actual code. After a lot of trial and error, I found it is the Tab key. Please mention this in the readme file.
PS: I couldn't find any model named codellama-instruct on Ollama's page, but found out a modelfile needs to be created to use it. I'll try. Additionally, I'm using Twinny version 3.7.4, and sometimes the chat completion errors still show up. I believe it depends on the prompt I type and perhaps other factors that affect the number of tokens to be generated or returned. You'd probably be able to replicate the issue if you disconnect your graphics card or use a laptop without a powerful graphics card.
Hey @nav9, thank you for your detailed response. Please feel free to submit a PR to update the readme if you think things like performance notes should be included, or anything else you think might be useful; your input is very much appreciated. As for the keyboard shortcuts, they are the same defaults as GitHub Copilot's and can be changed in the user preferences if required. Please feel free to open new issues if they arise so we can close this one. Many thanks.
I have a similar problem now. No matter what I do or what model I use, I keep getting "Sorry, I don’t understand. Please try again.". It used to work before. Just when I started VSCodium today it started to occur.
I just wanted to give an update as to why it happened (maybe helpful for others too): I had set the wrong URL for the API. I had it set to /api/chat instead of /v1/chat/completions; changing it to the latter fixed my issue.
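For anyone hitting the same thing, here is a small illustrative sketch (assuming a local Ollama and an example model) showing that the two endpoints return differently shaped JSON, which is why an OpenAI-style client pointed at /api/chat cannot parse the reply.
```python
import requests

# Compare Ollama's native chat endpoint with its OpenAI-compatible endpoint.
# Model name is an example; both endpoints accept the same request body here.
BASE = "http://localhost:11434"
body = {
    "model": "codellama:7b-instruct",
    "messages": [{"role": "user", "content": "Say hi"}],
    "stream": False,
}

native = requests.post(f"{BASE}/api/chat", json=body, timeout=120).json()
openai_style = requests.post(f"{BASE}/v1/chat/completions", json=body, timeout=120).json()

print(sorted(native.keys()))        # native shape: 'message', 'model', 'done', ...
print(sorted(openai_style.keys()))  # OpenAI shape: 'choices', 'id', 'model', 'usage', ...
```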
Describe the bug: Once Twinny encounters something it does not understand, it responds with "Sorry, I don't understand. Please try again" for all subsequent prompts. Also, the "fill in the middle" code completions don't work at all.
To Reproduce: Steps to reproduce the behavior:
On starting a new chat, Twinny responds normally until it encounters something for which it responds with "Sorry, I don't understand. Please try again", and then all subsequent responses are "Sorry, I don't understand. Please try again".
Expected behavior: Twinny should ignore any ambiguous prompt and do what the user wanted it to do.
Additional context: Models used with Ollama: