twinnydotdev / twinny

The most no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code - like GitHub Copilot but completely free and 100% private.
https://twinny.dev
MIT License
2.93k stars, 154 forks

Oobabooga vs. Twinny #180

Closed zaqhack closed 6 months ago

zaqhack commented 6 months ago

Observation 1: Before I get too far down this rabbit hole, it would be nice if there were an option somewhere to restore the plugin's default settings ...

Observation 2: "/v1/chat/completions" seems to make Oobabooga go insane today. It works fine with the "/v1/completions" endpoint, but the exact same content sent to the first one doesn't work at all. Might want to remove the other endpoint ...
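For context, the two endpoints expect different request bodies, which is why content that works on one can fail on the other. A minimal sketch of the shapes (field names per the OpenAI API spec; the values are illustrative, not from this thread):

```typescript
// /v1/completions takes a raw prompt string, while /v1/chat/completions
// takes a list of role-tagged messages. A server that only really
// implements one of the two will often reject or mangle the other.
const completionBody = {
  prompt: "def fib(n):",
  max_tokens: 64,
  stream: true,
};

const chatCompletionBody = {
  messages: [
    { role: "system", content: "You are a coding assistant." },
    { role: "user", content: "Write a fibonacci function." },
  ],
  max_tokens: 64,
  stream: true,
};

console.assert("prompt" in completionBody);
console.assert(Array.isArray(chatCompletionBody.messages));
```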

Observation 3: Using Wireshark, I can see that Ooba is sending a response. However, Twinny is giving me: [screenshot attached]

Response: [screenshot attached]

rjmacarthy commented 6 months ago

Hi, thanks for the report. I don't personally use the Oobabooga API, so I'm having trouble testing it. Please submit a pull request with a fix if you can.

Many thanks,

zaqhack commented 6 months ago

I'm not sure I have time to learn where this lives, but I think I found (one of) the main issues. I was able to get it working last night on the first shot of a new chat. After that, it fails (usually 400, Bad Request). A little more Wireshark later, I found this in one of the payloads:

{
    "role": "assistant",
    "content": "<|im_start|>assistant\nHello! How can I assist you with your coding needs today?\n",
    "type": "",
    "language": {}
},

I think "language" needs to be a string, not an object (i.e. "" not {}).

The logs say it isn't getting a "string" for that parameter. If I make this change in Postman, it works. I'm not sure where it's outputting an object instead of a string, but maybe that helps you narrow it down?
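One possible client-side fix, sketched here as hypothetical TypeScript (not the actual twinny code; the field names come from the captured payload above): strip the non-spec fields before sending, so `language` can never arrive as an object.

```typescript
// Hypothetical sanitizer: keep only the message fields the OpenAI chat
// spec defines. Strict servers reject unknown fields, and `language`
// was observed arriving as {} instead of a string in the broken payload.
interface RawMessage {
  role: string;
  content: string;
  type?: string;
  language?: unknown; // seen as {} in the Wireshark capture
}

function toOpenAiMessage(msg: RawMessage): { role: string; content: string } {
  // Drop `type` and `language` entirely rather than coercing them.
  return { role: msg.role, content: msg.content };
}

const broken: RawMessage = {
  role: "assistant",
  content: "Hello! How can I assist you with your coding needs today?",
  type: "",
  language: {},
};

console.log(JSON.stringify(toOpenAiMessage(broken)));
// {"role":"assistant","content":"Hello! How can I assist you with your coding needs today?"}
```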

zaqhack commented 6 months ago

Side note: it's technically a mimic of the OpenAI API at this point. If you get it working, it should also work with vLLM, Aphrodite-engine, ChatGPT, and others. You could maybe get away with "OpenAI Compatible" for that dropdown ...

(In truth, I'm getting the log message from Aphrodite because Ooba just implodes when it gets that payload ...)

rjmacarthy commented 6 months ago

Hmm ok, thanks for that. I think we already have a PR for a similar issue: https://github.com/rjmacarthy/twinny/pull/159. It has requested changes; if it stays stale much longer I will take care of it.

Many thanks for the report and detailed response/likely fix.

rjmacarthy commented 6 months ago

@zaqhack I just released version 3.8.9, which should address the issue of the non-compliant fields in the payload for the OpenAI API spec. Please could you let me know if it helps?

Many thanks

zaqhack commented 6 months ago

Seems to do the trick with Aphrodite-engine ...

Oobabooga is still freaking out. It seems like a default template problem, but I'm honestly not sure. It doesn't give any particularly useful feedback when it pukes. On the plus side, it isn't crashing, now ... but it also isn't giving me much to work with. :-( Here's what I'm seeing, now: https://youtu.be/ZLUoX4YEjqk

I wish I had time to give you more of a hand with this. I have been in love with Twinny from Day One, and it has allowed me to start comparing responses from various coding models. The response time from Aphrodite, as you can see in the video above, is nearly instantaneous.

After a few days of troubleshooting, there is one thing I wish it had: A "reset to default settings" button, somewhere. ha ha ha

rjmacarthy commented 6 months ago

Will add it just for you. By the way, the video link seems to go to the wrong video...

zaqhack commented 6 months ago

It indeed was. Super-weird. I wonder how that even got into my cut-and-paste buffer ...

Still not sure what needs to be done, here, but this is at least the right video. 😅

https://youtu.be/G1C9bdKt5oA

rjmacarthy commented 6 months ago

Ok, so I just got this working on my local instance. I've pushed a new version so that ooba works with /v1/completions by default now, and updated the code to stream the data from the correct property path. Also, in ooba's CMD_FLAGS.txt I had to add the --api and --listen flags for the API to work. After those things were done it started to work. FYI, ooba seems to be streaming junk completions to me, but I'm not sure if it's the model/template I'm using. I don't really use ooba at all, so I'm not sure; Ollama is just way better in my opinion.
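For anyone following along, the CMD_FLAGS.txt change amounts to two flags (per the text-generation-webui docs: --api enables the OpenAI-compatible API, --listen makes the server reachable from other machines on the network):

```text
--api
--listen
```

CMD_FLAGS.txt lives in the text-generation-webui install directory; restart the server after editing it.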

zaqhack commented 6 months ago

Thanks!

If I can use a model with Aphrodite, I don't look elsewhere. Unfortunately, the acceleration it uses limits my choices a bit: it doesn't span video cards to pool VRAM (it uses them for acceleration, not for more model space). For Twinny, that's a mixed bag. It works well for Deepseek 7b, but I can't fit the bigger models onto one card with it ... for that, I need Kobold or Ooba or whatever. Is what it is. :-)

I should check out Ollama, I guess.

rjmacarthy commented 6 months ago

It should work now, closing.