zaqhack closed this issue 6 months ago
Hi, thanks for the report. I don't personally use the oobabooga API, so I'm having trouble testing it. Please submit a pull request with a fix if you can.
Many thanks,
I'm not sure I have time to learn where this lives, but I think I found (one of) the main issues. I was able to get it working last night on the first message of a new chat. After that, it fails (usually with a 400 Bad Request). A little more Wireshark later, I found this in one of the payloads:
{
"role": "assistant",
"content": "<|im_start|>assistant\nHello! How can I assist you with your coding needs today?\n",
"type": "",
"language": {}
},
I think "language" needs to be a string, not an object (i.e. "" rather than {}).
The logs say it isn't getting a "string" for that parameter. If I make this change in Postman, the request works. I'm not sure where it's outputting an object instead of a string, but maybe that helps you narrow it down?
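For reference, here is a minimal sketch of the kind of coercion that would make the message spec-compliant before sending. The helper name and shape are my assumptions for illustration, not Twinny's actual code:

```python
# Hypothetical helper: coerce the "language" field to a string, since an
# OpenAI-style API rejects an object there with 400 Bad Request.
def normalize_message(message: dict) -> dict:
    fixed = dict(message)
    # "language" must be a string; an empty object {} is what broke it.
    if not isinstance(fixed.get("language"), str):
        fixed["language"] = ""
    return fixed

msg = {
    "role": "assistant",
    "content": "Hello! How can I assist you with your coding needs today?",
    "type": "",
    "language": {},  # the offending object from the captured payload
}
print(normalize_message(msg)["language"])  # → ""
```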
Side note: it's technically a mimic of the OpenAI API at this point. If you get it working, it should also work with vLLM, Aphrodite-engine, ChatGPT, and others. You could maybe get away with calling that dropdown entry "OpenAI Compatible" ...
(In truth, I'm getting the log message from Aphrodite because Ooba just implodes when it gets that payload ...)
Hmm, ok, thanks for that. I think we already have a PR to fix a similar issue: https://github.com/rjmacarthy/twinny/pull/159, which has requested changes; if it stays stale for much longer I will take care of it.
Many thanks for the report and detailed response/likely fix.
@zaqhack I just released version 3.8.9, which should address the issue of the non-compliant fields in the payload for the OpenAI API spec. Please could you let me know if it helps?
Many thanks
Seems to do the trick with Aphrodite-engine ...
Oobabooga is still freaking out. It seems like a default template problem, but I'm honestly not sure. It doesn't give any particularly useful feedback when it pukes. On the plus side, it isn't crashing now ... but it also isn't giving me much to work with. :-( Here's what I'm seeing now: https://youtu.be/ZLUoX4YEjqk
I wish I had time to give you more of a hand with this. I have been in love with Twinny from Day One, and it has allowed me to start comparing responses from various coding models. The response time from Aphrodite, as you can see in the video above, is nearly instantaneous.
After a few days of troubleshooting, there is one thing I wish it had: A "reset to default settings" button, somewhere. ha ha ha
Will add it just for you. By the way, the video link seems to go to the wrong link...
It indeed was. Super-weird. I wonder how that even got into my cut-and-paste buffer ...
Still not sure what needs to be done, here, but this is at least the right video. 😅
Ok, so I just got this working on my local instance. I've pushed a new version so that ooba works with /v1/completions by default now, and updated the code to stream the data from the correct property path. Also, in ooba's CMD_FLAGS.txt I had to add the --api and --listen flags for the API to work. After these things were done it started to work. FYI, ooba seems to be streaming junk completions to me, but I'm not sure if it's the model/template I'm using. I don't really use ooba at all, so I'm not sure; Ollama is just way better in my opinion.
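For anyone else hitting this, the relevant lines in text-generation-webui's CMD_FLAGS.txt would look something like this (each flag on its own line; your exact flag set may differ):

```
--api
--listen
```

--api enables the OpenAI-compatible API server, and --listen binds it to all interfaces so extensions and other machines can reach it.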
Thanks!
If I can use a model with Aphrodite, I don't look elsewhere. Unfortunately, its acceleration limits my choices a bit: it doesn't span video cards to allow a greater VRAM total (it uses the extra cards for acceleration, not more model space). For Twinny, that's a mixed bag. It works well for Deepseek 7b, but I can't fit the bigger models onto one card with it ... for that, I need Kobold or Ooba or whatever. Is what it is. :-)
I should check out Ollama, I guess.
It should work now, closing.
Observation 1: Before I get too far down this rabbit hole, it would be nice if there were an option somewhere to "restore default settings" back to the plugin ...
Observation 2: "/v1/chat/completions" seems to make Oobabooga go insane today. It works fine with the "/v1/completions" endpoint, but the exact same content sent to the first one doesn't work at all. Might want to remove the other endpoint ...
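For clarity, the two OpenAI-style endpoints expect differently shaped request bodies, which may be why one works and the other doesn't against a given backend. A sketch of the two shapes (the model name is a placeholder, and this is the general spec shape, not Twinny's code):

```python
# Body for POST /v1/completions: a flat prompt string.
completion_body = {
    "model": "local-model",  # placeholder; backends vary
    "prompt": "def fibonacci(n):",
    "max_tokens": 64,
    "stream": True,
}

# Body for POST /v1/chat/completions: a list of role/content messages.
chat_body = {
    "model": "local-model",
    "messages": [
        {"role": "user", "content": "Write a fibonacci function."},
    ],
    "stream": True,
}

# A backend that only implements one endpoint will reject the other shape,
# even if the underlying content is identical.
```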
Observation 3: Using Wireshark, I can see that Ooba is sending a response. However, Twinny is giving me
Response: