nvms / wingman

Your pair programming wingman. Supports OpenAI, Anthropic, or any LLM on your local inference server.
https://marketplace.visualstudio.com/items?itemName=nvms.ai-wingman
ISC License

How to debug the api responses for local model usage? #12

Open synw opened 1 year ago

synw commented 1 year ago

I am trying to use this extension with my custom minimalist inference server for local models (this).

I managed to make the server process the requests correctly and run inference from the extension's API calls. It responds in the OpenAI API format, but the extension does not seem to receive or understand the response, and is still waiting after the request has completed.

Is there a way to get the errors or debug the responses received in the extension, so that I can see what the problem is?

nvms commented 1 year ago

Yeah, error handling could be improved. Right now, errors should be shown in a notification box if they are caught.

For OpenAI, the response is expected to be received as a stream:

https://github.com/transitive-bullshit/chatgpt-api/blob/main/src/chatgpt-api.ts#L207 (this is the lib I'm using to handle request/response formatting and send/receive)

I think it'd be pretty simple to add a configuration option for this, i.e., whether or not to expect a stream of responses or one single response.

When a stream event never arrives, I imagine the timeout will eventually trigger and that will be the only error message you see.
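
For reference, here's a rough sketch (plain Node.js, port and field values are just placeholders) of the kind of SSE stream that lib expects from an OpenAI-compatible endpoint when `stream: true` — note the `data:` framing and the final `[DONE]` sentinel:

```ts
import { createServer } from "node:http";

createServer((_req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  // Each token goes out as one OpenAI-style chat.completion.chunk event.
  const tokens = ["Hello", " from", " a", " local", " model"];
  tokens.forEach((token, i) => {
    const chunk = {
      id: "chatcmpl-local",
      object: "chat.completion.chunk",
      created: Math.floor(Date.now() / 1000),
      model: "local-model",
      choices: [
        {
          index: 0,
          delta: i === 0 ? { role: "assistant", content: token } : { content: token },
          finish_reason: null,
        },
      ],
    };
    res.write(`data: ${JSON.stringify(chunk)}\n\n`);
  });

  // A real server would also send a last chunk with finish_reason "stop";
  // the client stops reading once it sees the [DONE] sentinel.
  res.write("data: [DONE]\n\n");
  res.end();
}).listen(8080);
```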

I took a look at goinfer and infergui - very cool. I'll definitely be checking these out.

synw commented 1 year ago

Ok, I see. I should implement SSE in my OpenAI compatibility layer server-side. I already have it for my normal API, but not in the OpenAI one, which I implemented recently just to use your extension with my server.

Thanks for pointing out the lib; this way I can test the SSE handling and make a compatible implementation.
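
For the test, something like this should be enough to confirm the stream actually arrives (Node 18+ built-in fetch, run as an ES module; the URL, route and model name are assumptions for my server):

```ts
const res = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "local-model",
    stream: true,
    messages: [{ role: "user", content: "hello" }],
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();

// Simplification: assumes each network chunk contains whole "data: ..." lines.
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  for (const line of decoder.decode(value).split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice("data: ".length);
    if (data.trim() === "[DONE]") process.exit(0);
    process.stdout.write(JSON.parse(data).choices[0]?.delta?.content ?? "");
  }
}
```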

synw commented 1 year ago

To debug more efficiently, it would be nice to be able to print the requests and responses and see the JS errors directly, maybe in some kind of debug mode if possible.

@nvms FYI, I added an OpenAI API compatibility layer in Goinfer, including the streaming response, so the Goinfer server version 0.2.0 is now fully compatible with Wingman.
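
Maybe something as simple as an output channel gated by a setting would already help; a rough sketch (the `wingman.debug` setting name and the call sites are only a guess, not the extension's actual code):

```ts
import * as vscode from "vscode";

// Dump requests/responses to an output channel when "wingman.debug" is enabled.
const channel = vscode.window.createOutputChannel("Wingman Debug");

export function debugLog(label: string, payload: unknown): void {
  const enabled = vscode.workspace
    .getConfiguration("wingman")
    .get<boolean>("debug", false);
  if (!enabled) return;
  channel.appendLine(`[${new Date().toISOString()}] ${label}`);
  channel.appendLine(JSON.stringify(payload, null, 2));
}

// e.g. debugLog("request", body) before sending and
//      debugLog("response", data) / debugLog("error", err) afterwards.
```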

nvms commented 1 year ago

My current idea for this: once I remove chatgpt-api in favor of my own solution, it will be simpler to store the relevant parts of the request and response. Then, in the chat conversation window, I can provide a way of inspecting these, e.g. the Inspect request details element below:

*(Screenshot, 2023-08-19: chat conversation window with an "Inspect request details" element)*
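
Roughly, each stored exchange might look like this (hypothetical shape, just to illustrate what would be kept around for inspection; not the extension's actual types):

```ts
interface ExchangeRecord {
  sentAt: number;                 // Date.now() when the request went out
  request: {
    url: string;
    model: string;
    stream: boolean;
    messages: { role: string; content: string }[];
  };
  response: {
    status?: number;              // HTTP status, if the request got that far
    chunks: string[];             // raw SSE payloads as they arrived
    error?: string;               // captured JS error, if any
  };
}

const history: ExchangeRecord[] = [];
```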