sourcegraph / sourcegraph-public-snapshot

Code AI platform with Code Search & Cody
https://sourcegraph.com
Other
10.1k stars 1.27k forks source link

Cody: Azure OpenAI provider api version hardening #61031

Open chwarwick opened 6 months ago

chwarwick commented 6 months ago

As a result of the Azure OpenAI go package using preview apis that are continually retired, an override on the API version needed to be implemented.

To add additional safety to this implementation parsing tests using a http testserver should be added to ensure that any future updates to the Azure OpenAI package continue to be able to parse the results from the 2003-05-15 api version.

Additional the http client should wrap our httpcli transport, so the requests will be counted accordingly and appear in the outgoing request logs.

RXminuS commented 6 months ago

Couple of ideas:

chwarwick commented 6 months ago

Instead of creating a client we can simply set a per-call APIVersion policy. This is bit less intrusive then creating the whole transport layer ourselves.

The reason I mentioned this is we have a whole bunch of stuff in the internal http stack here that allows admins and us to debug issues with customers via logging in the admin section of the UI where they can see outgoing requests https://sourcegraph.sourcegraph.com/site-admin/outbound-requests.

The other items sound good, the goal is just to make sure it doesn't break unknowingly

RXminuS commented 5 months ago

Ok, this is actually really difficult to test in a robust way where we use their API spec as the source of truth.

My idea was to take valid examples by their spec, marshal those into the structs of the installed package, unmarshal them and then check against the same spec. This way I was hoping to detect differences between API spec and package structs.

However it seems that the generated structs don't actually deserialize all the fields that are specified in the spec, primarily model and object. For the object especially I can see that they started moving from a string (in completions) to a stricter enum (in chat completions), so it's potentially something that could break in a future package release.

Also because every field in the generated structs is a pointer and there is no validation the only exception you'd catch is if the type of an existing field would change to something incompatible. However I fear that most breaking changes would come from added / removed fields that were/weren't required.

I'm not really sure what a useful next step is here. Any thoughts @chwarwick ?

chwarwick commented 5 months ago

Ok, this is actually really difficult to test in a robust way where we use their API spec as the source of truth.

My idea was to take valid examples by their spec, marshal those into the structs of the installed package, unmarshal them and then check against the same spec. This way I was hoping to detect differences between API spec and package structs.

I was thinking like you are describing too, like we mock out a valid request/response and setup a go testserver that returns it just to make sure it can unmarshal into the structs from the go package. From what I've seen the completions api's have been pretty stable, I think all the preview versions have been mostly about adding new services

RXminuS commented 5 months ago

But the problem is that basically anything will unmarshal correctly so then I'm not sure we're testing compatibility. Which is what they call out in their docs. You'll just end up with fields unset in the resulting struct.

chwarwick commented 5 months ago

But the problem is that basically anything will unmarshal correctly so then I'm not sure we're testing compatibility. Which is what they call out in their docs. You'll just end up with fields unset in the resulting struct.

I think you can check in the test server hander to verify we get the correct messages (author/text) and api version in the url and then from the return of our api handler has the full completion text those are the parts we care about the most. Honestly we shouldn't need to update the package as often now, they we forced it to a stable version, and I would hope that if someone did in the future they would at least try some chat/autocomplete to ensure it works. The problem should be easy to spot, but the idea here is to avoid someone either unknowingly updating the package or doing so for some other reason and missing that they then broke this (probably unlikely).