enginoid closed this pull request 1 day ago
Thanks for the PR and for getting the integration started!
I think the challenging bit that has prevented me from shipping an Anthropic wrapper is their context-manager streaming setup, which makes patching the client less pretty:
```python
with client.messages.stream(
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    model="claude-3-opus-20240229",
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
As much as possible, we'd like to avoid altering the behavior of this kind of statement.
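For concreteness, here's a rough sketch of one shape this could take: a thin object that delegates `__enter__`/`__exit__` to Anthropic's `MessageStreamManager`, so the `with` statement above behaves exactly as before. This is not a final design, and `on_end` is a placeholder logging hook, not an existing LangSmith API.

```python
class TracedMessageStreamManager:
    """Delegates to Anthropic's MessageStreamManager so that
    `with client.messages.stream(...) as stream:` behaves exactly
    as before, while letting us observe the final message."""

    def __init__(self, manager, on_end):
        self._manager = manager  # object returned by messages.stream(...)
        self._on_end = on_end    # placeholder logging hook

    def __enter__(self):
        self._stream = self._manager.__enter__()
        return self._stream

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is None:
            # Safe once the caller has drained text_stream; otherwise
            # get_final_message() consumes the remaining events first.
            self._on_end(self._stream.get_final_message())
        return self._manager.__exit__(exc_type, exc_val, exc_tb)


def patch_stream(client, on_end):
    original = client.messages.stream

    def stream(*args, **kwargs):
        return TracedMessageStreamManager(original(*args, **kwargs), on_end)

    client.messages.stream = stream
```

An async variant would need the same treatment for `__aenter__`/`__aexit__`.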
Gotcha – I have tried to avoid this one for now because it did look more involved and I'm not sure I'll be able to commit the time. Is it an option in your view to merge this without support for the stream API?
Separately, I'd love a bit of guidance from you on how to structure the outputs, since I'm not sure what the target schema is for it to look and work right in LangSmith, e.g. to show the right output from a tool call in "Outputs". I've shared a Loom here and would love it if you could take a look; then I can change the reducer to get the outputs consistent.
We're using this internally and it's working fine, except that we have to use "Raw output" rather than "Output" to see the outputs. I'd love to help get this merged upstream but I have to take it off my plate for now. If someone can help with the stuff I posted in the last comment at some point, then feel free to re-engage me and I can see if I can find some capacity to drive it home.
This is incomplete at the moment, but I'm posting it for feedback as a work in progress. Currently implemented is rudimentary support for async/sync and streaming/non-streaming, backed by tests.
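For reference, this is roughly how the wrapper is intended to be used, by analogy with the existing `wrap_openai`; the `wrap_anthropic` name is just what this PR proposes, not a shipped API:

```python
import anthropic
from langsmith import wrappers

# wrap_anthropic is the entry point this PR sketches, by analogy
# with the existing wrappers.wrap_openai.
client = wrappers.wrap_anthropic(anthropic.Anthropic())

# The call signature is unchanged; the run is traced to LangSmith.
message = client.messages.create(
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
    model="claude-3-opus-20240229",
)
```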
TODOs before leaving draft stage
Currently, the added tests are passing. Things that are left to do on this before undrafting:
- Extract `TracingExtra` into a shared utility, or create a separate one for the anthropic wrapper.
- Make the outputs consistent: streaming has `outputs: {text: 'foo'}` while not streaming has `outputs: {output: {content: [{type: 'text', text: 'foo'}]}}`, which is highly inconsistent. This is just because I've been focused on getting the basic concept working without thinking too much about what the output should look like exactly. This will involve writing some reducers, which is probably the bulk of the work left in getting the sync/async API working (see the sketch after this list).
- Add `test_chat_async_api` (async, no streaming), although I believe that path to be working. So 3/4 tests are there.
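Here's a rough sketch of the kind of reducer I have in mind for the streaming path, normalizing raw stream events into the non-streaming shape; since the exact target schema is the open question I asked about above, treat the returned shape as provisional:

```python
def reduce_chunks(chunks: list) -> dict:
    # Collapse streamed text deltas into the same message-shaped dict
    # the non-streaming path produces, so LangSmith sees one format.
    # Only text deltas are handled here; tool-use (input_json_delta)
    # events would need their own branch.
    text = "".join(
        event.delta.text
        for event in chunks
        if getattr(event, "type", "") == "content_block_delta"
        and getattr(event.delta, "type", "") == "text_delta"
    )
    return {"output": {"content": [{"type": "text", "text": text}]}}
```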
Thoughts/caveats
- The context-manager streaming API is more involved than plain `create`, but I would like to leave it out of scope if possible, just to get something basic working for Anthropic.
- At first, nothing showed up in `MockSession`. After some debugging, it seems that tracing was just off, and I think this means that tracing is not actually on for the OpenAI tests either, so the assertions could be giving a false sense of security. I had to set `auto_batch_tracing=False`; I think this should be fine (see the test sketch below).
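For anyone picking this up: the setup that bit me looked roughly like the sketch below. The `session` kwarg and the `langsmith_extra` plumbing are from my memory of the OpenAI wrapper tests, so treat the details as approximate:

```python
from unittest import mock

import langsmith


def test_chat_sync_api():
    mock_session = mock.MagicMock()
    # Without auto_batch_tracing=False, runs are queued for background
    # batching, so mock_session may never see a request before the
    # assertion runs, and the test can "pass" without tracing anything.
    ls_client = langsmith.Client(session=mock_session, auto_batch_tracing=False)
    # ... call the wrapped Anthropic client here, passing the LangSmith
    # client via langsmith_extra={"client": ls_client} ...
    assert mock_session.request.call_count >= 1
```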