langchain-ai / langsmith-sdk

LangSmith Client SDK Implementations
https://smith.langchain.com/
MIT License

Issue: Understanding inconsistencies in data recording for LangSmith and LangChain JS/TS #516

Open elliotmrodriguez opened 3 months ago

elliotmrodriguez commented 3 months ago

Issue you'd like to raise.

Hello, I am using LangSmith to evaluate platform tools for cost and performance tracking and I am noticing some inconsistencies I do not understand.

For LangChain's azure-openai and openai JS/TS packages, I get inconsistent cost tracking. When I use the azure-openai package, costs are not recorded: token counts seem to be consistent, but I get no cost information at all. I have seen similarly unrecorded costs when making AWS Bedrock calls. When I use the openai package, I do get costs, but I'm not confident they are accurate.

(screenshot omitted)

No LLM calls record time to first token either, unless they are local ChatOllama calls. For every other provider, LangSmith reports "this run did not stream output", which is interesting, since in each case I am using a chat model object exported from LangChain.

It doesn't matter whether I try this in a RunnableSequence or via a direct invoke call; Azure calls do not get cost information.
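Roughly what I'm running in both cases (a simplified sketch; the credentials and deployment/model names are placeholders):

import { AzureChatOpenAI } from "@langchain/azure-openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";

// Placeholder credentials, same shape as our real config.
const model = new AzureChatOpenAI({
  azureOpenAIEndpoint: "my-cool-endpoint",
  azureOpenAIApiKey: "super-secret",
  azureOpenAIApiDeploymentName: "gpt-35-turbo",
  modelName: "gpt-35-turbo",
});

// Path 1: direct invoke. Token counts appear in LangSmith, cost does not.
await model.invoke("Say hello");

// Path 2: the same model inside a RunnableSequence. Same result.
const chain = ChatPromptTemplate.fromTemplate("Say {word}").pipe(model);
await chain.invoke({ word: "hello" });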

When I inspect traces for these Azure runs and try to interact with the Playground, I see an error message (screenshot omitted).

OpenAI runs, by contrast, are correctly identified as OpenAI for the purposes of the Playground.

What am I doing wrong?

hinthornw commented 3 months ago

We need to relax the default model-matching rules we have for cost estimation. (I think Azure OpenAI returns gpt-35-turbo instead of gpt-3.5-turbo, for instance.)

FYI, you can also customize the cost rules yourself, in case your pricing doesn't match the defaults.
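For example, a looser pattern along these lines would cover both spellings (illustrative only; the actual rule is configured in the LangSmith model pricing settings):

// Illustrative pattern: matches "gpt-3.5-turbo", "gpt-35-turbo",
// and pinned variants like "gpt-35-turbo-0613".
const pattern = /^gpt-3\.?5-turbo(-\d{4})?$/;
console.log(pattern.test("gpt-3.5-turbo"));     // true
console.log(pattern.test("gpt-35-turbo"));      // true
console.log(pattern.test("gpt-35-turbo-0613")); // true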


elliotmrodriguez commented 3 months ago

Hi @hinthornw, thank you for your reply!

I discovered the list of models as well and noticed the related regexes. I've been using the model name specified in my deployment (gpt-35-turbo), which should match the second regex, the one for the unpinned model name (without a version suffix; at least that is the OpenAI behavior):

(screenshot omitted)

And this is what the deployments in Azure show for their model names:

(screenshot omitted)

But it sounds like you are saying that adding/cloning the model entry with a looser regex should address this, because on closer inspection we are using the legacy 0613 version.
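To spell out why the default misses us (hypothetical patterns; the real LangSmith rules may differ):

// An exact-match rule on the unpinned name fails our pinned deployment.
const strict = /^gpt-35-turbo$/;
console.log(strict.test("gpt-35-turbo"));      // true
console.log(strict.test("gpt-35-turbo-0613")); // false (our case)

// A looser clone with an optional version suffix would match.
const loose = /^gpt-35-turbo(-\d{4})?$/;
console.log(loose.test("gpt-35-turbo-0613"));  // true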

Thanks again

elliotmrodriguez commented 3 months ago

Hello @hinthornw, I'm reopening this issue because it doesn't seem as if adding new regexes has helped, at least when using the exported AzureChatOpenAI object from @langchain/azure-openai.

Even for gpt-4, no cost attribute is collected at all. It isn't that the patterns don't match; costs are simply not returned. I've tried model names that regex testers indicate should match, like "gpt-4" and "gpt-3.5-turbo", but they still fail to produce costs:

import { AzureChatOpenAI } from "@langchain/azure-openai";

const azureChatModel = new AzureChatOpenAI({
  azureOpenAIEndpoint: "my-cool-endpoint", // placeholder
  azureOpenAIApiKey: "super-secret", // placeholder
  azureOpenAIApiDeploymentName: "also-super-secret", // placeholder
  modelName: "gpt-4",
});

For 3.5 turbo, the other params are the same; the modelName argument I pass is just "gpt-3.5-turbo", but that doesn't appear to work either, at least not for the AzureChatOpenAI object.

Is there something else I should be inspecting?

hinthornw commented 2 months ago

Thanks for raising - will pass this on.

If you click on the metadata tab for one of those LLM runs, what does it show? Any chance you could share a link to a run to help us debug?
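If it's easier, you can also pull the run programmatically and dump its metadata (a sketch with the langsmith JS client; the run id is a placeholder, and I believe the metadata lands under run.extra):

import { Client } from "langsmith";

const client = new Client(); // reads LANGCHAIN_API_KEY from the environment

// Placeholder run id: copy the real one from the run's URL in the UI.
const run = await client.readRun("00000000-0000-0000-0000-000000000000");

// The keys used for cost matching (e.g. ls_model_name) should appear here.
console.log(run.extra?.metadata);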

Kniggishood commented 1 month ago

@elliotmrodriguez Hey there, have you perhaps already figured out why the stream requests are not tracked properly? I have a related issue: no matter which way I trace my custom LLM using wrapOpenAI / traceable / RunTree from the TypeScript SDK, no time to first token is recorded. Any ideas? :)
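For reference, this is roughly my setup (simplified; the model name is a placeholder, and the baseURL pointing at our custom LLM is omitted):

import OpenAI from "openai";
import { wrapOpenAI } from "langsmith/wrappers";

// Wrapped client: the run is traced in LangSmith, but no time to
// first token shows up even though we consume the response as a stream.
const client = wrapOpenAI(new OpenAI());

const stream = await client.chat.completions.create({
  model: "gpt-4", // placeholder; our custom LLM is OpenAI-compatible
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}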

hinthornw commented 1 month ago

Oh, the TypeScript SDK may not track that event right now; I'll sync with the owner.

MathiasVantieghem commented 2 days ago

I had the same issue. I managed to track cost by configuring a LangSmith model whose match rule targets the ls_model_name metadata (which was populated with the deployment name) 😉
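Conceptually (the rule itself lives in the LangSmith model pricing settings; the deployment name below is a made-up example):

// The price-map rule has to match what actually arrives in the
// ls_model_name metadata, which for our Azure runs was the deployment
// name rather than the OpenAI model name.
const rule = /^my-gpt35-deployment$/; // hypothetical deployment name
console.log(rule.test("my-gpt35-deployment")); // true -> cost gets applied
console.log(rule.test("gpt-3.5-turbo"));       // false -> why the defaults missed it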