OpenAI callback is misleading when used with Azure OpenAI

Checked other resources

[X] I added a very descriptive title to this issue.
[X] I searched the LangChain documentation with the integrated search.
[X] I used the GitHub search to find a similar question and didn't find it.
[X] I am sure that this is a bug in LangChain rather than my code.
[X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Description and Example Code

LangChain appears to compute token usage and cost for both OpenAI and Azure OpenAI models using OpenAICallbackHandler. However, this relies on both APIs returning the complete (versioned) name of the called model, which is not the case for Azure OpenAI.

In my subscription I have three deployments of gpt-3.5-turbo, corresponding to gpt-35-turbo-0613, gpt-35-turbo-0312, and gpt-35-turbo-1106, and two deployments of gpt-4, corresponding to gpt-4-1106-preview and gpt-4-0613. However, when I call them for inference, the model name reported in the response is gpt-35-turbo or gpt-4 respectively, regardless of the version. LangChain therefore cannot compute the correct cost, yet no warning is raised. This dictionary here also contains entries that would never be used because of the above, e.g. this one.

from langchain_openai import AzureChatOpenAI

llm1 = AzureChatOpenAI(
    api_version="2023-08-01-preview",
    azure_endpoint="https://YOUR_ENDPOINT.openai.azure.com/",
    api_key="YOUR_KEY",
    azure_deployment="gpt-35-turbo-0613",
    temperature=0,
)

llm2 = AzureChatOpenAI(
    api_version="2023-08-01-preview",
    azure_endpoint="https://YOUR_ENDPOINT.openai.azure.com/",
    api_key="YOUR_KEY",
    azure_deployment="gpt-35-turbo-0312",
    temperature=0,
)

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]

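# Both deployments report the same unversioned model name,
# even though they point to different model versions.
llm1.invoke(messages).response_metadata['model_name']  # gpt-35-turbo
llm2.invoke(messages).response_metadata['model_name']  # gpt-35-turbo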
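
As a possible workaround until this is handled in LangChain, the cost lookup could be keyed on the deployment name instead of the reported model name. The following is only a minimal sketch of that idea, not LangChain's implementation: `DEPLOYMENT_TO_MODEL`, `PLACEHOLDER_COST_PER_1K`, `resolve_model_name`, and `estimate_cost` are hypothetical names, and the prices are placeholders rather than real Azure prices.

```python
import warnings
from typing import Optional

# Hypothetical, user-maintained mapping: Azure deployment name -> versioned model name.
DEPLOYMENT_TO_MODEL = {
    "gpt-35-turbo-0613": "gpt-35-turbo-0613",
    "gpt-35-turbo-0312": "gpt-35-turbo-0312",
}

# Placeholder prices per 1K tokens. These are NOT real prices; they only
# make the two versions distinguishable in this example.
PLACEHOLDER_COST_PER_1K = {
    "gpt-35-turbo-0613": 1.0,
    "gpt-35-turbo-0312": 2.0,
}


def resolve_model_name(deployment: str, reported_name: str) -> str:
    """Prefer the versioned name known for the deployment over the reported one."""
    versioned = DEPLOYMENT_TO_MODEL.get(deployment)
    if versioned is None:
        # Make the ambiguity visible instead of silently using a generic name.
        warnings.warn(
            f"No versioned model known for deployment {deployment!r}; "
            f"falling back to reported name {reported_name!r}."
        )
        return reported_name
    return versioned


def estimate_cost(deployment: str, reported_name: str, total_tokens: int) -> Optional[float]:
    """Return the estimated cost, or None if no price is known for the model."""
    model = resolve_model_name(deployment, reported_name)
    price = PLACEHOLDER_COST_PER_1K.get(model)
    if price is None:
        return None
    return price * total_tokens / 1000
```

With this approach, two deployments that both report gpt-35-turbo would still be priced separately, and an unmapped deployment would produce a warning and no cost rather than a silently wrong number.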

System Info

Not applicable here.
