traceloop / openllmetry

Open-source observability for your LLM application, based on OpenTelemetry
https://www.traceloop.com/openllmetry
Apache License 2.0

🐛 Bug Report: Not getting token counts for Databricks (both foundation & external models) #1297

Closed (tkanhe closed this issue 4 months ago)

tkanhe commented 5 months ago

Which component is this bug for?

Langchain Instrumentation

📜 Description

Databricks lets you query its LLMs (both foundation and external models) through the OpenAI client. I am using that client with LangChain. Traces are exported, but they do not include token counts.

Ref. https://docs.databricks.com/en/machine-learning/model-serving/score-foundation-models.html

👟 Reproduction steps

Code:

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain.callbacks import AsyncIteratorCallbackHandler
from langchain.chains.question_answering import load_qa_chain
from langchain.docstore.document import Document
from langchain_openai import ChatOpenAI
from pydantic import BaseModel
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

from prompt import prompt_template, variables

Traceloop.init(app_name="tk test", disable_batch=True, api_endpoint="http://localhost:4318")

app = FastAPI()

class Message(BaseModel):
    content: str

@workflow(name="send_message")
async def send_message(question: str):
    callback = AsyncIteratorCallbackHandler()

    model = ChatOpenAI(
        model_name="databricks-dbrx-instruct",
        api_key="dapi51b9336b04**********************e",
        base_url="https://dbc-***********.cloud.databricks.com/serving-endpoints",
        streaming=True,
        callbacks=[callback],
    )

    chain = load_qa_chain(model, chain_type="stuff", prompt=prompt_template, verbose=False)

    task = asyncio.create_task(chain.ainvoke({"input_documents": [Document(page_content=variables["context"])], "question": question}))

    try:
        async for token in callback.aiter():
            yield token
    except Exception as e:
        print(f"Caught exception: {e}")
    finally:
        callback.done.set()
    await task

@app.post("/stream_chat")
def stream_chat(message: Message):
    generator = send_message(message.content)
    return StreamingResponse(generator, media_type="text/event-stream")

👍 Expected behavior

The exported spans should include token counts along with the rest of the trace data.
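For context, a minimal sketch of where token counts usually come from in a streamed response. With OpenAI's own API, streamed chat completions carry a `usage` payload only in a final chunk, and only when the request sets `stream_options={"include_usage": True}`. Whether Databricks serving endpoints honor that option is an assumption here, not something confirmed in this issue. The chunk dicts and the helper name `extract_stream_usage` below are hypothetical, and the example makes no network calls.

```python
from typing import Iterable, Optional


def extract_stream_usage(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the last non-empty `usage` payload seen in a stream of chunk dicts.

    Hypothetical helper: content chunks normally have no usage (key missing
    or None), so only a trailing usage chunk, if any, is picked up.
    """
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Hypothetical chunks shaped like OpenAI-style chat completion chunks.
example_chunks = [
    {"choices": [{"delta": {"content": "AWS"}}], "usage": None},
    {"choices": [{"delta": {"content": " is"}}]},
    {
        "choices": [],
        "usage": {"prompt_tokens": 12, "completion_tokens": 2, "total_tokens": 14},
    },
]

print(extract_stream_usage(example_chunks))
```

If the provider never sends a usage chunk, the helper returns `None`, which matches the symptom reported here: the span has content but no token counts.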

👎 Actual Behavior with Screenshots

Traces I'm getting:

{
        "Timestamp": datetime.datetime(2024, 6, 11, 11, 41, 7, 762861),
        "TraceId": "81d12193976454e2d9bdbdd0944c3c06",
        "SpanId": "aba567ee86a445e8",
        "ParentSpanId": "55c0ac37410e2496",
        "TraceState": "",
        "SpanName": "openai.chat",
        "SpanKind": "SPAN_KIND_CLIENT",
        "ServiceName": "llm",
        "ResourceAttributes": {"service.name": "llm"},
        "ScopeName": "opentelemetry.instrumentation.openai.v1",
        "ScopeVersion": "0.22.0",
        "SpanAttributes": {
            "traceloop.association.properties.endpoint_id": "6605069e8357cc62d841c9cd",
            "gen_ai.system": "OpenAI",
            "llm.is_streaming": "true",
            "gen_ai.prompt.0.role": "user",
            "gen_ai.completion.0.finish_reason": "stop",
            "traceloop.association.properties.node_id": "NA",
            "gen_ai.response.model": "dbrx-instruct-032724",
            "llm.headers": "None",
            "traceloop.association.properties.node_label": "NA",
            "gen_ai.prompt.0.content": "You are an AWS expert. Use the provided context related AWS re:Invent content to gather detailed information about AWS re:Invent launches. If a question is not related to the topic of AWS re:Invent, refrain from answering it. Instead, encourage the inquirer to pose questions that are relevant to AWS re:Invent topics.\n\nWhen responding to user queries, particularly those asking for statistical data or specific details about the launches, extract and summarize the relevant information. The response should be concise, accurate, and directly address the user's question.\n\nFor example, if a user asks, 'What are all the AI/ML related launches from re:Invent?', the system should:\n\nExtract key details about each AI/ML launch, such as the name of the service or feature, its purpose, and any significant attributes or innovations it brings.\nCompile this information into a coherent, comprehensive summary that answers the user's query clearly.\nAdditionally, ensure the information is up-to-date and reflects the latest re:Invent announcements. In cases where the query is ambiguous or too broad, request more specific details from the user to refine the search and provide the most relevant answer.\n###\nCONTEXT: Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can easily experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. 
Since Amazon Bedrock is serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services.\n###\nQUESTION: what is aws ?\n###  \n",
            "gen_ai.completion.0.content": "AWS, or Amazon Web Services, is a comprehensive cloud computing platform offered by Amazon. It includes a variety of services, such as computing power, storage options, networking, and databases, that can be used to run applications and services in the cloud. AWS offers a broad set of tools and services, including those related to AI/ML, data analytics, and security, that can help organizations of all sizes scale and grow their operations.\n\nFor more information about AWS re:Invent launches related to AI/ML, please refer to the context provided in the prompt.",
            "traceloop.association.properties.request_id": "1718106066413861406",
            "traceloop.association.properties.service": "get_llm_chain_streaming",
            "gen_ai.request.model": "databricks-dbrx-instruct",
            "gen_ai.request.temperature": "0",
            "gen_ai.openai.api_base": "https://dbc-*********.cloud.databricks.com/serving-endpoints/",
            "gen_ai.completion.0.role": "assistant",
            "llm.request.type": "chat",
        },
        "Duration": 1987758731,
        "StatusCode": "STATUS_CODE_OK",
        "StatusMessage": "",
        "Events.Timestamp": [
            datetime.datetime(2024, 6, 11, 11, 41, 8, 57539),
            datetime.datetime(2024, 6, 11, 11, 41, 8, 68653),
            datetime.datetime(2024, 6, 11, 11, 41, 8, 83035),
....
            datetime.datetime(2024, 6, 11, 11, 41, 9, 721325),
            datetime.datetime(2024, 6, 11, 11, 41, 9, 738651),
        ],
        "Events.Name": [
            "llm.content.completion.chunk",
            "llm.content.completion.chunk",
            "llm.content.completion.chunk",
            "llm.content.completion.chunk",
....
        ],
        "Events.Attributes": [
            {},
            {},
            {},
            {},
.....
        ],
        "Links.TraceId": [],
        "Links.SpanId": [],
        "Links.TraceState": [],
        "Links.Attributes": [],
    }
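The gap in the span above can be checked mechanically. The sketch below lists which token-usage attributes are absent from a span's attribute dict. The key names (`gen_ai.usage.prompt_tokens`, `gen_ai.usage.completion_tokens`, `llm.usage.total_tokens`) are assumed to be the ones this version of the instrumentation would emit, and the helper name `missing_usage_attributes` is hypothetical. The attribute dict is an abbreviated copy of the span shown above.

```python
# Assumed attribute names for token usage; adjust if your version differs.
USAGE_KEYS = (
    "gen_ai.usage.prompt_tokens",
    "gen_ai.usage.completion_tokens",
    "llm.usage.total_tokens",
)


def missing_usage_attributes(span_attributes: dict) -> list:
    """Return the usage keys that are not present in the span attributes."""
    return [key for key in USAGE_KEYS if key not in span_attributes]


# Abbreviated copy of the SpanAttributes from the trace above.
span_attributes = {
    "gen_ai.system": "OpenAI",
    "llm.is_streaming": "true",
    "gen_ai.request.model": "databricks-dbrx-instruct",
    "gen_ai.response.model": "dbrx-instruct-032724",
    "gen_ai.completion.0.finish_reason": "stop",
    "llm.request.type": "chat",
}

print(missing_usage_attributes(span_attributes))
```

For the span in this report, all three keys come back as missing.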

🤖 Python Version

3.10

📃 Provide any additional context for the Bug.

traceloop-sdk==0.22.0
langchain==0.1.20
langchain-anthropic==0.1.13
langchain-aws==0.1.6
langchain-community==0.0.38
langchain-core==0.1.52
langchain-openai==0.0.8
opentelemetry-api==1.25.0
opentelemetry-contrib-instrumentations==0.41b0
opentelemetry-distro==0.45b0
opentelemetry-exporter-otlp-proto-common==1.25.0
opentelemetry-exporter-otlp-proto-grpc==1.25.0
opentelemetry-exporter-otlp-proto-http==1.25.0
opentelemetry-instrumentation==0.46b0
opentelemetry-instrumentation-aio-pika==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-aiopg==0.41b0
opentelemetry-instrumentation-alephalpha==0.22.0
opentelemetry-instrumentation-anthropic==0.22.0
opentelemetry-instrumentation-asgi==0.46b0
opentelemetry-instrumentation-asyncpg==0.41b0
opentelemetry-instrumentation-aws-lambda==0.41b0
opentelemetry-instrumentation-bedrock==0.22.0
opentelemetry-instrumentation-boto==0.41b0
opentelemetry-instrumentation-boto3sqs==0.41b0
opentelemetry-instrumentation-botocore==0.41b0
opentelemetry-instrumentation-cassandra==0.41b0
opentelemetry-instrumentation-celery==0.41b0
opentelemetry-instrumentation-chromadb==0.22.0
opentelemetry-instrumentation-cohere==0.22.0
opentelemetry-instrumentation-confluent-kafka==0.41b0
opentelemetry-instrumentation-dbapi==0.41b0
opentelemetry-instrumentation-django==0.41b0
opentelemetry-instrumentation-elasticsearch==0.41b0
opentelemetry-instrumentation-falcon==0.41b0
opentelemetry-instrumentation-fastapi==0.46b0
opentelemetry-instrumentation-flask==0.41b0
opentelemetry-instrumentation-google-generativeai==0.22.0
opentelemetry-instrumentation-grpc==0.41b0
opentelemetry-instrumentation-haystack==0.22.0
opentelemetry-instrumentation-httpx==0.41b0
opentelemetry-instrumentation-jinja2==0.41b0
opentelemetry-instrumentation-kafka-python==0.41b0
opentelemetry-instrumentation-langchain==0.22.0
opentelemetry-instrumentation-llamaindex==0.22.0
opentelemetry-instrumentation-logging==0.41b0
opentelemetry-instrumentation-milvus==0.22.0
opentelemetry-instrumentation-mistralai==0.22.0
opentelemetry-instrumentation-mysql==0.41b0
opentelemetry-instrumentation-mysqlclient==0.41b0
opentelemetry-instrumentation-ollama==0.22.0
opentelemetry-instrumentation-openai==0.22.0
opentelemetry-instrumentation-pika==0.41b0
opentelemetry-instrumentation-pinecone==0.22.0
opentelemetry-instrumentation-psycopg2==0.41b0
opentelemetry-instrumentation-pymemcache==0.41b0
opentelemetry-instrumentation-pymongo==0.41b0
opentelemetry-instrumentation-pymysql==0.41b0
opentelemetry-instrumentation-pyramid==0.41b0
opentelemetry-instrumentation-qdrant==0.22.0
opentelemetry-instrumentation-redis==0.41b0
opentelemetry-instrumentation-remoulade==0.41b0
opentelemetry-instrumentation-replicate==0.22.0
opentelemetry-instrumentation-requests==0.46b0
opentelemetry-instrumentation-sklearn==0.41b0
opentelemetry-instrumentation-sqlalchemy==0.46b0
opentelemetry-instrumentation-sqlite3==0.41b0
opentelemetry-instrumentation-starlette==0.41b0
opentelemetry-instrumentation-system-metrics==0.41b0
opentelemetry-instrumentation-together==0.22.0
opentelemetry-instrumentation-tornado==0.41b0
opentelemetry-instrumentation-tortoiseorm==0.41b0
opentelemetry-instrumentation-transformers==0.22.0
opentelemetry-instrumentation-urllib==0.41b0
opentelemetry-instrumentation-urllib3==0.46b0
opentelemetry-instrumentation-vertexai==0.22.0
opentelemetry-instrumentation-watsonx==0.22.0
opentelemetry-instrumentation-weaviate==0.22.0
opentelemetry-instrumentation-wsgi==0.41b0
opentelemetry-propagator-aws-xray==1.0.1
opentelemetry-proto==1.25.0
opentelemetry-sdk==1.25.0
opentelemetry-semantic-conventions==0.46b0
opentelemetry-semantic-conventions-ai==0.3.1
opentelemetry-util-http==0.46b0

👀 Have you spent some time to check if this bug has been raised before?

Are you willing to submit PR?

None

nirga commented 4 months ago

Fixed with #1452