open-telemetry / opentelemetry-python-contrib

OpenTelemetry instrumentation for Python modules
https://opentelemetry.io
Apache License 2.0
699 stars 580 forks source link

Instrumentation dbapi is not capturing the odbc database commands #707

Open kosportel opened 3 years ago

kosportel commented 3 years ago

Hello,

I am trying to capture the database commands using Instrumentation DBapi, but i don't see anything to be reported to Jaeger. I am accessing database using pyodbc and pandas and i see while debugging that no exception is occurred and data are returned from the database. I have configured Jaeger in my local docker and I see that I am able to send there data if I use the simple getting started examples:

with tracer.start_as_current_span("foo"):
    with tracer.start_as_current_span("bar"):
        with tracer.start_as_current_span("baz"):
            print("Hello world from OpenTelemetry Python!")

I assume that Instrumentation DBapi is the correct library for pyodbc connections to database (i am using Azure SQL Server). Is there any better library?

I am running Python 3.7.8. 64 bit. Requirements.txt is attached. Keep in mind that these are the libraries of the main - bigger program. I was able to keep only a small part of it - attached below, to reproduce the issue easily.

aniso8601==9.0.1
api==0.0.7
asgiref==3.4.1
attrs==20.3.0
cachetools==4.2.1
certifi==2020.11.8
chardet==3.0.4
click==7.1.2
Deprecated==1.2.12
Flask==1.1.2
Flask-RESTful==0.3.8
flask-restplus==0.13.0
Flask-Testing==0.8.1
grpcio==1.39.0
idna==2.10
importlib-metadata==4.0.0
influxdb==5.3.1
install==1.3.4
itsdangerous==1.1.0
Jinja2==2.11.3
jsonschema==3.2.0
Logentries==0.17
MarkupSafe==1.1.1
msgpack==1.0.2
nose==1.3.7
numpy==1.19.3
opentelemetry-api==1.4.1
opentelemetry-exporter-jaeger==1.4.1
opentelemetry-exporter-jaeger-proto-grpc==1.4.1
opentelemetry-exporter-jaeger-thrift==1.4.1
opentelemetry-exporter-zipkin==1.4.1
opentelemetry-exporter-zipkin-json==1.4.1
opentelemetry-exporter-zipkin-proto-http==1.4.1
opentelemetry-instrumentation==0.23b2
opentelemetry-instrumentation-dbapi==0.23b2
opentelemetry-instrumentation-flask==0.23b2
opentelemetry-instrumentation-requests==0.23b2
opentelemetry-instrumentation-wsgi==0.23b2
opentelemetry-sdk==1.4.1
opentelemetry-semantic-conventions==0.23b2
opentelemetry-util-http==0.23b2
pandas==1.1.4
protobuf==3.17.3
pyodbc==4.0.30
pyrsistent==0.17.3
python-dateutil==2.8.1
pytz==2020.4
PyYAML==5.3.1
requests==2.25.0
six==1.15.0
smart-open==5.0.0
thrift==0.13.0
timeloop==1.0.2
typing==3.7.4.3
typing-extensions==3.7.4.3
urllib3==1.26.2
Werkzeug==0.15.5
wrapt==1.12.1
zipp==3.4.1

requirements_openTelemetry.txt

The code that reproduces the case can be found in the attached python file - just unzip.

import os
from smart_open import open
from flask import Flask, request, jsonify, abort
from flask_restplus import reqparse, Api, Resource, fields
from datetime import datetime
from timeloop import Timeloop
from datetime import timedelta
import pandas as pd
import time

import pyodbc
from opentelemetry import trace
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.exporter.zipkin.json import ZipkinExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (BatchSpanProcessor)
from opentelemetry.instrumentation.dbapi import trace_integration

pools_connection_string = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=SERVER;Database=DATABASE_NAME;Uid=USER_NAME;pwd=PASSWORD;"

app = Flask(__name__)

trace.set_tracer_provider(TracerProvider())

jaeger_exporter = JaegerExporter(
    agent_host_name="localhost",
    agent_port=6831,
)

zipkin_exporter = ZipkinExporter(
    endpoint="http://localhost:9411/api/v2/spans"
    )

trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(jaeger_exporter))
trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(zipkin_exporter))

FlaskInstrumentor().instrument_app(app)
RequestsInstrumentor().instrument()

trace_integration(pyodbc, "Connection", "odbc")

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("foo"):
    with tracer.start_as_current_span("bar"):
        with tracer.start_as_current_span("baz"):
            print("Hello world from OpenTelemetry Python!")

connection = pyodbc.connect(pools_connection_string, ansi=True, autocommit=True)
sql = """
    with t as 
    (select  GeoAssetID,
                Grade, 
                PriceAtLastUpdate as BunkerPrice, 
                LastUpdated as PriceDate,
                row_number() over (partition by Grade, GeoAssetID order by LastUpdated desc) as row
    from [Rates] bp with (nolock)
    where GeoAssetID in (3763)
    )
    select * from t where row = 1 order by GeoAssetID
"""
df = pd.read_sql(sql, con=connection)

tl = Timeloop()

tl.job(interval=timedelta(seconds=14400))

tl.start()

if __name__ == '__main__':
    app.run(host='localhost', port=2180, threaded=True)

OpenTelemetry_Test.zip

Steps to reproduce Try to run the code above. You need to adjust the database access and sql command.

What is the expected behavior? Based on my recent experience using OpenTelemetry in .NET, i would expect to see the database command being captured and send information to Jaeger

What is the actual behavior? The database command is ignored.

Can you please advice, if i am doing something wrong or it is a genuine issue?

Thank you in advance, Konstantinos

github-actions[bot] commented 3 years ago

This issue was marked stale due to lack of activity. It will be closed in 30 days.

srikanthccv commented 3 years ago

Does this always happen with connection or is it just with pandas read_sql?

kosportel commented 3 years ago

I changed the code like this but nothing changed.

cursor = connection.cursor()
cursor.tables()
rows = cursor.fetchall()

for row in rows:
    print(row.table_name)

and

cursor = connection.execute(sql)
rows = cursor.fetchall()
for row in rows:
    print(row)

Hope that helps.

adirmatzkin commented 2 years ago

Try replacing trace_integration(pyodbc, "Connection", "odbc") with trace_integration(pyodbc, "connect", "odbc") @kosportel

Seems like a mistake in the docs ... @owais

kosportel commented 2 years ago

Hi @adirmatzkin,

I confirm that changing to "connect", instrumentation works well. I couldn't imagine that such a simple change would solve the problem.

Many Thanks

owais commented 2 years ago

It should be verified if some versions of pyodbc need "Connection" while others need "connect" and the docs should be updated accordingly.