open-telemetry / opentelemetry-python-contrib

OpenTelemetry instrumentation for Python modules
https://opentelemetry.io
Apache License 2.0

HttpX instrumentation does not work on classes that extend httpx client #2364

Open jeremydvoss opened 3 months ago

jeremydvoss commented 3 months ago

Because the HTTPX instrumentation replaces httpx.Client with its own subclass, it does not work on subclasses that are defined at import time, before instrumentation runs (even if the subclass is only instantiated after instrumentation). This means that as soon as OpenAI created a subclass of the HTTPX client, the httpx instrumentation stopped capturing its requests. This could be fixed in OpenAI by defining the class at runtime:

class SyncAPIClient(BaseClient[httpx.Client, Stream[Any]]):
    ...
    def __init__(
        ...
    ) -> None:
        ...
        # Define at runtime
        class SyncHttpxClientWrapper(httpx.Client):
            def __del__(self) -> None:
                try:
                    self.close()
                except Exception:
                    pass
        self._client = http_client or SyncHttpxClientWrapper(
            base_url=base_url,
            # cast to a valid type because mypy doesn't understand our type narrowing
            timeout=cast(Timeout, timeout),
            proxies=proxies,
            transport=transport,
            limits=limits,
            follow_redirects=True,
        )

This could also be worked around by instrumenting httpx before importing any library that uses httpx. However, I think these restrictions make the HTTPX instrumentation too fragile. We need to improve it so that it works intuitively in all such scenarios.
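The failure can be reproduced without httpx or OpenAI. The sketch below is a hypothetical, library-free model: fake_httpx, Client, InstrumentedClient, and the instrumented flag are made-up names, and it assumes (as described in this issue) that the instrumentor reassigns the module's Client attribute. A subclass created at import time keeps the original base; a class defined inside __init__ (as in the OpenAI workaround above) or defined after instrumentation picks up the replacement.

```python
import types

# Hypothetical stand-in for the httpx module; not the real library.
fake_httpx = types.ModuleType("fake_httpx")


class Client:
    instrumented = False


fake_httpx.Client = Client


# Import-time subclass: its base is bound to the original Client right now.
class ImportTimeWrapper(fake_httpx.Client):
    pass


class SDKClient:
    def __init__(self):
        # Runtime subclass: fake_httpx.Client is looked up each time __init__ runs.
        class RuntimeWrapper(fake_httpx.Client):
            pass

        self._client = RuntimeWrapper()


def instrument():
    # Mimics an instrumentor that swaps the module attribute for a subclass.
    class InstrumentedClient(Client):
        instrumented = True

    fake_httpx.Client = InstrumentedClient


instrument()


# Defined after instrument(): like instrumenting before importing the SDK.
class LateWrapper(fake_httpx.Client):
    pass


print(ImportTimeWrapper().instrumented)  # False
print(SDKClient()._client.instrumented)  # True
print(LateWrapper().instrumented)        # True
```

Both workarounds depend on when a class statement runs relative to instrument(), which is the fragility described above.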

Describe your environment: Windows, with these packages:
opentelemetry-api 1.23.0
opentelemetry-instrumentation 0.44b0
opentelemetry-instrumentation-httpx 0.44b0
opentelemetry-instrumentation-openai 0.14.1
opentelemetry-sdk 1.23.0
opentelemetry-semantic-conventions 0.44b0
opentelemetry-semantic-conventions-ai 0.0.23
opentelemetry-util-http 0.44b0

Steps to reproduce

from openai import OpenAI # 1.x
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
import httpx
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

HTTPXClientInstrumentor().instrument()

url = "https://www.example.org/"
with httpx.Client() as client:
    response = client.get(url)

OpenAIInstrumentor().instrument()

client = OpenAI() # 1.x
completion = client.chat.completions.create( # 1.x
  model="gpt-3.5-turbo",
  messages=[
    {"role": "system", "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."},
    {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."}
  ]
)
print(completion.choices[0].message)

input()

What is the expected behavior? A span for the POST to api.openai.com should be captured by the httpx instrumentation. Note that this is separate from the openai.chat span captured by the OpenAI instrumentation.

What is the actual behavior? Only the httpx example span and openai.chat spans are collected.


hbibel commented 3 months ago

I believe one solution could be to monkeypatch the httpx.BaseClient.event_hooks property instead of overwriting the httpx.Client class. That would be a pretty fundamental change in the httpx instrumentation however. I'd like to try this out.
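A rough sketch of the idea of patching a property on the shared base class instead of replacing httpx.Client. These are toy classes, not httpx's real BaseClient or Client; the names trace_request and SDKClientWrapper are hypothetical. Because the property lives on the base class, a subclass defined before instrumentation still inherits the patched version. The toy property has no setter; a real patch would also need to keep any setter the original property defines.

```python
# Toy classes only; real httpx internals differ.
class BaseClient:
    def __init__(self, event_hooks=None):
        self._event_hooks = {"request": [], "response": []}
        for name, hooks in (event_hooks or {}).items():
            self._event_hooks[name] = list(hooks)

    @property
    def event_hooks(self):
        return self._event_hooks


class Client(BaseClient):
    def send(self, request):
        # Run request hooks before "sending", as an HTTP client would.
        for hook in self.event_hooks["request"]:
            hook(request)
        return f"response to {request}"


# Defined before instrumentation, like OpenAI's SyncHttpxClientWrapper.
class SDKClientWrapper(Client):
    pass


traced = []


def trace_request(request):
    traced.append(request)  # stand-in for recording a span


def instrument():
    original_getter = BaseClient.event_hooks.fget

    def patched_getter(self):
        hooks = original_getter(self)
        # Add the tracing hook once per client instance.
        if trace_request not in hooks["request"]:
            hooks["request"].append(trace_request)
        return hooks

    # Patch the property on the shared base class; every subclass inherits it.
    BaseClient.event_hooks = property(patched_getter)


instrument()
SDKClientWrapper().send("GET /")
print(traced)  # ['GET /']
```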

jeremydvoss commented 3 months ago

Aaron noted that the httpx instrumentation isn't using wrapt. I'll experiment with that change and see if that improves this.

jeremydvoss commented 3 months ago

It would be good to update the Sphinx docs and/or README.

aabmass commented 3 months ago

@jeremydvoss somewhat related: https://docs.python.org/3/library/unittest.mock.html#where-to-patch describes why monkey patches may not take effect in tests. Looking at the httpx instrumentation code, we are very naively monkey patching the httpx module: https://github.com/open-telemetry/opentelemetry-python-contrib/blob/37aba928d45713842941c7efc992726a79ea7d8a/instrumentation/opentelemetry-instrumentation-httpx/src/opentelemetry/instrumentation/httpx/__init__.py#L569-L570

This means, as you mentioned, that anyone who does a from httpx import ... before httpx is instrumented will still hold a reference to the unpatched version. I think many of our other instrumentations use wrapt to patch the actual implementation, i.e. they monkey patch the class's methods instead of the Python module's attributes.

As a general principle for making instrumentation take effect, that seems better (maybe we can jot this down somewhere). I'm definitely open to re-implementing the patching to make this more robust.
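To illustrate the difference, here is a stdlib-only sketch that patches a method on the existing class object instead of rebinding the module attribute. It uses a hypothetical stand-in module (fake_httpx) and a plain list (spans) in place of real spans; wrapt's wrap_function_wrapper would do the same kind of in-place patch more robustly. Both a from-import alias and a subclass created before instrumentation see the patch, because they point at the same class object.

```python
import functools
import types

# Hypothetical stand-in for the httpx module; not the real library.
fake_httpx = types.ModuleType("fake_httpx")


class Client:
    def send(self, request):
        return f"response to {request}"


fake_httpx.Client = Client

# References taken before instrumentation runs:
ClientAlias = fake_httpx.Client  # like `from httpx import Client`


class SDKWrapper(fake_httpx.Client):  # like OpenAI's or Authlib's subclass
    pass


spans = []


def instrument():
    original_send = fake_httpx.Client.send

    @functools.wraps(original_send)
    def traced_send(self, request):
        spans.append(request)  # stand-in for starting/ending a span
        return original_send(self, request)

    # Patch the method on the existing class object; the class is not replaced.
    fake_httpx.Client.send = traced_send


instrument()

ClientAlias().send("GET /a")
SDKWrapper().send("GET /b")
print(spans)  # ['GET /a', 'GET /b']
```

One remaining gap with this approach: a subclass that overrides send without calling the base method would still bypass the patch.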

hbibel commented 3 months ago

Using wrapt seems like a better approach than what I proposed. Please let me know if you need any help.

WillDaSilva commented 2 months ago

This is also an issue for the HTTPX clients defined by Authlib: https://github.com/lepture/authlib/blob/master/authlib/integrations/httpx_client/oauth2_client.py

nabheet commented 5 days ago

OMG! I just ran into this issue!!! Luckily in our QA environment!

Here is a simple code sample, in case it helps:

import httpx
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
import asyncio
from authlib.integrations.httpx_client import AsyncOAuth2Client

async def test():
    hci = HTTPXClientInstrumentor()
    hci.instrument()

    scope = "openid email"
    AsyncOAuth2Client(
        client_id="ABC",
        client_secret="DEF",
        scope=scope,
        redirect_uri="",
    )

r = asyncio.run(test())

print(r)

Here is the exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "/home/nabheet/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/nabheet/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/nabheet/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-arm64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/nabheet/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nabheet/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/nabheet/.vscode-server/extensions/ms-python.debugpy-2024.8.0-linux-arm64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/workdir/src/test.py", line 20, in <module>
    r = asyncio.run(test())
        ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/workdir/src/test.py", line 12, in test
    AsyncOAuth2Client(
  File "/workdir/.venv/lib/python3.11/site-packages/authlib/integrations/httpx_client/oauth2_client.py", line 65, in __init__
    httpx.AsyncClient.__init__(self, **client_kwargs)
  File "/workdir/.venv/lib/python3.11/site-packages/opentelemetry/instrumentation/httpx/__init__.py", line 514, in __init__
    super().__init__(*args, **kwargs)
    ^^^^^^^
TypeError: super(type, obj): obj must be an instance or subtype of type
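The TypeError follows from the module-attribute replacement. Authlib's AsyncOAuth2Client subclasses the original httpx.AsyncClient, but its __init__ calls httpx.AsyncClient.__init__(self, ...), which after instrumentation resolves to the instrumented subclass's __init__. That method uses zero-argument super(), which requires self to be an instance of the instrumented class, and an AsyncOAuth2Client instance is not. A stdlib-only reproduction, with hypothetical names standing in for httpx, Authlib, and the instrumentor:

```python
import types

# Hypothetical stand-in for the httpx module; not the real library.
fake_httpx = types.ModuleType("fake_httpx")


class AsyncClient:
    def __init__(self, **kwargs):
        self.kwargs = kwargs


fake_httpx.AsyncClient = AsyncClient


# Like Authlib's AsyncOAuth2Client: subclasses the original class at import
# time, then calls the base __init__ through the module attribute.
class OAuthClient(fake_httpx.AsyncClient):
    def __init__(self, **kwargs):
        fake_httpx.AsyncClient.__init__(self, **kwargs)


# Like the instrumentor: swap the module attribute for a subclass whose
# __init__ uses zero-argument super().
class InstrumentedAsyncClient(AsyncClient):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)


fake_httpx.AsyncClient = InstrumentedAsyncClient

error = None
try:
    OAuthClient(client_id="ABC")
except TypeError as exc:
    error = exc
print(error)
```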