open-telemetry / opentelemetry-lambda

Create your own Lambda Layer in each OTel language using this starter code. Add the Lambda Layer to your Lamdba Function to get tracing with OpenTelemetry.
https://opentelemetry.io
Apache License 2.0
272 stars 166 forks source link

Support for urllib3 auto-instrumentation #214

Closed ericksoen closed 1 week ago

ericksoen commented 2 years ago

Question: Are there any plans to auto-instrument URLLib3 requests as part of the Lambda layer? If not, are there other alternatives we should consider to instrument network requests made via URLLib3?

Context The Python 3.8 lambda layer arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python38-ver-1-7-1:1 appears to auto-instrument both boto and requests (I do not have application code that uses any of the other listed dependencies in requirements-no-deps.txt.

I have some application code that uses botocore.AWSRequest to make requests (which in turn uses urllib3). I have a functional example that runs locally that auto-instruments these network requests, but the requests are not instrumented when I invoke via Lambda:

The code sample below works locally and produces the following output in our vendor UI:

image

works-locally.py

import os
import json
import requests
import urllib3

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter,
)
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.instrumentation.urllib3 import URLLib3Instrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

from botocore.awsrequest import AWSRequest
from botocore.httpsession import URLLib3Session

span_exporter = OTLPSpanExporter(
    endpoint = os.getenv("OTLP_EXPORT_ENDPOINT"),
    headers = os.getenv('OTLP_EXPORT_HEADERS')
)
tracer_provider = TracerProvider()
trace.set_tracer_provider(tracer_provider)
span_processor = BatchSpanProcessor(span_exporter)
tracer_provider.add_span_processor(span_processor)

tracer = trace.get_tracer(__name__)

URLLib3Instrumentor().instrument()
RequestsInstrumentor().instrument()

url = "https://jsonplaceholder.typicode.com/todos/10"

with tracer.start_as_current_span('requests-lib'):
    resp = requests.get(url)
    print(f"Request lib response = {resp}")

with tracer.start_as_current_span('botocore-awsrequests'):
    http_session = URLLib3Session()
    request = AWSRequest(method='GET', url = url).prepare()
    raw_response = http_session.send(request)
    response = json.loads(resp._content)   
    print(f"AWS Request response = {response}") 

with tracer.start_as_current_span('urllib3'):
    http = urllib3.PoolManager()
    r = http.request('GET', url)
    print(f"urllib3 response = {r}")

When we invoke similar application via AWS Lambda, we do not see the child spans for the urllib3 and AWSRequest invocations:

lambda.py

import json
from botocore.awsrequest import AWSRequest
from botocore.httpsession import URLLib3Session
from opentelemetry import trace

import requests
import urllib3

tracer = trace.get_tracer(__name__)

def handler(event, context):

    url = "https://jsonplaceholder.typicode.com/todos/10"

    with tracer.start_as_current_span('requests-lib'):
        resp = requests.get(url)
        print(f"Request lib response = {resp}")

    with tracer.start_as_current_span('botocore-awsrequests'):
        http_session = URLLib3Session()
        request = AWSRequest(method='GET', url = url).prepare()
        raw_response = http_session.send(request)
        response = json.loads(resp._content)   
        print(f"AWS Request response = {response}") 

    with tracer.start_as_current_span('urllib3'):
        http = urllib3.PoolManager()
        r = http.request('GET', url)
        print(f"urllib3 response = {r}") 

    return {
        'statusCode': 200,
        'statusDescription': '200 OK',
        'isBase64Encoded': False,
        'headers': {
            'Content-Type': "text/plain",
        },
        'body': "hello world!",
    }

image

We tested configuring URLLib3Instrumentor directly in our Lambda code (see below).

from opentelemetry.instrumentation.urllib3 import URLLib3Instrumentor
URLLib3Instrumentor.instrument()

With this method, we see the following error in our application logs:

"errorMessage": "Unable to import module 'otel_wrapper': No module named 'opentelemetry.instrumentation.urllib3'",
NathanielRN commented 2 years ago

Hey! Thanks for raising this issue with a detailed explanation 😄

There's no reason not to add it! This list was added a long time ago last year when there was no urllib3 Instrumentation package. That's the great thing about Open Source though because the community has since got the package out 🙂

The short answer is yes we can totally add it, so a PR to add it to that file would be very welcome!

As a side note, the arn:aws:lambda:us-east-1:901920570463:layer:aws-otel-python38-ver-1-7-1:1 you referenced is built from the "downstream" AWS Distro for Opentelemetry Lambda Repository. It does not have the urllib3 instrumentation package and that's why you see that error message. If you get it merged here in this repo, because ADOT Lambda takes the latest changes from this repo, you'll see it be made available in the next release 🙂

In the meantime if you want to unblock yourself, you can download that package separately, create another Lambda layer, and publish it as a Layer yourself to your own account. Then you can add it as a 2nd layer to your Lambda function along the ADOT Lambda one you already added. You just need to have the right folder structure to make sure the Lambda Python runtime finds the package. This is explained in the Lambda Docs.

ericksoen commented 2 years ago

Awesome! I'm happy to open a PR to include this change. If I understand the flow correctly, it sounds like once the PR is merged into this repo, the next release of the downstream ADOT Lambda should auto-instrument requests made using UrlLib3. We're still in the exploration phase so we can likely work around this limitation until these changes to merge.

One behavior I don't yet understand from the various GitHub repos (and would love to understand prior to submitting a PR) is how the .Instrument() function for the dependencies listed in requirements-no-deps.txt is invoked.

For this PR, is there any testing or validation that's expected beyond adding the dependency with the appropriate version to requirements-no-deps.txt (first time contributor here so trying to get the lay of the land 😁)

NathanielRN commented 2 years ago

once the PR is merged into this repo, the next release of the downstream ADOT Lambda should auto-instrument requests made using UrlLib3

Yup you've understood correctly.

how the .Instrument() function for the dependencies listed in requirements-no-deps.txt is invoked.

For this I'll refer you to our detailed ADOT Docs on Manually Instrumenting a Python application or otherwise OTel Python examples on manual instrumentation.

Basically .Instrument() registers a class or package to be instrumented by wrapping or "monkey-patching" important functions in that package.

is there any testing or validation that's expected

Unfortunately we don't have testing on this repo yet because by being vendor-agnostic we have not developed a full CI/CD with a remote backend to test the traces 😓 However in ADOT Lambda we do have the full integration suite. This is a small change so we can decide whether we want it here with community input first and then the ADOT Lambda team will make sure the customer experience is as expected for users of their layers.

However this brings up an important part that I neglected to mention. OTel is a great project, but AWS Lambda is an environment where size is crucial and size is limited. AWS Lambda has a hard limit of 250 MB for a deployment package size which is why in general the layers should be kept lightweight because they take from that limit. Even with all that said instrumentation packages are very lightweight already (urllib3 is only ~64KB 😅) so shouldn't be a problem to add it 🙂

Finally, may I ask what is your use case for using urllib3? The requests library uses urllib3 underneath, and is generally much more preferred like in this SO post. requests is also already instrumented. I noticed in your code you definitely know about this package:

    with tracer.start_as_current_span('requests-lib'):
        resp = requests.get(url)
        print(f"Request lib response = {resp}")

    with tracer.start_as_current_span('urllib3'):
        http = urllib3.PoolManager()
        r = http.request('GET', url)
        print(f"urllib3 response = {r}") 

Which shows that you can accomplish the same thing with requests. urllib3 is a powerful library used underneath by other libraries like the AWS SDK boto3 library itself. If that makes sense we wouldn't have to add the urllib3 instrumentation package, so it's about what serves the community's use-cases the best 🙂 Let me know what you think!

ericksoen commented 2 years ago

Re: urllib3. At this point pragmatism. One of the application teams I support implements it instead of requests. From a purely technical perspective, I didn't see any blockers to migrate their application code to use the more preferred requests lib (my personal pref for new development).

Given the constraints you've pointed out on Lambda layer size (and the need to support urllib3 indefinitely once its included), I'm comfortable with deferring the inclusion of urllib3 until there's a wider community need for it. Are you comfortable with that direction as well?

NathanielRN commented 2 years ago

Also agreed! Users can +1 this issue if they feel like this is a missing gap. Especially without tests let's rely on community interest to guide new features like this 🙂

github-actions[bot] commented 2 months ago

This issue was marked stale. It will be closed in 30 days without additional activity.

github-actions[bot] commented 1 week ago

Closed as inactive. Feel free to reopen if this issue is still relevant.