open-telemetry / opentelemetry-python

OpenTelemetry Python API and SDK
https://opentelemetry.io
Apache License 2.0
1.76k stars 614 forks source link

[DO NOT MERGE] Port of jaeger remote sampler support #3827

Closed sconover closed 5 months ago

sconover commented 6 months ago

DRAFT - DO NOT MERGE

There are a variety of challenging TODO's (marked TODO(sconover)) and, no doubt, changes to be made based on feedback from project maintainers.

The goal of this initial set of changes is to conduct a pretty "straight" port of the corresponding code in jaeger-client-python, to get discussion going on how opentelemetry-python support for jaeger remote sampling should ultimately work.

Please see the jaeger-client-python project and especially sampler.py and test_sampler.py for the source of much of this port.

New dependencies:

I expect there's a very good chance that opentelemetry python docs will need to be updated in advance of an ultimate merge, and would appreciate advice on what those areas might be.

Type of change

Please delete options that are not relevant.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Does This PR Require a Contrib Repo Change?

Answer the following question based on these examples of changes that would require a Contrib Repo Change:

Checklist:

I'm leaving all of the above "unchecked". I've made some style changes in the direction of this project and python 3, however I'm hesitant to "compromise" the relatively straight port at this point, to make comparison/contrasting easier for initial reviewers.

Regarding (even) unit tests: while this code is unit tested, the tests are ported, and stylistically are quite different from tests in this project - for example they make extensive use of mocks, among other differences.

Sample flask server and explanation:

# A jaeger agent may be started using a command like:
#
# docker run \
# -v /dev/scratch/config:/config \
# -e SAMPLING_CONFIG_TYPE=file \
# -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
# -p 5775:5775/udp \
# -p 6831:6831/udp \ 
# -p 6832:6832/udp \ 
# -p 5778:5778 \
# -p 16686:16686 \   
# -p 14268:14268 \ 
# -p 14250:14250 \ 
# -p 9411:9411 \
# jaegertracing/all-in-one:1.22 \
# --sampling.strategies-file=/config/strategies.json \
# --sampling.strategies-reload-interval=5s
#
# Where the sampling strategies config file is available locally at
# 
# /dev/scratch/config/strategies.json 
#
#
# Run this python script and make a request using:
#   curl localhost:5000/
# If the sampling decision is 'sample' then the ConsoleSpanExporter
# will print the emitted span to the console.
#
# If the agent's strategies.json files is changed to something like
#
# 
# {
#   "service_strategies": [
#     {
#       "service": "foo",
#       "type": "probabilistic",
#       "param": 0.333
#     }
#   ],
#   "default_strategy": {
#     "type": "probabilistic",
#     "param": 1
#   }
# }
#
# ...once these settings sync over via RemoteControlledSampler,
# it will take several curl invocations to see an emitted span,
# reflecting the 1-in-3 chance that a trace is created.

from opentelemetry.sdk.resources import SERVICE_NAME, Resource

from opentelemetry import trace
from opentelemetry.sdk.trace.export import ConsoleSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor

from opentelemetry.sdk.trace.sampling import RemoteControlledSampler
from opentelemetry.sdk.trace.local_agent_net import LocalAgentSender

import threading
import tornado

main_loop = tornado.ioloop.IOLoop().current()

sampler = RemoteControlledSampler(
    channel=LocalAgentSender('localhost', 5778, 5778, io_loop=main_loop),
    service_name='foo',
    sampling_refresh_interval = 5,
)

resource = Resource(attributes={
    SERVICE_NAME: "foo"
})
traceProvider = TracerProvider(resource=resource, sampler=sampler)
traceProvider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter("foo")))
trace.set_tracer_provider(traceProvider)

from flask import (
    Flask, 
    jsonify
)

tracer = trace.get_tracer("foo")

def create_app():
    app = Flask(__name__)

    @app.route('/')
    def hello_world(): 
        with tracer.start_as_current_span("foo") as span:
                span.set_attribute("hello.value", "world")
                return jsonify({
                    "status": "success",
                    "message": "Hello World!"
                })

    return app

app = create_app()

if __name__ == '__main__':
    threading.Thread(target=lambda: app.run(debug=True, use_reloader=False)).start()
    main_loop.start()
srikanthccv commented 6 months ago

FYI, contrib is the appropriate place for components like this https://github.com/open-telemetry/opentelemetry-python-contrib. It can't be merged into SDK.

sconover commented 5 months ago

@srikanthccv Thanks for the initial feedback, and I completely agree that it seems a little strange/wrong that this isn't in contrib. However there are a couple of key places where, given the way the code is structured, the code at present, unless I'm missing something, technically must be in here. I will cite the areas I see as issues, and I'd appreciate guidance on what to do about them. I'd be happy to put together a minimal PR to this repo to "prep" the lib for accommodation of the larger portion (all the jaeger remote sampling impl) which I'd put in the contrib repo.

sconover commented 5 months ago

Closing in favor of splitting this into this core PR and this contrib PR per @srikanthccv 's advice