pydantic / logfire

Uncomplicated Observability for Python and beyond! 🪵🔥
https://docs.pydantic.dev/logfire/
MIT License

Deferred otel control #157

Open ManiMozaffar opened 2 months ago

ManiMozaffar commented 2 months ago

Description

I named the feature "deferred otel control" because I want to control when the OTel processing and flushing happen in my program. Deferring them would give me the best possible performance during a defined time window in which my software is very time- and performance-sensitive. Put simply, some software (in my case, a crawler tasked with doing something quickly) is time-critical, so there is a point in the program where I want to run at full speed. My workload was mostly I/O-bound rather than CPU-bound, and 10 ms makes a big difference.

At first I thought this might be useless and not a common problem, but I discussed it with @dmontagu, who was also convinced it could be quite useful. That discussion thread has more details.

To illustrate the idea in code:


def fn():
    with deferred_otel():
        # Whatever happens here, the OTel SDK should defer its work until later.
        execute_fast()
        # I need all the telemetry from execute_fast(), just deferred and flushed later.
        execute_fast()
    # A good breathing point.
    # Speed doesn't matter here.
    ...

Collecting span and log data can add significant overhead. Should I avoid Logfire entirely when performance is highly time-sensitive, even though that critical period is only a small part of my code? In principle, the Logfire SDK could buffer spans for processing and sending before even handing them off to OTel, rather than doing much of the manipulation eagerly.

adriangb commented 2 months ago

Do you want the telemetry from within execute_fast() later, or would you be okay with not having any telemetry from it at all? If the latter, something like this might work:

with logfire.sample(0.01):  # only emit 1% of telemetry
    execute_fast()
# back to 100% sampling

ManiMozaffar commented 2 months ago

@adriangb I need 100% of the telemetry data from execute_fast(), so that solution doesn't help; I need it deferred until a good breathing point.