pydantic / logfire

Uncomplicated Observability for Python and beyond! 🪵🔥
https://logfire.pydantic.dev/docs/
MIT License
1.96k stars 59 forks source link

`WITHIN GROUP` query syntax #433

Open samuelcolvin opened 1 week ago

samuelcolvin commented 1 week ago

@@frankie567:

I just tried with this new database engine and it's clearly day-and-night in terms of performance. We struggled before to load our dashboards for timespans > 3 hours; now it's able to load 30 days span pretty quickly. Really great move 👏

I've an existing query that's no longer compatible. The goal was to get request durations percentiles. It looks like this:

WITH duration AS (
  SELECT
    time_bucket('%time_bucket_duration%', start_timestamp) AS x,
    extract(milliseconds from end_timestamp - start_timestamp) as duration
  FROM records
  WHERE otel_scope_name = 'opentelemetry.instrumentation.asgi' and parent_span_id is null
)
SELECT
  x,
  percentile_cont(0.99) WITHIN GROUP (ORDER BY duration) as percentile_99,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY duration) as percentile_95
FROM duration
GROUP BY x
ORDER BY x DESC;

But DataFusion states it doesn't support WITHIN GROUP:

Capture d’écran 2024-09-18 à 17 19 35

I'm pretty sure there is another way to achieve that query but don't have time right now to explore this ☺️

Originally posted by @frankie567 in https://github.com/pydantic/logfire/issues/408#issuecomment-2358768322

samuelcolvin commented 1 week ago

Thanks @frankie567 we'll look into this, I think @alexmojaki has something similar working until we can support that syntax in datafusion.

samuelcolvin commented 1 week ago

I've created https://github.com/apache/datafusion/issues/12533, and commented on https://github.com/apache/datafusion/issues/11732. Once I get some guidance on next steps, I'll try to work on it.

jules-ch commented 1 day ago

Same here I have a query using WITHIN thet fails, here getting slowest SQL ordering by P90



WITH query_durations AS (
    SELECT 
        attributes->>'db.statement' as query, 
        EXTRACT(EPOCH FROM (end_timestamp - start_timestamp)) * 1000 AS duration_ms 
    FROM 
        records
    WHERE 
        attributes->>'db.system' = 'influxdb'
        AND service_name = 'server'
),
p90_durations AS (
    SELECT 
        query, 
        percentile_cont(0.90) WITHIN GROUP (ORDER BY duration_ms) AS p90_duration_ms
    FROM 
        query_durations
    GROUP BY 
        query
)
SELECT 
    query, 
    p90_duration_ms
FROM 
    p90_durations
ORDER BY 
    p90_duration_ms DESC
LIMIT 10;