Open samuelcolvin opened 1 week ago
Thanks @frankie567 we'll look into this, I think @alexmojaki has something similar working until we can support that syntax in datafusion.
I've created https://github.com/apache/datafusion/issues/12533, and commented on https://github.com/apache/datafusion/issues/11732. Once I get some guidance on next steps, I'll try to work on it.
Same here I have a query using WITHIN thet fails, here getting slowest SQL ordering by P90
WITH query_durations AS (
SELECT
attributes->>'db.statement' as query,
EXTRACT(EPOCH FROM (end_timestamp - start_timestamp)) * 1000 AS duration_ms
FROM
records
WHERE
attributes->>'db.system' = 'influxdb'
AND service_name = 'server'
),
p90_durations AS (
SELECT
query,
percentile_cont(0.90) WITHIN GROUP (ORDER BY duration_ms) AS p90_duration_ms
FROM
query_durations
GROUP BY
query
)
SELECT
query,
p90_duration_ms
FROM
p90_durations
ORDER BY
p90_duration_ms DESC
LIMIT 10;
@@frankie567:
I just tried with this new database engine and it's clearly day-and-night in terms of performance. We struggled before to load our dashboards for timespans > 3 hours; now it's able to load 30 days span pretty quickly. Really great move 👏
I've an existing query that's no longer compatible. The goal was to get request durations percentiles. It looks like this:
But DataFusion states it doesn't support
WITHIN GROUP
:I'm pretty sure there is another way to achieve that query but don't have time right now to explore this ☺️
Originally posted by @frankie567 in https://github.com/pydantic/logfire/issues/408#issuecomment-2358768322