trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.28k stars 2.96k forks source link

Query taking longer time in the Planning Phase #22358

Open vijayakumarpattanashetti opened 3 months ago

vijayakumarpattanashetti commented 3 months ago

Issue: The query takes x time sometimes, and most of the times it takes 2x time. In less probability the query also fails with optimiser timeout error.

Observations:

  1. Prometheus: The memory, cpu & network metrics seems all good.
  2. Trino Admin UI: The query takes almost 3m for planning, and stays about 2m in Starting, and Execution takes about <2m. The query planning fails if the planning time is exceeding 3m with Optimiser timeout error.
  3. EXPLAIN ANALYZE: The Dynamic filters & pushdown is effective.

Data Details:

  1. The data is on GCS, and in ORC format.
  2. The data size is ~20GB/day.

Query Details: The query involves about 2 Full JOINs, 3 IN clause filters with list of length 26, and almost 7-level subqueries.

Query Format Example:

SELECT a
FROM
  (
   SELECT b
   FROM
     (
      SELECT c
      FROM
        (
         SELECT d
         FROM
           (
            SELECT e
            FROM
              (((
               SELECT f
               FROM
                 (
                  SELECT g
                  FROM
                    (
                     SELECT h
                     FROM table0
                     WHERE (equals AND 8-INs)
                     GROUP BY 5-cols
                  )
                  WHERE col_a IS NOT NULL
                  GROUP BY 3-cols
                  ORDER BY 3-cols
               )
            )
            FULL JOIN (
               SELECT i
               FROM
                 (
                  SELECT j
                  FROM
                    (
                     SELECT k
                     FROM
                       (
                        SELECT l
                        FROM table0
                        WHERE (equals AND 8-INs)
                        GROUP BY 7-cols
                     )  t1
                     GROUP BY 5-cols
                     ORDER BY 5-cols
                  )
               )
               GROUP BY 3-cols
            ) ON (3-equals-ANDs))
            FULL JOIN (
               SELECT m
               FROM
                 (
                  SELECT n
                  FROM
                    (
                     SELECT o
                     FROM
                       (
                        SELECT p
                        FROM table0
                        WHERE (equals AND 8-INs)
                        GROUP BY 7-cols
                     )  t1
                     GROUP BY 5-cols
                     ORDER BY 5-cols
                  )
               )
               GROUP BY 3-cols
            ) ON (3-equals-ANDs))
         )
      )
      WHERE (col_a IS NOT NULL)
   )
)
GROUP BY col_a
hashhar commented 3 months ago

Please grab the query json for this query - that should show breakdown of much time each optimizer rule or analysis or metadata fetching took.