trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.34k stars 2.98k forks source link

Named queries (WITH) should be evaluated only once #10

Open martint opened 5 years ago

martint commented 5 years ago

Currently, Trino evaluates named queries (WITH clause) by inlining them wherever they are referenced in the body of the main query. This can cause problems if the query is non-deterministic, since every inlined instance might produce different results.

According to the spec:

1) If a non-recursive <with clause> is specified, then:
    ...
    b) For every <with list element> WLE, let WQN be the <query name> immediately contained in WLE. Let WQE be 
       the <query expression> simply contained in WLE. Let WLT be the table resulting from evaluation of WQE, with 
       each column name replaced by the corresponding element of the <with column list>, if any, immediately contained in WLE.
    c) Every <table reference> contained in <query expression> that specifies WQN identifies WLT.

This says that each named query should be evaluated once and the result "stored" in a table WLT. Thereafter, any references to the named query directly identify that WLT table.

WLT in the definition above is what the spec calls "transient table":

4.15.4 Transient tables A transient table is a named table that may come into existence implicitly during the evaluation of a <query expression> or the execution of a trigger. A transient table is identified by a <query name> if it arises during the evaluation of a <query expression>, or by a <transition table name> if it arises during the execution of a trigger. Such tables exist only for the duration of the executing SQL-statement containing the <query expression> or for the duration of the executing trigger.

rahulroy9202 commented 4 years ago

I am looking forward for this fix too.