pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
Other
29.19k stars 1.84k forks source link

Add support for Recursive CTEs 🚀 #11154

Open matthewgapp opened 12 months ago

matthewgapp commented 12 months ago

Description

Would love support for recursive CTEs via the rust sql api. My use case requires calculations/columns that recursively depend on themselves over some dimension (think compounding growth or beg and ending bank balances over time). As of now, I have to resort to using duckdb-rs.

Example

WITH RECURSIVE GrowthCTE AS (
      -- Base case
      SELECT 1 as Index, CAST(100 as FLOAT4) AS Value
      UNION ALL
      -- Recursive step
      SELECT
          Index + 1,
          Value * 1.08
      FROM GrowthCTE
      WHERE Index < 10 
  )
  SELECT
      Index,
      Value
  FROM GrowthCTE;

results in the error

Error: ComputeError(ErrString("Recursive CTEs are not supported"))
orlp commented 11 months ago

I don't know if we'll ever support recursive queries natively in Polars.

For what it's worth, you can solve your problem without a recursive query simply using value * 1.08**idx if you used zero-based indexes.

matthewgapp commented 11 months ago

@orlp can you share the rationale? DuckDB supports them and I'm working on adding support in Datafusion.

You're correct, that for the trivial case, it can be reduced to a non-recursive implementation. But many problems can't be described without iterating/recursing on previous values. New values must be produced from the result of the set of previous values. Think beg and ending bank balances over time.

orlp commented 11 months ago

@matthewgapp DuckDB supports them because they're a SQL-based database implementing the SQL standard. Polars just offers a SQL front-end and is not inherently SQL based.

I just don't think that recursive queries make a lot of sense in a dataframe library. Perhaps someday we'll support something similar to SQL/PGQ in Polars for the graph use-case, but in my experience everything to do around recursive queries is a massive burden to developer time to ultimately end up with something that performs incredibly sub-par as to what the query actually wants to answer. Some hyper-specific forms can be optimized by a query optimizer but that leads to massive performance cliffs when your query deviates even slightly from such a form.

zhangyt26 commented 10 months ago

what do you mean duckDB is sql based but polars is data frame based? sql vs dataframe, they are the same to me.

for the recursion case, for duckDB, i believe you can still expose their recursion logical operator through their relational API and their relational API can give you a dataframe like programming experience.

stonebig commented 9 months ago

Even Duckdb is very slow on recursive cte, versus sqlite. Maybe the approach to these sort of problems shall be different for Duckdb/Polars