readysettech / readyset

Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the hood, ReadySet caches the results of cached select statements and incrementally updates these results over time as the underlying data changes.
https://readyset.io
Other
4.54k stars 125 forks source link

Cannot migrate query that contains a function over a join column #1372

Open jasobrown-rs opened 2 months ago

jasobrown-rs commented 2 months ago

Description

When testing readyset with benchbase's tpc-h benchmark, several queries contain a join condition, expressed in the where clause that causes a peculiar error when at least one of the columns is wrapped with a function.

Here is the query

create cache from
          SELECT c_id,
              c_last,
              sum(ol_amount) AS revenue,
              c_city,
              c_phone,
              n_name
              FROM customer,
              oorder,
              order_line,
              nation
              WHERE c_id = o_c_id
              AND c_w_id = o_w_id
              AND c_d_id = o_d_id
              AND ol_w_id = o_w_id
              AND ol_d_id = o_d_id
              AND ol_o_id = o_id
              AND o_entry_d >= '2007-01-02 00:00:00.000000'
              AND o_entry_d <= ol_delivery_d
              AND n_nationkey = ascii(substring(c_state from  1  for  1))
              GROUP BY c_id,
              c_last,
              c_city,
              c_phone,
              n_name
              ORDER BY revenue DESC;

Here is the error, as seen in the mysql client (carriage returns added by me):

ERROR 1105 (HY000): Error during RPC (extend_recipe (in readyset-client/src/controller/rpc.rs:49:26)): 
   SQL SELECT query 'q_6565248d6c10a8b' couldn't be added: MIR node '18' couldn't be lowered to dataflow: 
   Internal error: in readyset-server/src/controller/mir_to_flow.rs:45:5: 
    invariant failed: columns.len() == names.len();

The problem occurs because of this where predicate (which is a join condition):

AND n_nationkey = ascii(substring(c_state from  1  for  1))

Additional data points:

Thus, i think it is because at least one of the join condition's columns is wrapped in a function.

Steps to reproduce

create table user (id int, dept_name varchar(32));
create table dept (dept_id int, name varchar(32));
create cache from select user.id, dept.dept_id 
    from user, dept 
    where ascii(user.dept_name) = dept.name;

ReadySet version

git sha b647d97e3f1a0dfbe47168d0814a67f003066bb6

cli command

cargo --locked run --bin readyset -- \
   --upstream-db-url=mysql://root:noria@127.0.0.1:3306/benchbase --feature-post-lookup