readysettech / readyset

Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the hood, ReadySet caches the results of cached select statements and incrementally updates these results over time as the underlying data changes.
https://readyset.io

Columns in prepare responses and execute responses don't match when there's an ALTER TABLE that hasn't yet been replicated #100

Open nickelization opened 1 year ago

nickelization commented 1 year ago

Found a failing test case via the DDL Vertical test suite, which appears to be caused by a bug in ReadySet:

```
thread 'run_cases' panicked at 'Test failed: index out of bounds: the len is 2 but the index is 2;
```

We then get a minimal failing case that looks like:

```
WriteRow(("_x__rC_oX1Duc", 0, <Text("4)<xDk$Cશ䦕Ⱥⴧ")>))
AddColumn(("_x__rC_oX1Duc", ColumnSpec { name: "___9vk_XU", sql_type: Real }))
```

What appears to happen is that we run `ALTER TABLE ... ADD COLUMN ...` and then immediately run the postcondition where we check that the ReadySet table contents eventually converge and match what we see in Postgres. However, the Postgres client code panics while decoding the query results from ReadySet, so we never get to the point of comparing the table contents (and potentially retrying).

Since the client library itself was crashing, my suspicion was that we may be sending back invalid packets, and after digging into both the client code and the packet data, I was able to confirm this hunch. In this scenario, ReadySet first sends a `RowDescription` message that describes three columns (reflecting the result of having applied the `ALTER TABLE`) but then immediately follows it with a `DataRow` message that only contains two fields. This confuses the client code, which assumes the number of fields in `DataRow` will match up with the number of fields described in `RowDescription`, and results in the `index out of bounds` panic we see above.
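To make the failure mode concrete, here is a minimal, hypothetical sketch (Python, not ReadySet's actual Rust code; all names are illustrative) of the invariant the client relies on and how the mismatched packets break it:

```python
# Hypothetical sketch of the wire-protocol invariant at issue: every
# DataRow must carry exactly as many fields as the preceding
# RowDescription declared. Function and variable names are illustrative.

def data_row_matches(row_description, data_row):
    """True if the DataRow is consistent with the RowDescription."""
    return len(row_description) == len(data_row)

# The buggy scenario: RowDescription reflects the post-ALTER schema
# (three columns) while the DataRow still has the pre-ALTER two fields.
row_description = ["id", "x", "___9vk_XU"]  # 3 columns after ADD COLUMN
data_row = ["0", "some-text-value"]         # only 2 fields

assert not data_row_matches(row_description, data_row)

# A client that trusts RowDescription blindly indexes past the end of
# the row, which is exactly the "index out of bounds" panic above:
try:
    decoded = [data_row[i] for i in range(len(row_description))]
except IndexError:
    decoded = None

assert decoded is None  # the client never gets a usable row
```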

From SyncLinear.com | REA-2216

nickelization commented 1 year ago

I don’t think Gautam was suggesting we reorder the response packets, but rather to buffer the prepare response until the execute response has been generated, so that if they don’t match up we have a chance to fix things before we start to respond to the client.

That is accurate, but the suggestion does not work for the reasons that Nick and Griffin mention.

Given that, we might be able to get away with just returning an error to the client here.

Agreed. It's not perfect, but it might be okay.

nickelization commented 1 year ago

Interestingly, if you connect directly to Postgres and:

  1. Prepare `SELECT * FROM t;`
  2. Run `ALTER TABLE t ADD COLUMN y INT;`
  3. Execute the prepared statement

you get:

```
ERROR: cached plan must not change result type
```

Given that, we might be able to get away with just returning an error to the client here.
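A minimal sketch of that "return an error" approach (illustrative Python, not ReadySet code; all names are hypothetical): re-check the result schema at execute time against what was sent in the prepare response, and fail the statement the way Postgres does rather than emitting a mismatched DataRow.

```python
# Illustrative sketch of the error-on-mismatch approach. All names are
# hypothetical; this is not ReadySet's implementation.

class CachedPlanError(Exception):
    """Mirrors Postgres's 'cached plan must not change result type'."""

def execute_prepared(prepared_schema, current_schema, rows):
    # At execute time, refuse to return rows whose shape no longer
    # matches the RowDescription sent in the prepare response.
    if prepared_schema != current_schema:
        raise CachedPlanError("cached plan must not change result type")
    return rows

# Before the ALTER TABLE replicates, schemas agree and rows flow through:
assert execute_prepared(("id", "x"), ("id", "x"), [(0, "a")]) == [(0, "a")]

# After replication, the schema gains a column and the execute errors
# instead of sending a malformed DataRow:
try:
    execute_prepared(("id", "x"), ("id", "x", "y"), [(0, "a", None)])
    raised = False
except CachedPlanError:
    raised = True
assert raised
```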

nickelization commented 1 year ago

> That said, still not sure if that suggestion is feasible, since I would assume the client would not send the execute request to us at all until after we’ve sent the prepare response.

well, more importantly prepares and executes are not 1:1 - a single prepared statement can be executed any number of times, and the alter table can be replicated at any time
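To illustrate why a one-time check is not enough, here is a hypothetical sketch: in the extended query protocol, one prepare can be followed by any number of executes, and replication of the `ALTER TABLE` can land between any two of them, so the schema has to be re-validated on every execute.

```python
# Hypothetical sketch: prepares and executes are 1:N in the extended
# query protocol, so a schema check done once at prepare time can be
# invalidated by replication before any later execute.

schema_at_prepare = ("id", "x")

# Schema observed at each of three successive executes; the ALTER TABLE
# replicates before the third one.
schema_at_each_execute = [("id", "x"), ("id", "x"), ("id", "x", "y")]

outcomes = [
    "ok" if schema == schema_at_prepare else "schema changed"
    for schema in schema_at_each_execute
]

assert outcomes == ["ok", "ok", "schema changed"]
```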

nickelization commented 1 year ago

An execute succeeding once gives no guarantee that it will succeed later (specifically, once the ALTER TABLE is in fact replicated, the view will get dropped and the execute will fail).

nickelization commented 1 year ago

Can we run our own auto-execute as a test after the prepare?

nickelization commented 1 year ago

I don’t think Gautam was suggesting we reorder the response packets, but rather to buffer the prepare response until the execute response has been generated, so that if they don’t match up we have a chance to fix things before we start to respond to the client.

That said, still not sure if that suggestion is feasible, since I would assume the client would not send the execute request to us at all until after we’ve sent the prepare response. But, then again, I didn’t think I noticed any client requests coming in between us sending the RowDescription and DataRow packets, so my current mental model feels a little fuzzy to me right now.

nickelization commented 1 year ago

We can’t change anything about the ordering of messages, since they’re specified by the client protocol used by the databases.

nickelization commented 1 year ago

<~accountid:609236b4b9ac3a007151a40b> can we treat the prepare and the execute as a transaction of sorts, so the prepare response is only sent after the execute has also completed?

nickelization commented 1 year ago

prepares can't be “redone” - we've already sent the response to the client

nickelization commented 1 year ago

After an ALTER TABLE has been replicated to ReadySet, can we redo any prior prepare statements that we had done, and then of course do the subsequent execution of the prepared statement?

nickelization commented 1 year ago

The issue here is not that we’re returning a stale value, but that the actual schema of our results differs from the schema of the prepare response, which breaks all clients. That’d be the case, to varying degrees, in all of my proposed approaches

nickelization commented 1 year ago

So, in number 3, is that covered by our notion of eventual consistency? We would return a stale value, but that would be ok?

nickelization commented 1 year ago

Ok, here's what's going on here:

It's not clear what the right thing to do is here. Ideally, we'd like to be able to detect that the upstream and ReadySet prepare responses are different, and make that particular statement upstream-only. But since we frequently return different column names and types than upstream (a large, but very difficult to solve, issue), we can't actually do that reliably with total correctness. So we're left with a few options:

  1. Do nothing. Queries made immediately after an `ALTER TABLE`, before it has had a chance to be replicated, will fail.
  2. Make a "best guess" at whether the columns from the upstream and ReadySet prepare responses differ (maybe as simple as just checking column counts), and if they do, convert the prepare request to upstream-only. This will fail some of the time (e.g. if a change has swapped the order of two columns of different types).
  3. Return the ReadySet prepare response, but transparently re-migrate the query when it fails due to the cache being dropped, either via something like ENG-2236 or something else in-line. This would still fail if the query failed to execute for any other reason, but that might be rare enough that we don't need to design around it.
  4. ??? Something else I'm not thinking of?
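Option 2 could be sketched roughly like this (hypothetical Python mirroring the "just check counts" heuristic and its blind spot; none of these names come from ReadySet):

```python
# Rough sketch of option 2: compare the upstream and ReadySet prepare
# responses and route the statement upstream-only when they diverge.
# Names are illustrative; this is not ReadySet's routing logic.

def should_route_upstream(upstream_cols, readyset_cols):
    # Simplest heuristic: column counts differ.
    return len(upstream_cols) != len(readyset_cols)

# Caught: the not-yet-replicated ADD COLUMN makes the counts differ.
assert should_route_upstream(
    [("id", "int4"), ("x", "text"), ("y", "float8")],  # upstream, post-ALTER
    [("id", "int4"), ("x", "text")],                   # ReadySet, pre-ALTER
)

# Missed: swapping two columns keeps the count the same, so the
# heuristic fails exactly as described in option 2.
assert not should_route_upstream(
    [("id", "int4"), ("x", "text")],
    [("x", "text"), ("id", "int4")],
)
```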

cc <accountid:62a73f87ddc560006e8a7baf> for product tradeoffs, <accountid:60ae9f5411a545006914db05> for extra eyes on this in case there’s something obvious I’m missing here.

nickelization commented 1 year ago

Also note, I maybe should’ve made it extra clear, this just seems to be a temporary race condition – after several seconds have passed, ReadySet no longer returns the bad responses.

nickelization commented 1 year ago

I’m not sure if this issue is worth addressing right now, so for the moment I’m just going to file this bug, and also create a CL with a unit test (marked `#[ignore]`) that reproduces the failure. Then I will temporarily disable the AddColumn operation in the DDL vertical tests (or otherwise work around this bug) and continue looking for more issues.