readysettech / readyset

Readyset is a MySQL and Postgres wire-compatible caching layer that sits in front of existing databases to speed up queries and horizontally scale read throughput. Under the hood, ReadySet caches the results of cached select statements and incrementally updates these results over time as the underlying data changes.
https://readyset.io

Columns in prepare responses and execute responses don't match when there's an ALTER TABLE that hasn't yet been replicated #100

Open nickelization opened 1 year ago

nickelization commented 1 year ago

Found a failing test case via the DDL Vertical test suite, which appears to be caused by a bug in ReadySet:

```
thread 'run_cases' panicked at 'Test failed: index out of bounds: the len is 2 but the index is 2;
```

We then get a minimal failing case that looks like:

```
WriteRow(("_x__rC_oX1Duc", 0, <Text("4)<xDk$Cશ䦕Ⱥⴧ")>))
AddColumn(("_x__rC_oX1Duc", ColumnSpec { name: "___9vk_XU", sql_type: Real }))
```

What appears to happen is that we run `ALTER TABLE ... ADD COLUMN ...` and then immediately run the postcondition where we check that the ReadySet table contents eventually converge and match what we see in Postgres. However, the Postgres client code panics while decoding the query results from ReadySet, so we never get to the point of comparing the table contents (and potentially retrying).

Since the client library itself was crashing, my suspicion was that we may be sending back invalid packets, and after digging into both the client code and the packet data, I was able to confirm this hunch. In this scenario, ReadySet first sends a `RowDescription` message that describes three columns (reflecting the result of having applied the `ALTER TABLE`) but then immediately follows it with a `DataRow` message that only contains two fields. This confuses the client code, which assumes the number of fields in `DataRow` will match up with the number of fields described in `RowDescription`, and results in the `index out of bounds` panic we see above.
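To make the failure mode concrete, here is a minimal, hypothetical sketch (Python, not ReadySet's actual Rust code; all names are illustrative) of the invariant the client relies on and how the mismatched packets break it:

```python
# Hypothetical sketch of the wire-protocol invariant at issue: every
# DataRow must carry exactly as many fields as the preceding
# RowDescription declared. Function and variable names are illustrative.

def data_row_matches(row_description, data_row):
    """True if the DataRow is consistent with the RowDescription."""
    return len(row_description) == len(data_row)

# The buggy scenario: RowDescription reflects the post-ALTER schema
# (three columns) while the DataRow still has the pre-ALTER two fields.
row_description = ["id", "x", "___9vk_XU"]  # 3 columns after ADD COLUMN
data_row = ["0", "some-text-value"]         # only 2 fields

assert not data_row_matches(row_description, data_row)

# A client that trusts RowDescription blindly indexes past the end of
# the row, which is exactly the "index out of bounds" panic above:
try:
    decoded = [data_row[i] for i in range(len(row_description))]
except IndexError:
    decoded = None

assert decoded is None  # the client never gets a usable row
```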

From SyncLinear.com | REA-2216

nickelization commented 1 year ago

I don’t think Gautam was suggesting we reorder the response packets, but rather to buffer the prepare response until the execute response has been generated, so that if they don’t match up we have a chance to fix things before we start to respond to the client.

That is accurate, but the suggestion does not work for the reasons that Nick and Griffin mention.

Given that, we might be able to get away with just returning an error to the client here.

Agreed. It's not perfect, but it might be okay.

nickelization commented 1 year ago

Interestingly, if you connect directly to Postgres and:

  1. Prepare `SELECT * FROM t;`
  2. Run `ALTER TABLE t ADD COLUMN y INT;`
  3. Execute the prepared statement

you get:

```
ERROR: cached plan must not change result type
```

Given that, we might be able to get away with just returning an error to the client here.
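A minimal sketch of that "return an error" approach (illustrative Python, not ReadySet code; all names are hypothetical): re-check the result schema at execute time against what was sent in the prepare response, and fail the statement the way Postgres does rather than emitting a mismatched DataRow.

```python
# Illustrative sketch of the error-on-mismatch approach. All names are
# hypothetical; this is not ReadySet's implementation.

class CachedPlanError(Exception):
    """Mirrors Postgres's 'cached plan must not change result type'."""

def execute_prepared(prepared_schema, current_schema, rows):
    # At execute time, refuse to return rows whose shape no longer
    # matches the RowDescription sent in the prepare response.
    if prepared_schema != current_schema:
        raise CachedPlanError("cached plan must not change result type")
    return rows

# Before the ALTER TABLE replicates, schemas agree and rows flow through:
assert execute_prepared(("id", "x"), ("id", "x"), [(0, "a")]) == [(0, "a")]

# After replication, the schema gains a column and the execute errors
# instead of sending a malformed DataRow:
try:
    execute_prepared(("id", "x"), ("id", "x", "y"), [(0, "a", None)])
    raised = False
except CachedPlanError:
    raised = True
assert raised
```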

nickelization commented 1 year ago

> That said, still not sure if that suggestion is feasible, since I would assume the client would not send the execute request to us at all until after we’ve sent the prepare response.

well, more importantly prepares and executes are not 1:1 - a single prepared statement can be executed any number of times, and the alter table can be replicated at any time
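To illustrate why a one-time check is not enough, here is a hypothetical sketch: in the extended query protocol, one prepare can be followed by any number of executes, and replication of the `ALTER TABLE` can land between any two of them, so the schema has to be re-validated on every execute.

```python
# Hypothetical sketch: prepares and executes are 1:N in the extended
# query protocol, so a schema check done once at prepare time can be
# invalidated by replication before any later execute.

schema_at_prepare = ("id", "x")

# Schema observed at each of three successive executes; the ALTER TABLE
# replicates before the third one.
schema_at_each_execute = [("id", "x"), ("id", "x"), ("id", "x", "y")]

outcomes = [
    "ok" if schema == schema_at_prepare else "schema changed"
    for schema in schema_at_each_execute
]

assert outcomes == ["ok", "ok", "schema changed"]
```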

nickelization commented 1 year ago

An execute succeeding once gives no guarantee that it will succeed later (specifically, once the ALTER TABLE is in fact replicated, the view will get dropped and the execute will fail).

nickelization commented 1 year ago

Can we run our own auto-execute as a test after the prepare?

nickelization commented 1 year ago

I don’t think Gautam was suggesting we reorder the response packets, but rather to buffer the prepare response until the execute response has been generated, so that if they don’t match up we have a chance to fix things before we start to respond to the client.

That said, still not sure if that suggestion is feasible, since I would assume the client would not send the execute request to us at all until after we’ve sent the prepare response. But, then again, I didn’t think I noticed any client requests coming in between us sending the RowDescription and DataRow packets, so my current mental model feels a little fuzzy to me right now.

nickelization commented 1 year ago

We can’t change anything about the ordering of messages, since they’re specified by the client protocol used by the databases.

nickelization commented 1 year ago

<~accountid:609236b4b9ac3a007151a40b> can we treat the prepare and the execute as a transaction of sorts, so the prepare response is only sent after the execute has also completed?

nickelization commented 1 year ago

prepares can't be “redone” - we've already sent the response to the client

nickelization commented 1 year ago

After an ALTER TABLE has been replicated to ReadySet, can we redo any prior prepare statements that we had done, and then of course do the subsequent execution of the prepared statement?

nickelization commented 1 year ago

The issue here is not that we’re returning a stale value, but that the actual schema of our results differs from the schema of the prepare response, which breaks all clients. That’d be the case, to varying degrees, in all of my proposed approaches

nickelization commented 1 year ago

So, in number 3, is that covered by our notion of eventual consistency? We would return a stale value, but that would be ok?

nickelization commented 1 year ago

Ok, here's what's going on here:

It's not clear what the right thing to do is here. Ideally, we'd like to be able to detect that the upstream and ReadySet prepare responses are different, and make that particular statement upstream-only. But since we frequently return different column names and types than upstream (a large, but very difficult to solve, issue), we can't actually do that reliably with total correctness. So we're left with a few options:

  1. Do nothing. Queries made immediately after an `ALTER TABLE`, before it has had a chance to be replicated, will fail.
  2. Make a "best guess" at whether the columns from the upstream and ReadySet prepare responses differ (maybe as simple as just checking column counts), and if they do, convert the prepare request to upstream-only. This will fail some of the time (e.g. if a change has swapped the order of two columns of different types).
  3. Return the ReadySet prepare response, but transparently re-migrate the query when it fails due to the cache being dropped, either via something like ENG-2236 or something else in-line. This would still fail if the query failed to execute for any other reason, but that might be rare enough that we don't need to design around it.
  4. ??? Something else I'm not thinking of?
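Option 2 could be sketched roughly like this (hypothetical Python mirroring the "just check counts" heuristic and its blind spot; none of these names come from ReadySet):

```python
# Rough sketch of option 2: compare the upstream and ReadySet prepare
# responses and route the statement upstream-only when they diverge.
# Names are illustrative; this is not ReadySet's routing logic.

def should_route_upstream(upstream_cols, readyset_cols):
    # Simplest heuristic: column counts differ.
    return len(upstream_cols) != len(readyset_cols)

# Caught: the not-yet-replicated ADD COLUMN makes the counts differ.
assert should_route_upstream(
    [("id", "int4"), ("x", "text"), ("y", "float8")],  # upstream, post-ALTER
    [("id", "int4"), ("x", "text")],                   # ReadySet, pre-ALTER
)

# Missed: swapping two columns keeps the count the same, so the
# heuristic fails exactly as described in option 2.
assert not should_route_upstream(
    [("id", "int4"), ("x", "text")],
    [("x", "text"), ("id", "int4")],
)
```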

cc <accountid:62a73f87ddc560006e8a7baf> for product tradeoffs, <accountid:60ae9f5411a545006914db05> for extra eyes on this in case there’s something obvious I’m missing here.

nickelization commented 1 year ago

Also note, I maybe should’ve made it extra clear, this just seems to be a temporary race condition – after several seconds have passed, ReadySet no longer returns the bad responses.

nickelization commented 1 year ago

I’m not sure if this issue is worth addressing right now, so for the moment I’m just going to file this bug, and also create a CL with a unit test (marked `#[ignore]`) that reproduces the failure. Then I will temporarily disable the AddColumn operation in the DDL vertical tests (or otherwise work around this bug) and continue looking for more issues.