Node goes down while CREATE INDEX is running, Index does not continue running after node comes back up

mlee-yb commented 3 years ago

Jira Link: DB-2551 Connect to database from node1. Create a table, load rows, CREATE INDEX on table. Take a node3 down using YW (Stop Processes). Node 1 and 3 are not Master Leader.

Before creating index, reduce the time of the --backfill_index_client_rpc_timeout_ms=3600000 to 2 mins (120000)

yugabyte=# CREATE INDEX testdata_amt ON backfilltest (amount DESC) SPLIT AT VALUES((30000),(28000),(26000),(24000),(22000),(20000),(18000),(16000),(14000),(12000),(10000),(8000),(6000),(4000));

Before killing node3, check Master Leader to see if backfill index is Running

Kill Node 3 using YW(Stop Processes)

create index on (on node 1) throws error: Already present (yb/consensus/retryable_requests.cc:245): Duplicate request

Checking Master Leader Tasks again, it still shows the Backfill Index Table - kRunning Checking Master Leader/Utilities/Server RPCs shows active RPC

Restart Node 3 - expectation that it will restart backfill. Master Leader shows that it still running state, but table is showing INVALID and size of table does not grow.

This is running on 2.5.3.1

jaki commented 3 years ago

None of this seems unexpected to me.

See https://docs.yugabyte.com/latest/api/ysql/the-sql-language/statements/ddl_create_index/#troubleshooting:

If online CREATE INDEX fails, it likely failed in the backfill step. In that case, the index exists but is not usable. Drop the index and try again.

See also issue #6218.

Responding to your comments:

Node 1 and 3 are not Master Leader.

And they never were master leader throughout the whole process here?

create index on (on node 1) throws error: Already present (yb/consensus/retryable_requests.cc:245): Duplicate request

Did you run the same exact query? How about running it with different index name?

Checking Master Leader Tasks again, it still shows the Backfill Index Table - kRunning

How long after killing node 3 is this?

Checking Master Leader/Utilities/Server RPCs shows active RPC

How long after killing node 3 is this?

Restart Node 3 - expectation that it will restart backfill.

How long after killing node 3 is this?

I don't think the expectation should be that it will restart backfill. @amitanandaiyer , do you know if tserver would resume a backfill or any operation without instructions from master?

Master Leader shows that it still running state, but table is showing INVALID and size of table does not grow.

What is "it"?

Table being INVALID is expected.

jaki commented 3 years ago

Before creating index, reduce the time of the --backfill_index_client_rpc_timeout_ms=3600000 to 2 mins (120000)

I forgot about this flag. ~I don't think it matters whether it's 60 minutes or 2 minutes since this is just what kicks off the backfill, which should take less than 5 seconds.~

~I thought about the whole thing some more, and I think issue #5326 is more relevant than issue #6218. If you wait out ysql_index_backfill_rpc_timeout_ms (default 30 minutes), master should realize that the RPCs to tserver failed and resend them. I assume you didn't wait out 30 minutes. I'll add that to the list of problems with having a large deadline.~

jaki commented 3 years ago

@mlee-yb, do you get lines like

W0405 17:24:03.961403 23690 async_rpc_tasks.cc:237] Backfilling index_ids { iii } for tablet 022fd3b971a04f37bbbe25d541711e50 from key '' (task=0x1f798d0, state=kRunning): TS 5aaec27c128a46948fa5be135b23b6c8: Backfill Index Table RPC failed for tablet 022fd3b971a04f37bbbe25d541711e50: Network error (yb/util/net/socket.cc:535): recvmsg got EOF from remote (system error 108)

in master leader logs? My testing shows that killing a node would cause backfill RPC connections to close soon after with the above message. Then, master should retry the RPC. For the first few retries, it will fail since the node is down.

If the node is not brought back up soon, the tablet leader should change so that the retry goes to a different tserver
If the node is brought back up soon, the retry should eventually succeed
If it did too many retries, the whole thing should fail

I assume you encountered the last case.

Checking Master Leader/Utilities/Server RPCs shows active RPC

Shows active RPC to where? The node that's killed?

mlee-yb commented 3 years ago

None of this seems unexpected to me.

See https://docs.yugabyte.com/latest/api/ysql/the-sql-language/statements/ddl_create_index/#troubleshooting:

If online CREATE INDEX fails, it likely failed in the backfill step. In that case, the index exists but is not usable. Drop the index and try again.

See also issue #6218.

Responding to your comments:

Node 1 and 3 are not Master Leader.

And they never were master leader throughout the whole process here? mlee>> Node 2 was the Master Leader and remained the Master Leader after I killed node 3.

create index on (on node 1) throws error: Already present (yb/consensus/retryable_requests.cc:245): Duplicate request

Did you run the same exact query? How about running it with different index name? mlee>> No, this error occurs right after I kill node 3. I was told that this error message is the result of "leader flop" which I was told when the pg backend sends a request and doesn't get a response. You kill the leader and that leader had already received that response. You then send the response to the new leader and the leader says I already have that request and says it's a duplicate request (instead of saying I've already done it).

Checking Master Leader Tasks again, it still shows the Backfill Index Table - kRunning

We did try to drop the index and found it cleaned itself up.

How long after killing node 3 is this? mlee>> we

Checking Master Leader/Utilities/Server RPCs shows active RPC

How long after killing node 3 is this?

Restart Node 3 - expectation that it will restart backfill.

How long after killing node 3 is this? mlee>> we waited for more than 2 minutes to exceed the backfill_index_client_rpc_timeout_ms threshold.

I don't think the expectation should be that it will restart backfill. @amitanandaiyer , do you know if tserver would resume a backfill or any operation without instructions from master?

Master Leader shows that it still running state, but table is showing INVALID and size of table does not grow.

What is "it"? mlee>> it =

Table being INVALID is expected.

jaki commented 3 years ago

I was told that this error message is the result of "leader flop" which I was told when the pg backend sends a request and doesn't get a response.

Which leader? Master leader? Tablet leader?

create index on (on node 1) throws error: Already present (yb/consensus/retryable_requests.cc:245): Duplicate request

The original create index stops with ERROR: Already present (yb/consensus/retryable_requests.cc:245): Duplicate request?

Master Leader shows that it still running state

Give it at most 10 minutes (index_backfill_rpc_max_delay_ms) for master to send another RPC to tserver, this time failing for good since the index tablets are deleted.

Without killing node 3, how long does backfill take?

yugabyte / yugabyte-db

Node goes down while CREATE INDEX is running, Index does not continue running after node comes back up #7923