Open tvesely opened 3 years ago
Seeing this in upgrade tests with LST on too, seems to be a duplicate of this:
2022-06-21 15:33:30,318 MainThread INFO --------------------------------------------------------------------------------
2022-06-21 15:33:30,318 MainThread INFO Running Long System Test 0.1
2022-06-21 15:33:30,318 MainThread INFO --------------------------------------------------------------------------------
2022-06-21 15:33:30,318 MainThread INFO
2022-06-21 15:33:30,354 MainThread INFO Reproduce with: git checkout 9a2992a1 && /var/lib/jenkins/code/yb-long-system-test/nohints/long_system_test.py --seed=0 --threads=4 --runtime=30 --nodes 10.9.133.26:5433,10.9.204.168:5433,10.9.219.0:5433 --upgrade
2022-06-21 15:33:30,642 MainThread INFO Database version: PostgreSQL 11.2-YB-2.12.5.0-b0 on x86_64-pc-linux-gnu, compiled by gcc (Homebrew gcc 5.5.0_4) 5.5.0, 64-bit
2022-06-21 15:33:30,645 MainThread INFO Creating tables for database db_lst_761509
2022-06-21 15:33:51,228 MainThread INFO Starting worker_0: SingleInsertAction, SingleUpdateAction, SingleDeleteAction, BulkInsertAction, BulkUpdateAction, SetConfigAction
2022-06-21 15:33:51,229 MainThread INFO Starting worker_1: RandomSelectAction, SetConfigAction
2022-06-21 15:33:51,229 MainThread INFO Starting worker_2: RandomSelectAction, SetConfigAction
2022-06-21 15:33:51,230 MainThread INFO Starting worker_3: RandomSelectAction, SetConfigAction
2022-06-21 15:34:01,240 MainThread INFO Worker queries/s: [035.5][015.6][017.9][015.7]
2022-06-21 15:34:11,250 MainThread INFO Worker queries/s: [028.1][013.9][014.2][014.1]
2022-06-21 15:34:21,260 MainThread INFO Worker queries/s: [012.9][004.8][005.0][004.3]
2022-06-21 15:34:31,269 MainThread INFO Worker queries/s: [030.6][013.3][015.8][013.5]
2022-06-21 15:34:41,279 MainThread INFO Worker queries/s: [022.7][009.1][010.4][009.4]
2022-06-21 15:34:51,285 MainThread INFO Worker queries/s: [035.8][016.1][018.6][016.5]
2022-06-21 15:35:01,295 MainThread INFO Worker queries/s: [033.3][014.7][016.8][015.9]
2022-06-21 15:35:11,305 MainThread INFO Worker queries/s: [027.9][015.9][014.8][016.0]
2022-06-21 15:35:21,315 MainThread INFO Worker queries/s: [027.5][016.9][014.8][016.8]
2022-06-21 15:35:31,323 MainThread INFO Worker queries/s: [029.5][013.5][013.9][016.4]
2022-06-21 15:35:41,333 MainThread INFO Worker queries/s: [029.8][014.4][014.3][014.7]
2022-06-21 15:35:51,343 MainThread INFO Worker queries/s: [030.7][015.9][014.8][016.3]
2022-06-21 15:36:01,355 MainThread INFO Worker queries/s: [027.0][015.7][013.8][014.3]
2022-06-21 15:36:11,365 MainThread INFO Worker queries/s: [030.6][015.6][013.2][015.5]
2022-06-21 15:36:21,374 MainThread INFO Worker queries/s: [035.4][016.6][016.3][017.7]
2022-06-21 15:36:31,384 MainThread INFO Worker queries/s: [035.9][017.0][019.4][017.6]
2022-06-21 15:36:41,393 MainThread INFO Worker queries/s: [035.9][015.2][016.1][017.5]
2022-06-21 15:36:51,403 MainThread INFO Worker queries/s: [033.8][015.1][017.6][017.3]
2022-06-21 15:37:01,413 MainThread INFO Worker queries/s: [035.6][015.5][018.7][017.2]
2022-06-21 15:37:11,423 MainThread INFO Worker queries/s: [028.1][014.8][012.8][013.7]
2022-06-21 15:37:21,433 MainThread INFO Worker queries/s: [028.5][014.3][013.1][014.7]
2022-06-21 15:37:31,444 MainThread INFO Worker queries/s: [025.6][014.1][013.1][014.6]
2022-06-21 15:37:41,452 MainThread INFO Worker queries/s: [027.7][013.1][011.7][014.0]
2022-06-21 15:37:51,463 MainThread INFO Worker queries/s: [026.2][013.9][012.8][014.3]
2022-06-21 15:38:01,472 MainThread INFO Worker queries/s: [025.9][013.3][011.7][013.2]
2022-06-21 15:38:11,482 MainThread INFO Worker queries/s: [024.9][012.5][012.5][013.3]
2022-06-21 15:38:21,492 MainThread INFO Worker queries/s: [026.8][012.8][011.3][014.0]
2022-06-21 15:38:31,502 MainThread INFO Worker queries/s: [024.6][013.5][011.9][012.5]
2022-06-21 15:38:41,511 MainThread INFO Worker queries/s: [024.5][012.6][011.4][013.0]
2022-06-21 15:38:51,521 MainThread INFO Worker queries/s: [024.7][012.1][011.7][014.8]
2022-06-21 15:39:01,536 MainThread INFO Worker queries/s: [023.8][012.9][011.0][013.5]
2022-06-21 15:39:11,546 MainThread INFO Worker queries/s: [022.1][012.8][010.0][012.4]
2022-06-21 15:39:21,556 MainThread INFO Worker queries/s: [023.7][013.0][012.0][012.4]
2022-06-21 15:39:31,562 MainThread INFO Worker queries/s: [024.8][013.5][011.2][013.4]
2022-06-21 15:39:41,572 MainThread INFO Worker queries/s: [026.3][014.3][012.8][014.1]
2022-06-21 15:39:51,582 MainThread INFO Worker queries/s: [024.7][013.4][011.8][012.8]
2022-06-21 15:40:01,592 MainThread INFO Worker queries/s: [023.5][012.5][011.8][013.3]
2022-06-21 15:40:11,604 MainThread INFO Worker queries/s: [021.2][011.5][009.9][012.4]
2022-06-21 15:40:21,614 MainThread INFO Worker queries/s: [022.6][011.8][010.6][012.5]
2022-06-21 15:40:31,625 MainThread INFO Worker queries/s: [023.4][012.3][011.8][012.3]
2022-06-21 15:40:41,635 MainThread INFO Worker queries/s: [019.3][010.8][010.5][012.3]
2022-06-21 15:40:51,648 MainThread INFO Worker queries/s: [022.9][012.3][010.0][012.2]
2022-06-21 15:41:01,658 MainThread INFO Worker queries/s: [021.1][011.4][011.3][011.9]
2022-06-21 15:41:11,669 MainThread INFO Worker queries/s: [021.9][012.6][011.8][012.6]
2022-06-21 15:41:21,674 MainThread INFO Worker queries/s: [022.5][012.8][011.6][012.3]
2022-06-21 15:41:31,684 MainThread INFO Worker queries/s: [023.9][012.8][012.2][011.8]
2022-06-21 15:41:41,697 MainThread INFO Worker queries/s: [027.4][012.2][013.5][001.6]
2022-06-21 15:41:51,706 MainThread INFO Worker queries/s: [028.8][013.7][013.4][011.8]
2022-06-21 15:42:01,716 MainThread INFO Worker queries/s: [027.3][013.4][014.5][013.0]
2022-06-21 15:42:11,726 MainThread INFO Worker queries/s: [027.8][013.4][014.5][014.6]
2022-06-21 15:42:21,736 MainThread INFO Worker queries/s: [027.7][012.7][013.5][013.2]
2022-06-21 15:42:31,746 MainThread INFO Worker queries/s: [025.0][013.0][013.4][012.0]
2022-06-21 15:42:41,756 MainThread INFO Worker queries/s: [023.7][011.9][012.2][012.0]
2022-06-21 15:42:51,766 MainThread INFO Worker queries/s: [026.6][013.5][014.4][013.7]
2022-06-21 15:43:01,777 MainThread INFO Worker queries/s: [019.5][010.5][009.7][010.2]
2022-06-21 15:43:11,787 MainThread INFO Worker queries/s: [012.9][006.0][008.2][000.5]
2022-06-21 15:43:21,797 MainThread INFO Worker queries/s: [020.0][013.2][013.3][000.3]
2022-06-21 15:43:31,807 MainThread INFO Worker queries/s: [025.4][014.4][013.6][011.6]
2022-06-21 15:43:41,817 MainThread INFO Worker queries/s: [024.0][012.2][012.8][012.3]
2022-06-21 15:43:51,827 MainThread INFO Worker queries/s: [010.7][004.6][006.3][004.9]
2022-06-21 15:44:01,837 MainThread INFO Worker queries/s: [013.6][004.8][007.0][004.9]
2022-06-21 15:44:11,847 MainThread INFO Worker queries/s: [022.9][010.9][012.7][009.6]
2022-06-21 15:44:21,856 MainThread INFO Worker queries/s: [025.1][013.3][013.9][012.6]
2022-06-21 15:44:31,866 MainThread INFO Worker queries/s: [018.9][006.8][007.4][005.6]
2022-06-21 15:44:41,879 MainThread INFO Worker queries/s: [023.2][011.4][012.2][010.3]
2022-06-21 15:44:51,888 MainThread INFO Worker queries/s: [017.2][006.8][007.5][008.1]
2022-06-21 15:45:01,900 MainThread INFO Worker queries/s: [016.2][007.1][009.3][007.3]
2022-06-21 15:45:11,910 MainThread INFO Worker queries/s: [018.9][009.3][010.8][008.0]
2022-06-21 15:45:21,920 MainThread INFO Worker queries/s: [019.8][010.6][010.3][010.7]
2022-06-21 15:45:31,932 MainThread INFO Worker queries/s: [023.3][011.6][012.9][011.4]
2022-06-21 15:45:41,942 MainThread INFO Worker queries/s: [015.8][007.6][008.5][007.7]
2022-06-21 15:45:51,951 MainThread INFO Worker queries/s: [018.0][007.8][009.3][008.1]
2022-06-21 15:46:01,962 MainThread INFO Worker queries/s: [024.5][010.8][012.2][010.9]
2022-06-21 15:46:11,971 MainThread INFO Worker queries/s: [014.9][007.4][007.7][009.3]
2022-06-21 15:46:21,981 MainThread INFO Worker queries/s: [023.6][011.9][011.9][011.0]
2022-06-21 15:46:31,990 MainThread INFO Worker queries/s: [028.2][012.5][013.6][012.3]
2022-06-21 15:46:42,001 MainThread INFO Worker queries/s: [027.4][011.9][012.6][012.8]
2022-06-21 15:46:52,010 MainThread INFO Worker queries/s: [024.3][012.5][011.8][008.5]
2022-06-21 15:47:02,020 MainThread INFO Worker queries/s: [024.2][012.1][010.7][000.0]
2022-06-21 15:47:12,030 MainThread INFO Worker queries/s: [022.5][011.4][009.9][000.0]
2022-06-21 15:47:22,040 MainThread INFO Worker queries/s: [021.7][010.1][009.7][009.4]
2022-06-21 15:47:32,050 MainThread INFO Worker queries/s: [024.3][012.0][011.2][009.3]
2022-06-21 15:47:42,060 MainThread INFO Worker queries/s: [027.4][013.7][012.3][013.1]
2022-06-21 15:47:52,070 MainThread INFO Worker queries/s: [024.8][011.2][012.1][010.9]
2022-06-21 15:48:02,080 MainThread INFO Worker queries/s: [025.4][011.5][011.5][013.3]
2022-06-21 15:48:12,090 MainThread INFO Worker queries/s: [023.2][011.3][012.1][010.6]
2022-06-21 15:48:22,100 MainThread INFO Worker queries/s: [025.2][012.9][012.2][012.3]
2022-06-21 15:48:32,110 MainThread INFO Worker queries/s: [025.3][012.7][012.1][012.1]
2022-06-21 15:48:42,120 MainThread INFO Worker queries/s: [024.4][011.5][010.7][011.1]
2022-06-21 15:48:52,126 MainThread INFO Worker queries/s: [021.0][010.4][009.4][008.0]
2022-06-21 15:49:02,136 MainThread INFO Worker queries/s: [019.9][010.8][011.8][009.8]
2022-06-21 15:49:12,147 MainThread INFO Worker queries/s: [024.3][010.5][011.1][010.8]
2022-06-21 15:49:22,157 MainThread INFO Worker queries/s: [012.8][005.4][007.6][007.3]
2022-06-21 15:49:32,167 MainThread INFO Worker queries/s: [016.3][008.2][009.0][008.3]
2022-06-21 15:49:42,177 MainThread INFO Worker queries/s: [017.3][008.3][009.6][008.6]
2022-06-21 15:49:52,187 MainThread INFO Worker queries/s: [020.0][009.3][010.7][009.7]
2022-06-21 15:50:02,197 MainThread INFO Worker queries/s: [017.1][009.6][010.6][010.4]
2022-06-21 15:50:12,207 MainThread INFO Worker queries/s: [016.4][009.5][009.0][007.8]
2022-06-21 15:50:22,216 MainThread INFO Worker queries/s: [019.3][008.5][009.7][008.5]
2022-06-21 15:50:32,226 MainThread INFO Worker queries/s: [020.9][010.3][010.6][010.2]
2022-06-21 15:50:42,236 MainThread INFO Worker queries/s: [018.4][010.0][009.4][008.5]
2022-06-21 15:50:52,245 MainThread INFO Worker queries/s: [018.2][009.0][008.9][009.2]
2022-06-21 15:51:02,255 MainThread INFO Worker queries/s: [017.5][006.7][008.5][007.4]
2022-06-21 15:51:12,264 MainThread INFO Worker queries/s: [021.1][008.8][010.3][008.1]
2022-06-21 15:51:22,274 MainThread INFO Worker queries/s: [021.7][009.7][010.6][007.5]
2022-06-21 15:51:32,284 MainThread INFO Worker queries/s: [020.9][010.2][010.6][010.2]
2022-06-21 15:51:42,294 MainThread INFO Worker queries/s: [022.7][011.7][010.8][010.5]
2022-06-21 15:51:52,300 MainThread INFO Worker queries/s: [018.9][010.8][010.2][009.7]
2022-06-21 15:52:02,309 MainThread INFO Worker queries/s: [017.5][009.0][009.9][009.0]
2022-06-21 15:52:12,319 MainThread INFO Worker queries/s: [020.0][011.2][012.1][009.4]
2022-06-21 15:52:22,329 MainThread INFO Worker queries/s: [017.8][010.6][010.9][008.6]
2022-06-21 15:52:32,338 MainThread INFO Worker queries/s: [020.9][009.5][009.3][008.7]
2022-06-21 15:52:42,348 MainThread INFO Worker queries/s: [017.3][006.7][009.0][008.0]
2022-06-21 15:52:52,358 MainThread INFO Worker queries/s: [021.9][009.7][009.9][010.4]
2022-06-21 15:53:02,369 MainThread INFO Worker queries/s: [019.5][006.1][007.8][007.8]
2022-06-21 15:53:12,379 MainThread INFO Worker queries/s: [017.9][008.1][008.4][009.0]
2022-06-21 15:53:22,389 MainThread INFO Worker queries/s: [019.1][009.2][008.4][009.7]
2022-06-21 15:53:32,398 MainThread INFO Worker queries/s: [013.4][006.1][007.6][008.2]
2022-06-21 15:53:42,406 MainThread INFO Worker queries/s: [018.7][008.9][009.1][010.0]
2022-06-21 15:53:52,416 MainThread INFO Worker queries/s: [018.3][008.8][008.4][007.9]
2022-06-21 15:54:02,426 MainThread INFO Worker queries/s: [012.6][010.1][009.4][007.5]
2022-06-21 15:54:12,435 MainThread INFO Worker queries/s: [018.4][009.8][009.3][009.2]
2022-06-21 15:54:22,445 MainThread INFO Worker queries/s: [017.3][009.7][009.8][008.7]
2022-06-21 15:54:32,455 MainThread INFO Worker queries/s: [020.9][009.9][009.4][008.5]
2022-06-21 15:54:42,465 MainThread INFO Worker queries/s: [017.1][009.2][007.8][008.7]
2022-06-21 15:54:52,475 MainThread INFO Worker queries/s: [021.1][011.0][009.8][011.2]
2022-06-21 15:55:02,482 MainThread INFO Worker queries/s: [015.2][006.5][007.2][007.6]
2022-06-21 15:55:12,491 MainThread INFO Worker queries/s: [010.9][007.7][000.0][009.7]
2022-06-21 15:55:22,501 MainThread INFO Worker queries/s: [021.6][010.0][000.0][010.2]
2022-06-21 15:55:30,478 worker_2 ERROR Exception while reconnecting to YugabyteDB
2022-06-21 15:55:30,478 worker_2 ERROR could not connect to server: Connection refused
Is the server running on host "10.9.219.0" and accepting
TCP/IP connections on port 5433?
2022-06-21 15:55:30,495 worker_2 ERROR Exception while reconnecting to YugabyteDB
2022-06-21 15:55:30,495 worker_2 ERROR could not connect to server: Connection refused
Is the server running on host "10.9.219.0" and accepting
TCP/IP connections on port 5433?
2022-06-21 15:55:30,512 worker_2 ERROR Exception while reconnecting to YugabyteDB
2022-06-21 15:55:30,512 worker_2 ERROR could not connect to server: Connection refused
Is the server running on host "10.9.219.0" and accepting
TCP/IP connections on port 5433?
2022-06-21 15:55:30,531 worker_2 ERROR Exception while reconnecting to YugabyteDB
2022-06-21 15:55:30,531 worker_2 ERROR could not connect to server: Connection refused
Is the server running on host "10.9.219.0" and accepting
TCP/IP connections on port 5433?
2022-06-21 15:55:30,548 worker_2 ERROR Exception while reconnecting to YugabyteDB
2022-06-21 15:55:30,549 worker_2 ERROR could not connect to server: Connection refused
Is the server running on host "10.9.219.0" and accepting
TCP/IP connections on port 5433?
2022-06-21 15:55:30,566 worker_2 ERROR Exception while reconnecting to YugabyteDB
2022-06-21 15:55:30,566 worker_2 ERROR could not connect to server: Connection refused
Is the server running on host "10.9.219.0" and accepting
TCP/IP connections on port 5433?
2022-06-21 15:55:30,585 worker_2 ERROR Exception while reconnecting to YugabyteDB
2022-06-21 15:55:30,585 worker_2 ERROR could not connect to server: Connection refused
Is the server running on host "10.9.219.0" and accepting
TCP/IP connections on port 5433?
2022-06-21 15:55:30,602 worker_2 ERROR Exception while reconnecting to YugabyteDB
2022-06-21 15:55:30,602 worker_2 ERROR could not connect to server: Connection refused
Is the server running on host "10.9.219.0" and accepting
TCP/IP connections on port 5433?
2022-06-21 15:55:30,619 worker_2 ERROR Exception while reconnecting to YugabyteDB
2022-06-21 15:55:30,619 worker_2 ERROR could not connect to server: Connection refused
Is the server running on host "10.9.219.0" and accepting
TCP/IP connections on port 5433?
2022-06-21 15:55:30,636 worker_2 ERROR Exception while reconnecting to YugabyteDB
2022-06-21 15:55:30,636 worker_2 ERROR could not connect to server: Connection refused
Is the server running on host "10.9.219.0" and accepting
TCP/IP connections on port 5433?
2022-06-21 15:55:30,636 worker_2 ERROR Giving up
Exception in thread worker_2:
Traceback (most recent call last):
File "/usr/share/python-3.7.7/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/usr/share/python-3.7.7/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/var/lib/jenkins/code/yb-long-system-test/nohints/lst/worker.py", line 49, in run
with self.conn.cursor() as cur:
psycopg2.InterfaceError: connection already closed
2022-06-21 15:55:32,511 MainThread INFO Worker queries/s: [022.1][009.9][000.0][010.4]
2022-06-21 15:55:42,521 MainThread INFO Worker queries/s: [021.9][009.6][000.0][010.8]
2022-06-21 15:55:52,531 MainThread INFO Worker queries/s: [020.4][010.5][000.0][010.6]
2022-06-21 15:56:02,541 MainThread INFO Worker queries/s: [014.7][007.8][000.0][007.4]
2022-06-21 15:56:09,638 worker_0 ERROR Unexpected query failure: InternalError_
Query: UPDATE t0 SET c0_int4range = int4range((octet_length('66')), NULL) WHERE (64.63889331793206) != (-16.267694121145368);
values: None
runtime: 2022-06-21 15:56:03.092 - 2022-06-21 15:56:09.637
supports explain: True
supports rollback: True
affected rows: None
Action: SingleUpdateAction
Error class: InternalError_
Error code: XX000
Error message: ERROR: Already present: [Already present (yb/consensus/retryable_requests.cc:244): Duplicate request 341 from client 95e6fe92-ce24-4b86-9f23-44da1c3c1e42 (min running 341)]
Transaction isolation level: default
DB Node: host: 10.9.133.26, port: 5433
DB Backend PID: 17655
Duplicate retryable requests were resolved by fix for #15006
Reopen issue since it's happening on some overloaded clusters.
Jira Link: DB-1635 I've seen this error several times now when loading data:
According to my analysis, I believe that COPY is needlessly failing. This is the sequence of events that leads to the error:
COPY
command is started by the client.yb.tserver.TabletServerService/Write
RPC request to a tablet for a batch of tuplesyb.tserver.TabletServerService/Write
operation is flushed to the tablet followers in ayb.consensus.ConsensusService/UpdateConsensus
requestyb.tserver.TabletServerService/Write
operation, the tablet switches leadersyb.tserver.TabletServerService/Write
had succeeded, the client attempted the write again, and the tserver responded withERROR: Already present: [Already present (yb/consensus/retryable_requests.cc:241): Duplicate request]
The query fails
I managed to reproduce the issue on a test cluster, and have pieced together the events from a wire capture:
I killed the tserver on
yb-demo-tvesely-copy-566-n3
(10.142.15.196
) and the errorERROR: Already present: [Already present (yb/consensus/retryable_requests.cc:241): Duplicate request]
happened immediately. From the Postgres logs:This appears to be the original write request that went to the tablet on
a09121672a2d4f1ca3ab05e07481faf3
on node10.142.15.196
before the tserver on that node went down:The
yb.tserver.TabletServerService/Write
RPC call never recieved a response before the node went down, however it WAS replicated over to its RAFT peers:And the response:
This is the
Duplicate request
message, which caused theCOPY
to terminate. It was sent to the new leader for tableta09121672a2d4f1ca3ab05e07481faf3
, and because the original tablet LEADER had already replicated these tuples, this tablet already had a record of it:The tablet responds with
ALREADY_PRESENT
, because this tuple had already been inserted into this tablet by the original leader:Based on the above evidence, it's my conclusion that the error above need not cancel the running transaction. Is there any reason that the
COPY
should not be allowed to continue?