Open m-iancu opened 2 years ago
Although yb_pg_stat is still flaky, as the issue originally reports, yb_pg_stat_backend is even more flaky, and it appears to have been that way ever since it was introduced by commit e459e7e7217e03420ffa0976ad0b903e622b977d. The problem is the ordering of pg_stat_activity output between the Java test framework's default connection (user yugabyte_test) and the pg_regress connection (user yugabyte). Backend id determines the ordering, and yugabyte_test can somehow end up with a later backend id than yugabyte. Is the default Java connection not created strictly before pg_regress runs, or is the Java connection being reset at some point? How this happens should be investigated.
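To make the ordering problem concrete, here is a minimal Python sketch. It is not code from the test framework: the Backend tuple, the two sample runs, and both helper functions are hypothetical and only model the two client backends from the diff. It shows that output ordered by backend id changes when the two connections swap ids, while output ordered by a key that does not depend on backend id (here usename, then state) stays the same.

```python
from typing import List, NamedTuple


class Backend(NamedTuple):
    """Hypothetical stand-in for one client-backend row of pg_stat_activity."""
    beid: int
    usename: str
    state: str


def by_backend_id(rows: List[Backend]) -> List[str]:
    # Models the ordering described in the issue: rows come out in
    # backend id order.
    return [r.usename for r in sorted(rows, key=lambda r: r.beid)]


def by_stable_key(rows: List[Backend]) -> List[str]:
    # Assumed alternative: order by columns that do not depend on
    # which connection got which backend id.
    return [r.usename for r in sorted(rows, key=lambda r: (r.usename, r.state))]


# Run A: the Java default connection (yugabyte_test) has the lower backend id.
run_a = [Backend(1, "yugabyte_test", "idle"), Backend(2, "yugabyte", "active")]
# Run B: the ids are swapped, which is the flaky case in this issue.
run_b = [Backend(1, "yugabyte", "active"), Backend(2, "yugabyte_test", "idle")]

print(by_backend_id(run_a), by_backend_id(run_b))  # the two orders differ
print(by_stable_key(run_a), by_stable_key(run_b))  # the two orders match
```

In the regress test itself, the analogous change would be an explicit ORDER BY on the pg_stat_activity query. That would not help the second query, which prints beid directly, so it is only a partial workaround and does not explain why the backend ids swap.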
It fails 2/27 times on alma8 with ./yb_build.sh fastdebug --gcc11 --java-test TestPgRegressPgStat -n 1000 at recent master commit 6784abc5f274a713c7aa3f3162c426e7d7c4f306. I show part of the side-by-side diff for convenience:
9 SELECT datname, usename, state, query, backend_type, 9 SELECT datname, usename, state, query, backend_type,
10 catalog_version IS NOT null AS has_catalog_snapshot 10 catalog_version IS NOT null AS has_catalog_snapshot
11 FROM pg_stat_activity; 11 FROM pg_stat_activity;
12 datname | usename | state | query | backend_type | has_catalog_snapshot 12 datname | usename | state | query | backend_type | has_catalog_snapshot
13 ----------+---------------+--------+----------------------------------------------------------------------------+----------------+---------------------- 13 ----------+---------------+--------+----------------------------------------------------------------------------+----------------+----------------------
14 yugabyte | yugabyte_test | idle | SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ | client backend | f <
15 yugabyte | yugabyte | active | SELECT datname, usename, state, query, backend_type, +| client backend | t 14 yugabyte | yugabyte | active | SELECT datname, usename, state, query, backend_type, +| client backend | t
16 | | | catalog_version IS NOT null AS has_catalog_snapshot +| | 15 | | | catalog_version IS NOT null AS has_catalog_snapshot +| |
17 | | | FROM pg_stat_activity; | | 16 | | | FROM pg_stat_activity; | |
> 17 yugabyte | yugabyte_test | idle | SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ | client backend | f
18 | | | | checkpointer | f 18 | | | | checkpointer | f
19 (3 rows) 19 (3 rows)
20 20
21 -- Test yb_pg_stat_get_backend_catalog_version. 21 -- Test yb_pg_stat_get_backend_catalog_version.
22 SELECT beid, yb_pg_stat_get_backend_catalog_version(beid) IS NOT null 22 SELECT beid, yb_pg_stat_get_backend_catalog_version(beid) IS NOT null
23 AS has_catalog_snapshot 23 AS has_catalog_snapshot
24 FROM pg_stat_get_backend_idset() beid; 24 FROM pg_stat_get_backend_idset() beid;
25 beid | has_catalog_snapshot 25 beid | has_catalog_snapshot
26 ------+---------------------- 26 ------+----------------------
27 1 | f | 27 1 | t
28 2 | t | 28 2 | f
29 3 | f 29 3 | f
30 (3 rows) 30 (3 rows)
Jira Link: DB-3941
Description
It looks like the failure rate increased recently: https://detective-gcp.dev.yugabyte.com/stability/test?branch=master&build_type=all&class=org.yb.pgsql.TestPgRegressPgStat&fail_tag=all&name=testPgStat&platform=all
Judging from the trends, this commit, which modifies the test, could be the reason: https://github.com/yugabyte/yugabyte-db/commit/f02b814589e4dff7754aeddfe6df895ff32806e8