yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[DocDB] External mini cluster doesn't set RF #13922

Open jasonyb opened 2 years ago

jasonyb commented 2 years ago

Jira Link: DB-3418

Description

I found that the external mini cluster does not properly set replication_factor. There is this old code:

FLAGS_replication_factor = narrow_cast<int>(opts_.num_masters);

but that gflag only affects the test process, not the daemon processes. The daemons get the default RF of 3 even when there is only 1 master. As a result, any newly written test that uses one master and one tserver can fail with

ERROR:  Not found: Table system.transactions not found: OBJECT_NOT_FOUND

when trying to run a postgres query (and possibly any transactional query). This is because the catalog manager relies on RF to determine how many tservers to wait for before creating the transaction status tablet:

YSQL is enabled, will create the transaction status table when 3 tablet servers are online
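
To illustrate why a single-tserver cluster never gets past this point, here is a simplified model of the wait condition. This is not the actual catalog manager code: the function name and signature are hypothetical, and the only assumption taken from the log line above is that creation waits until the number of online tservers reaches RF.

```cpp
#include <cstddef>

// Hypothetical simplification of the check described above: the transaction
// status table is only created once at least `replication_factor` tablet
// servers are online. The real catalog manager logic may differ.
bool ReadyToCreateTransactionStatusTable(std::size_t live_tservers,
                                         std::size_t replication_factor) {
  return live_tservers >= replication_factor;
}
```

With 1 tserver and the unintended default RF of 3, this condition stays false, which matches the OBJECT_NOT_FOUND error above. With RF 1 it would be satisfied as soon as the single tserver comes up.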

There are some existing tests that have 1 master but have been running with the unintended RF of 3 the whole time. For example, PgWrapperTestBase::GetNumMasters returns 1, and this is the base test for most PG C++ tests. Fixing the code to set the replication_factor flag on the master based on num_masters could change the replication factor for a lot of tests. If the change turns out to have a bad side effect on a particular test, that test can set a replication factor explicitly.
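
A rough sketch of what the fix could look like: pass the flag on each master's command line instead of setting it in the test process. This is standalone illustration code, not the real ExternalMiniCluster implementation. The struct MiniClusterOptionsSketch and both helper functions are hypothetical stand-ins, and it assumes the daemons accept the flag in the usual gflags form --replication_factor=N.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the cluster options; only the fields needed here.
struct MiniClusterOptionsSketch {
  std::size_t num_masters = 1;
  std::vector<std::string> extra_master_flags;
};

// Returns true if the test already chose a replication factor explicitly.
bool HasReplicationFactorFlag(const std::vector<std::string>& flags) {
  const std::string prefix = "--replication_factor=";
  for (const auto& flag : flags) {
    if (flag.compare(0, prefix.size(), prefix) == 0) {
      return true;
    }
  }
  return false;
}

// Builds the flag list for each master daemon's command line. Unlike
// assigning FLAGS_replication_factor in the test process, these flags
// would actually reach the daemon processes.
std::vector<std::string> BuildMasterFlags(const MiniClusterOptionsSketch& opts) {
  assert(opts.num_masters > 0);
  std::vector<std::string> flags = opts.extra_master_flags;
  if (!HasReplicationFactorFlag(flags)) {
    // Default the RF to the number of masters, as the old code intended.
    flags.push_back("--replication_factor=" + std::to_string(opts.num_masters));
  }
  return flags;
}
```

Leaving an explicitly passed flag untouched is what would let a test that depends on the current RF 3 behavior keep it by setting the flag itself.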

xiaoliwe commented 2 months ago

Has this problem been resolved?

jasonyb commented 2 months ago

Has this problem been resolved?

I see the code has been touched since then by cb74430b79d9493fc3d23b1496cd4d7f0c4c41f1 but the issue remains. In case it wasn't clear, this is a test-only issue.