yugabyte / tpcc

Repo to run TPCC benchmarks against YugabyteDB
Other
24 stars 21 forks source link

tpccbenchmark crashes when logging is turned on and a node failure occures #139

Open droberts-yb opened 1 year ago

droberts-yb commented 1 year ago

When running tpccbenchmark with log4j enabled, the process crashes with. FATAL error, when a node failure occurs: When logging is not enabled, tpccbenchmark continues running when a node failure occurs.

The build is the latest main branch from the git repo (as of 6th April 2023).

Error Message

Apr 06, 2023 6:37:03 PM com.yugabyte.Driver getConnectionBalanced
WARNING: got exception FATAL: the database system is shutting down, while connecting to 172.159.41.63
18:37:03,409 (Worker.java:563) DEBUG - Worker<414> Payment/02 Result: SUCCESS
java.lang.RuntimeException: Unexpected fatal, error in 'Worker<116>' when executing 'Payment/02'
    at com.oltpbenchmark.api.Worker.doWork(Worker.java:587)
    at com.oltpbenchmark.api.Worker.run(Worker.java:368)
    at java.lang.Thread.run(Thread.java:750)
Caused by: com.yugabyte.util.PSQLException: FATAL: terminating connection due to administrator command
    at com.yugabyte.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675)
    at com.yugabyte.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365)
    at com.yugabyte.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355)
    at com.yugabyte.jdbc.PgStatement.executeInternal(PgStatement.java:490)
    at com.yugabyte.jdbc.PgStatement.execute(PgStatement.java:408)
    at com.yugabyte.jdbc.PgStatement.executeWithFlags(PgStatement.java:329)
    at com.yugabyte.jdbc.PgStatement.executeCachedSql(PgStatement.java:315)
    at com.yugabyte.jdbc.PgStatement.executeWithFlags(PgStatement.java:291)
    at com.yugabyte.jdbc.PgConnection.execSQLQuery(PgConnection.java:496)
    at com.yugabyte.jdbc.PgConnection.execSQLQuery(PgConnection.java:489)
    at com.yugabyte.jdbc.PgConnection.getTransactionIsolation(PgConnection.java:941)
    at com.zaxxer.hikari.pool.HikariProxyConnection.getTransactionIsolation(HikariProxyConnection.java)
    at com.oltpbenchmark.api.Worker.doWork(Worker.java:490)
    ... 2 more

Log Configuration The only line I changed in the log4j.properties file is this one: log4j.logger.com.oltpbenchmark.api=DEBUG

Environment 3 node Universe with RF3 in AWS. Each node is 8vcpu

Steps To Reproduce

  1. Edit the log4j.properties as above.
  2. Run the standard create, load steps of the tpccbenchmark tool to build a 100 warehouse tpcc test database.
  3. Start the execute phase of the tpccbenchmark tool and check its generating the log. It should include lines like:
    
    17:14:04,832 (Worker.java:563) DEBUG - Worker<975> Delivery/04 Result: SUCCESS
    17:14:04,857 (Worker.java:563) DEBUG - Worker<935> NewOrder/01 Result: SUCCESS
    17:14:04,875 (Worker.java:563) DEBUG - Worker<506> Payment/02 Result: SUCCESS

4. Stop a node via the AWS admin console
The error occurs the instant the node is shutdown.

5. Restart the node that was shutdown and let it rejoin the cluster
6. Restart the execute phase of the tpccbenchmark tool and check its producing the logs as above
7. Stop a node via the AWS admin console
The error occurs the instant the node is shutdown.

**Background**
The log output as per above is required to feed into a utility that visualises what tpcc transaction type each worker is doing is in real-time.
The purpose is to visually show that the database continues running when a node fails and clients stay running.