[YSQL] Avoid read restart errors in long-running, non transactional operation such as ANALYZE

pao214 commented 6 months ago

Description

Decided to

Avoid read restarts by picking a time on pg layer without any uncertainty interval.
Use a fresh read time for each table in a multi-table analyze. This does not prevent snapshot too old errors but at least helps use cases where there are too many tables and the user is running ANALYZE at once.

Context

ANALYZE collects statistics from every table. Source: https://www.postgresql.org/docs/current/sql-analyze.html.

Computing stats from every table takes a long time. In the mean time, background processes can remove older snapshots in YB, see timestamp_history_retention_interval_sec.

On the other hand, ANALYZE does not strictly require a consistent snapshot of the database since it is collecting query planner stats.

Develop a mechanism to avoid too snapshot old errors with the tradeoff of a non transactional read. Caveat: Do updates to pg_stat_statements and reltuples need to be transactional?

Moreover, ANALYZE does not require a recent snapshot. Therefore, raising a read restart error in the middle of a long running operation such as analyze will lose valuable progress.

Proposal

The query layer reads from the storage layer in the granularity of pages (usually 1024 rows at a time). We plan to advance the read point for each such fetch request.

Optional Optimization

Use the same read point as long as we possibly can, i.e. retry the read RPC with the latest read point only after receiving a snapshot too old error.

Alternate Proposals

Use one read point for each table Counterpoint: ANALYZE on large tables can run for a long time.
Retry snapshot too old error internally. Counterpoint: We are likely to hit this error again after the retry.

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

[X] I confirm this issue does not contain any sensitive information.

archit-rastogi commented 2 months ago

Data points: http://stress.dev.yugabyte.com/stress_test/cc321053-dee5-4fa6-9b0a-d2fc22599dba

09:17:27.914 [Thread-5] INFO  sql: <conn:new> execute EXPLAIN ANALYZE select
    supp_nation,
    cust_nation,
    l_year,
    sum(volume) as revenue
from
    (
        select
            n1.n_name as supp_nation,
            n2.n_name as cust_nation,
            extract(year from l_shipdate) as l_year,
            l_extendedprice * (1 - l_discount) as volume
        from
            supplier,
            lineitem,
            orders,
            customer,
            nation n1,
            nation n2
        where
            s_suppkey = l_suppkey
            and o_orderkey = l_orderkey
            and c_custkey = o_custkey
            and s_nationkey = n1.n_nationkey
            and c_nationkey = n2.n_nationkey
            and (
                (n1.n_name = 'UNITED STATES' and n2.n_name = 'IRAQ')
                or (n1.n_name = 'IRAQ' and n2.n_name = 'UNITED STATES')
            )
            and l_shipdate between date '1995-01-01' and date '1996-12-31'
    ) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year;
09:35:13.360 [Thread-5] INFO  Query took 1065 seconds
09:35:13.361 [Thread-5] DEBUG sql: close connection 172.151.17.250
java.lang.RuntimeException: com.yugabyte.util.PSQLException: ERROR: Snapshot too old. Read point: { physical: 1723540648019676 }, earliest read time allowed: { physical: 1723540813015946 }, delta (usec): 164996270: kSnapshotTooOld
  Where: parallel worker
    at com.yugabyte.clientserver.YugabyteDbProxy.executeYSql(YugabyteDbProxy.java:284)
    at com.yugabyte.clientserver.YugabyteDbProxy.executeYSql(YugabyteDbProxy.java:259)
    at com.yugabyte.clientserver.YugabyteDbProxy.execute(YugabyteDbProxy.java:318)
    at com.yugabyte.clientserver.QueryHandler.runQuery(QueryHandler.java:167)
    at com.yugabyte.clientserver.QueryHandler.lambda$runQueryHandler$2(QueryHandler.java:196)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: com.yugabyte.util.PSQLException: ERROR: Snapshot too old. Read point: { physical: 1723540648019676 }, earliest read time allowed: { physical: 1723540813015946 }, delta (usec): 164996270: kSnapshotTooOld
  Where: parallel worker
    at com.yugabyte.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675)
    at com.yugabyte.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365)
    at com.yugabyte.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:355)
    at com.yugabyte.jdbc.PgStatement.executeInternal(PgStatement.java:490)
    at com.yugabyte.jdbc.PgStatement.execute(PgStatement.java:408)
    at com.yugabyte.jdbc.PgStatement.executeWithFlags(PgStatement.java:329)
    at com.yugabyte.jdbc.PgStatement.executeCachedSql(PgStatement.java:315)
    at com.yugabyte.jdbc.PgStatement.executeWithFlags(PgStatement.java:291)
    at com.yugabyte.jdbc.PgStatement.execute(PgStatement.java:286)
    at com.yugabyte.clientserver.YugabyteDbProxy.executeYSql(YugabyteDbProxy.java:279)
    ... 5 more

pao214 commented 1 month ago

Tackling snapshot too old error is challenging in the current design. That said, such errors should go away after GH issue #22200 lands.

pao214 commented 1 month ago

Landed on master, 2024.1

yugabyte / yugabyte-db