timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
17.93k stars 882 forks source link

Add syscache tuple lock for pg17.1 and backports #7444

Closed erimatnor closed 4 days ago

erimatnor commented 1 week ago

PostgreSQL 17.1 introduced a fix for potential lost data when doing updates to pg_class and pg_database catalog tables. The reason the catalog update can be lost is because PG in some places performs non-MVCC compliant inplace updates of tuples in those catalogs, which requires extra locking to prevent overwrites. See the following commit for more information:

https://github.com/postgres/postgres/commit/3b7a689e1a805c4dac2f35ff14fd5c9fdbddf150

This fix was also backported to older PG versions 14.14, 15.9, 16.5, which TimescaleDB supports.

To be safe against inplace updates, the relation that is updated in pg_class via heap_update() needs to be locked with ShareUpdateExclusiveLock on the relation, or a ShareRowExclusiveLock or stricter on the relation. Otherwise, the update code needs to take a tuple-level lock.

TimescaleDB is affected by this change in the function that performs "quick migration" from a compressed chunk to a Hypercore TAM chunk. In that case, the regular ALTER TABLE ... SET ACCESS METHOD handling is bypassed and the pg_class catalog table is updated directly to avoid having to rewrite the compressed data. Since this migration doesn't take a heavy lock on the chunk, it needs the tuple-level lock in the update of pg_class.

Other places in our code that update pg_class tuples may be protected by locks on the relation and requires no extra tuple-level lock. A macro is added to check that appropriate level locks are held on the relation prior to the catalog changes.

Disable-check: force-changelog-file

codecov[bot] commented 1 week ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 82.13%. Comparing base (59f50f2) to head (6c13124). Report is 599 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #7444 +/- ## ========================================== + Coverage 80.06% 82.13% +2.07% ========================================== Files 190 229 +39 Lines 37181 42972 +5791 Branches 9450 10793 +1343 ========================================== + Hits 29770 35297 +5527 - Misses 2997 3385 +388 + Partials 4414 4290 -124 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.


🚨 Try these New Features:

fabriziomello commented 4 days ago

Looks like there are other places we should do the same fix:

erimatnor commented 4 days ago

Looks like there are other places we should do the same fix:

* https://github.com/timescale/timescaledb/blob/main/src/indexing.c#L456

* https://github.com/timescale/timescaledb/blob/main/src/utils.c#L1508

* https://github.com/timescale/timescaledb/blob/main/tsl/src/hypercore/hypercore_handler.c#L1882

The fix is only needed for updates to pg_class or pg_database. The scans in indexing.c I think touches pg_index so it should be ok. For the other places, we already hold sufficient locks on the table level. I added an assertion to check this based on similar assertion in PG. Note that PG17.1 also complains with a message if insufficient locks are held as long as it is compiled with assertions enabled.