timescale / timescaledb-backup

Other
33 stars 9 forks source link

Dumps periodically fail when there are a lot of or high-frequency continuous aggregate refreshes #38

Closed tylerfontaine closed 3 years ago

tylerfontaine commented 3 years ago

TSDB version: 2.x

Reproduction:

create table test (time timestamp, id int, val int);

select create_hypertable('test', 'time');

CREATE MATERIALIZED VIEW cagg1
WITH (timescaledb.continuous) AS
SELECT time_bucket(interval '1 hour', time) as bucket,
avg(val)
FROM test
group by 1
;

CREATE MATERIALIZED VIEW cagg2
WITH (timescaledb.continuous) AS
SELECT time_bucket(interval '1 hour', time) as bucket,
max(val)
FROM test
group by 1
;

CREATE MATERIALIZED VIEW cagg3
WITH (timescaledb.continuous) AS
SELECT time_bucket(interval '1 hour', time) as bucket,
min(val)
FROM test
group by 1
;

SELECT add_continuous_aggregate_policy('cagg1',
    start_offset => INTERVAL '2 years',
    end_offset => INTERVAL '1 minute',
    schedule_interval => INTERVAL '5 seconds');

SELECT add_continuous_aggregate_policy('cagg3',
    start_offset => INTERVAL '2 years',
    end_offset => INTERVAL '1 minute',
    schedule_interval => INTERVAL '5 seconds');

SELECT add_continuous_aggregate_policy('cagg3',
    start_offset => INTERVAL '2 years',
    end_offset => INTERVAL '1 minute',
    schedule_interval => INTERVAL '5 seconds');

Sample dataset: data.txt

It results in the following:

2021/03/10 16:36:11 Jobs:  have stopped, continuing
 pg_dump version: pg_dump (PostgreSQL) 12.4

2021/03/10 16:37:18 pg_dump: warning: there are circular foreign-key constraints on this table:
2021/03/10 16:37:18 pg_dump:   hypertable
2021/03/10 16:37:18 pg_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
2021/03/10 16:37:18 pg_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
2021/03/10 16:37:18 pg_dump: warning: there are circular foreign-key constraints on this table:
2021/03/10 16:37:18 pg_dump:   chunk
2021/03/10 16:37:18 pg_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
2021/03/10 16:37:18 pg_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
2021/03/10 16:38:00 pg_dump: error: could not obtain lock on relation "_timescaledb_catalog.continuous_aggs_invalidation_threshold"
2021/03/10 16:38:00 This usually means that someone requested an ACCESS EXCLUSIVE lock on the table after the pg_dump parent process had gotten the initial ACCESS SHARE lock on the table.
2021/03/10 16:38:00 pg_dump: error: a worker process died unexpectedly
2021/03/10 16:38:00 pg_dump run failed with: cmd.Run() failed with 'exit status 1'

So it seems like the jobmover should also stop cagg jobs, so we're able to dump this table. Or, if we don't actually need to dump this table, maybe we can skip it?

mccarthysean commented 3 years ago

I'm still having this issue with TimescaleDB version 1.6.0 with PostgreSQL version 11.6 on x86_64-pc-linux-musl, compiled by gcc (Alpine 8.3.0) 8.3.0, 64-bit.

$ ts-dump --db-URI postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/$POSTGRES_DATABASE --dump-dir /ts_dump

2021/06/15 20:21:48 pg_dump: NOTICE: there are circular foreign-key constraints on this table:
2021/06/15 20:21:48 pg_dump:   hypertable
2021/06/15 20:21:48 pg_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
2021/06/15 20:21:48 pg_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
2021/06/15 20:21:48 pg_dump: NOTICE: there are circular foreign-key constraints on this table:
2021/06/15 20:21:48 pg_dump:   chunk
2021/06/15 20:21:48 pg_dump: You might not be able to restore the dump without using --disable-triggers or temporarily dropping the constraints.
2021/06/15 20:21:48 pg_dump: Consider using a full dump instead of a --data-only dump to avoid this problem.
2021/06/15 20:39:38 pg_dump: [parallel archiver] could not obtain lock on relation "_timescaledb_catalog.continuous_aggs_invalidation_threshold"
2021/06/15 20:39:38 This usually means that someone requested an ACCESS EXCLUSIVE lock on the table after the pg_dump parent process had gotten the initial ACCESS SHARE lock on the table.
2021/06/15 20:39:38 pg_dump: [parallel archiver] a worker process died unexpectedly
2021/06/15 20:39:38 pg_dump run failed with: cmd.Run() failed with 'exit status 1'

Here's my Dockerfile showing how I install ts-dump. Is there something wrong with my installation? There's no way to see if I've got the latest ts-dump version... I'm not familiar with Go at all, but the following installs it fine, I think...

ARG VERSION

# pg_dump and pg_restore come from this base image,
# based on the TimescaleDB version one wants
FROM postgres:${VERSION}-alpine

# Copy the Golang binaries from this official image,
# rather than installing manually
COPY --from=golang:rc-alpine /usr/local/go/ /usr/local/go/

# Configure Go
ENV GOROOT /usr/local/go
ENV GOPATH /go
ENV PATH /usr/local/go/bin:$PATH

RUN mkdir -p ${GOPATH}/src ${GOPATH}/bin

# Download and build the ts-dump and ts-restore Golang packages
# RUN go get -u github.com/timescale/timescaledb-backup/
RUN go get -u github.com/timescale/timescaledb-backup/ || true && \
    # Build ts-dump first
    cd /go/pkg/mod/github.com/timescale/timescaledb-backup@v0.0.0-20210311165201-c4343c888b98/cmd/ts-dump && \
    go mod tidy && \
    go build -o /usr/local/go/bin/ts-dump && \
    # Build ts-restore second
    cd ../ts-restore && \
    go mod tidy && \
    go build -o /usr/local/go/bin/ts-restore
mccarthysean commented 3 years ago

FYI, I've updated my Dockerfile so that instead of trying to go get... then go build... ts-dump, I'm now downloading the Linux x86-64 binaries and renaming them, and this seems to have updated the ts-dump version (although I have no way of verifying that). I've since tried to create backups (including my TimescaleDB materialized views and continuous aggregates) three times, and it's worked all three times! So this is the installation routine I'll run now.

ARG VERSION
# pg_dump and pg_restore come from this base image,
# based on the TimescaleDB version one wants
FROM postgres:${VERSION}-alpine

# Copy the Golang binaries from this official image,
# rather than installing manually
COPY --from=golang:rc-alpine /usr/local/go/ /usr/local/go/

# Configure Go
ENV GOROOT /usr/local/go
ENV GOPATH /go
ENV PATH /usr/local/go/bin:$PATH

RUN mkdir -p ${GOPATH}/src ${GOPATH}/bin

# Download the Linux binaries manually
RUN cd /usr/local/go/bin && \
    wget https://github.com/timescale/timescaledb-backup/releases/download/0.1.1/ts-dump_0.1.1_Linux_x86_64 && \
    wget https://github.com/timescale/timescaledb-backup/releases/download/0.1.1/ts-restore_0.1.1_Linux_x86_64 && \
    # Check the checksums for the downloaded binaries
    wget https://github.com/timescale/timescaledb-backup/releases/download/0.1.1/checksums.txt && \
    cat checksums.txt && \
    sha256sum ts-dump_0.1.1_Linux_x86_64 && \
    sha256sum ts-restore_0.1.1_Linux_x86_64 && \
    # Rename the downloaded binaries to be the default binaries with generic names
    mv ts-dump_0.1.1_Linux_x86_64 ts-dump && \
    mv ts-restore_0.1.1_Linux_x86_64 ts-restore && \
    # Make the downloaded binaries executable
    chmod +x ts-dump ts-restore