influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0
28.62k stars 3.54k forks

Backup error - unexpected EOF #23295

Open speedwheel opened 2 years ago

speedwheel commented 2 years ago

Steps to reproduce: List the minimal actions needed to reproduce the behavior.

1. $ influx backup /backup/influxdb/backup_$(date '+%Y-%m-%d_%H-%M') -t {token}
2. Receive an error after a while (Error: failed to backup bucket data: failed to download snapshot of shard 103: http: unexpected EOF reading trailer)

Expected behavior:

finish backup without errors

Actual behavior:

receiving an error before backup finishes

Environment info:

Logs:

Error: failed to backup bucket data: failed to download snapshot of shard 103: http: unexpected EOF reading trailer

Performance: Generate profiles with the following commands for bugs related to performance, locking, out of memory (OOM), etc.

# Commands should be run when the bug is actively happening.
# Note: This command will run for ~30 seconds.
curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=30s"
- curl: (52) Empty reply from server

iostat -xd 1 30 > iostat.txt
# Attach the `profiles.tar.gz` and `iostat.txt` output files.

iostat.txt

jscmidt commented 2 years ago

I have a similar error message.

Backup command: influx backup /tmp/backup/influxdb -t 'token'

Output:

2022/06/05 10:25:06 INFO: Downloading metadata snapshot
Error: failed to backup metadata: failed to save local copy of KV backup to "20220605T102506Z.bolt": unexpected EOF

tomklapka commented 2 years ago

Same issue when running:

influx backup --host "$INFLUXDB_HOST:$INFLUXDB_BACKUP_PORT" --org "$INFLUXDB_ORG" --token "$INFLUXDB_TOKEN" "$BACKUP_PATH"

Influx CLI 2.3.0 (git: 88ba346) build_date: 2022-04-06T19:30:53Z
InfluxDB v2.2.0 (git: a2f8538837) build_date: 2022-04-06T17:36:40Z

Error: failed to backup bucket data: failed to download snapshot of shard 219: http: unexpected EOF reading trailer

jeffreyssmith2nd commented 2 years ago

Thanks for the reports. This looks like an error coming from the CLI where it is encountering some failure when downloading the shard from influxdb itself.

Is this something that is frequently reproducible or transient? If reproducible, can you provide logs from influxdb?

tomklapka commented 2 years ago

It is reproducible on every backup run, log file attached: influx.log

jscmidt commented 2 years ago

I was able to solve the problem on my end. In the logs, InfluxDB kept reporting that it did not have permission to open the influxd.bolt file. However, this file is owned by user 1000, so that should not have been a problem. The actual cause was that the InfluxDB volume was a CIFS network mount. The CIFS mount had the correct uid and gid (1000), but on the network share host the share had a different owner (which I can't change). I solved it by exposing only the engine directory to the network share and keeping the influxd.bolt file local on the Docker host.
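
For anyone who wants to check for a similar situation, here is a minimal sketch of a probe that tries to open a file read-write, since the ownership shown on a network mount may not reflect what the share host enforces. The helper name can_open_rw and the example path are assumptions for illustration, not part of InfluxDB; run it as the same user that influxd runs as.

```shell
# Hypothetical helper: succeeds only if FILE exists and can actually be
# opened for reading and writing by the current user.
can_open_rw() {
  file=$1
  # Guard first: the <> redirection would create a missing file.
  [ -f "$file" ] || return 1
  # Open and immediately close the file in a subshell; a failed open
  # makes the subshell exit with a non-zero status.
  ( exec 3<>"$file" ) 2>/dev/null
}

# Example usage (the path is an assumption; adjust to your volume layout):
# can_open_rw /var/lib/influxdb2/influxd.bolt || echo "cannot open bolt file read-write"
```

A successful probe does not rule out other problems with a network mount; it only checks that a plain read-write open works.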

thanKx commented 1 year ago

Have you set http-write-timeout in the InfluxDB configuration file? If the backup takes longer than this timeout, an error will be reported.
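
A minimal sketch of changing that setting through the environment instead of the configuration file, assuming influxd is started from the same shell or service environment. The value 0s (no timeout) is only an illustration, and the thread does not confirm that this timeout is the cause of the EOF errors above.

```shell
# Assumption: influxd reads INFLUXD_HTTP_WRITE_TIMEOUT at startup, as the
# environment-variable form of the http-write-timeout option.
# "0s" is an illustrative value meaning no write timeout.
export INFLUXD_HTTP_WRITE_TIMEOUT=0s

# Restart influxd so the new value takes effect, then rerun the backup, e.g.:
# influx backup /tmp/backup/influxdb -t 'token'
```

If you prefer a finite limit, pick a duration comfortably longer than your largest shard takes to download.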

ivankudibal commented 5 months ago

When my DB is very large, I would like to back up only the cold shards and skip the active shards.