influxdata / influxdata-docker

Official docker images for the influxdata stack

Influxdb Backup is not working properly in influxdb OSS V2.1.1 #604

Open shuoChenTHU opened 2 years ago

shuoChenTHU commented 2 years ago

About the bug

Steps to reproduce: List the minimal actions needed to reproduce the behavior.

  1. Run the backup CLI command: influx backup -t {$rootToken} /var/backups/backup_$(date '+%Y-%m-%d_%H-%M')
  2. Check the backup log and files; the data are incomplete.

Expected behavior:

InfluxDB takes a full backup of all the data stored in it; in my case it should be around 1.5-2.0 GB.

Actual behavior: the backup files contain only 1.3 MB of data, and "WARN: Shard xxxx removed during backup" appears frequently in the log.

Visual Proof:

[Two screenshots; images not included]

The data in InfluxDB seem to be OK; only the backup is buggy. Downgrading to v2.0 caused errors during the initialization of the container application.

About your environment

Environment info:

influxdb OSS v2.1.1, installation by container

Linux 5.10.60-qnap x86_64 InfluxDB 2.1.1 (git: 657e1839de) build_date: 2021-11-09T03:03:48Z

Config:

version: '3.5'
networks:
  influxdbnw:
    name: influxdb_network
services:  
  influxdb:
    image: 'influxdb:2.1.1'
    restart: always
    container_name: influxdb
    hostname: 'influxdb'
    volumes:
      #- ./data:/root/.influxdbv2
      - .\conf\influx\data:/var/lib/influxdb2
      - .\conf\influx\config:/etc/influxdb2
      - .\conf\influx\scripts:/docker-entrypoint-initdb.d
      - '{§backup path}:/var/backups'
      - ./config/influx-configs:/etc/influxdb2/influx-configs
    ports:
      - '8086:8086'
    networks:
      - influxdbnw
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin 
      - DOCKER_INFLUXDB_INIT_PASSWORD=password
      - DOCKER_INFLUXDB_INIT_ORG=org
      - DOCKER_INFLUXDB_INIT_BUCKET=initBucket  
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=token
      - TZ=Europe/Berlin
volumes:
  data:
  config:
  scripts:

Logs:

see the screenshots above

Performance: Generate profiles with the following commands for bugs related to performance, locking, out of memory (OOM), etc.

# Commands should be run while the bug is actively occurring.
# Note: This command will run for at least 30 seconds.
curl -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=true"
curl -o vars.txt "http://localhost:8086/debug/vars"
iostat -xd 1 30 > iostat.txt
# Attach the `profiles.tar.gz`, `vars.txt`, and `iostat.txt` output files.

SokoFromNZ commented 2 years ago

Any news on this? I have the exact same problem running the command directly on the Debian machine!

shuoChenTHU commented 2 years ago

> Any news on this? I have the exact same problem running the command directly on the Debian machine!

Hi Soko, I haven't heard anything regarding this topic. It seems that the backup process excludes cold shards, which are beyond a certain retention period. They are not deleted; they are just not included. My workaround is to back up the entire HDD, so that if something goes wrong I can always roll back to the last healthy state. If only the most recent data in InfluxDB need to be recovered, the regular influxdb backup is still useful.

shuoChenTHU commented 2 years ago

I just performed another backup. Miraculously, I saw no log message saying that a shard had been removed from the backup, but I cannot tell whether everything is in the backup. Perhaps you can wait for another minor version update, and the issue may be fixed automatically.
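
A rough way to check a backup run, as a sketch only: capture the output of the backup command in a log file, count the shard warnings, and look at the size of the backup directory. The function names and file paths below are placeholders of my own, and the grep pattern assumes the warning text matches the one quoted in the report ("Shard xxxx removed during backup").

```shell
#!/usr/bin/env bash

# Count the log lines that report a shard being dropped from the backup.
# Assumes the log was captured from the backup command's output.
count_removed_shard_warnings() {
  local log_file="$1"
  # grep -c prints the number of matching lines; it exits non-zero when
  # the count is 0, so '|| true' keeps the function from failing.
  grep -c 'removed during backup' "$log_file" || true
}

# Print the on-disk size of a backup directory in kilobytes.
backup_size_kb() {
  local backup_dir="$1"
  du -sk "$backup_dir" | cut -f1
}

# Example usage (hypothetical paths; the token variable is a placeholder):
#   backup_dir="/var/backups/backup_$(date '+%Y-%m-%d_%H-%M')"
#   influx backup -t "$rootToken" "$backup_dir" 2>&1 | tee "$backup_dir.log"
#   count_removed_shard_warnings "$backup_dir.log"
#   backup_size_kb "$backup_dir"
```

A warning count of zero and a size in the expected range do not prove the backup is complete, but a non-zero count or a size far below the data directory is a quick sign that something was skipped.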

SokoFromNZ commented 2 years ago

Yeah... thankfully my installation is in a Debian VM, so I can back it up completely as well. However, I have to shut it down and therefore lose some sensor data while it is off. So a working backup is still badly needed.

Ohh, about retention periods: I don't have a single bucket with a retention set (everything is "Forever") and I still get a lot of the warnings...

shuoChenTHU commented 2 years ago

> Yeah... thankfully my installation is in a Debian VM, so I can back it up completely as well. However, I have to shut it down and therefore lose some sensor data while it is off. So a working backup is still badly needed.
>
> Ohh, about retention periods: I don't have a single bucket with a retention set (everything is "Forever") and I still get a lot of the warnings...

Speaking of retention periods: as far as I know, "Forever" only ensures that the data in your buckets will never be deleted automatically. However, InfluxDB has other mechanisms for shard storage and compaction. If a bucket hasn't received any new writes in 4h (by default), the shards associated with it could be labeled as "cold shards" or trigger a shard compaction. I don't know much about this configuration; you can check it yourself here: https://docs.influxdata.com/influxdb/v2.5/reference/internals/shards/
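
To make the timing rule concrete, here is a minimal sketch of the check as I understand it. It is not InfluxDB's actual code: `is_cold` and its arguments are hypothetical, and the 14400-second default simply mirrors the 4h mentioned above.

```shell
#!/usr/bin/env bash

# Illustrative only: succeeds (exit status 0) when the last write lies at
# least cold_after_seconds in the past, i.e. the shard would count as "cold".
# Arguments: last write time and current time as Unix epoch seconds, plus an
# optional threshold in seconds (defaults to 4h = 14400s).
is_cold() {
  local last_write_epoch="$1"
  local now_epoch="$2"
  local cold_after_seconds="${3:-14400}"
  [ $(( now_epoch - last_write_epoch )) -ge "$cold_after_seconds" ]
}

# Example usage (hypothetical timestamp for the last write):
#   if is_cold 1700000000 "$(date +%s)"; then echo "cold"; else echo "hot"; fi
```

Under this reading, a "Forever" retention setting has no effect on the check, which would fit with warnings appearing even for buckets that have no retention period set.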

SokoFromNZ commented 2 years ago

Thanks for the valuable info and link. I have not looked into shards, cold shards, etc. yet. I might try to back up the data (which will give warnings) and restore it in a new installation of influx to see which data get restored.

I definitely have buckets that have had no new data written to them for days!

In the end though: if I back up a database, I want everything in there, no matter how old the data or bucket is! Or at least an option in backup to force a complete backup.