influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

influxdb 1.8.9 fails to load via systemctl script on existing DB #22297

Open naorw opened 2 years ago

naorw commented 2 years ago

Steps to reproduce:

  1. stop the DB with systemctl stop influxdb
  2. start the DB with systemctl start influxdb

Expected behavior: Database starting

Actual behavior: The database fails to start with the error "Failed to reach influxdb http endpoint at http://localhost:8086/health". Running the /usr/lib/influxdb/scripts/influxd-systemd-start.sh script manually fails with the same error. However, the database process does start and is reachable over HTTP:

  curl http://localhost:8086/health
  {"checks":[],"message":"ready for queries and writes","name":"influxdb","status":"pass","version":"1.8.9"}

This is possibly due to the time it takes to load the shards.

Environment info:

Logs:

  influxd-systemd-start.sh[17616]: Failed to reach influxdb http endpoint at http://localhost:8086/health
  systemd[1]: influxdb.service: Control process exited, code=exited status=1

amfasis commented 2 years ago

Same here on a Raspberry Pi 2B+ (Linux 4.19.66+, armv6l). The database folder is around 500 MB, with around 250 TSM files (those are shards, right?). Same version as the OP.

I suggest making the error message more descriptive (mentioning the 10-second limit) and possibly making this timeout configurable in the configuration file.

Starting the database manually from the command line did not show this error. In hindsight this is obvious, but it confused me quite a lot.

GlennMatthys commented 2 years ago

I was able to make InfluxDB start by changing the following in /usr/lib/influxdb/scripts/influxd-systemd-start.sh:

max_attempts=100

Default max_attempts is 10.
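To illustrate why raising max_attempts helps, here is a minimal sketch of a health-check polling loop with a configurable attempt count. It is not the packaged influxd-systemd-start.sh; the function names wait_for_health and probe_influxdb are hypothetical, and the probe assumes the /health URL from the report above.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the script shipped with InfluxDB.

# wait_for_health PROBE_CMD MAX_ATTEMPTS DELAY_SECONDS
# Runs PROBE_CMD up to MAX_ATTEMPTS times, sleeping DELAY_SECONDS after
# each failed attempt. Returns 0 on the first success, 1 otherwise.
wait_for_health() {
  probe=$1
  max_attempts=$2
  delay=$3
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if $probe; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  # Name the limits in the error, as suggested earlier in the thread.
  echo "Failed to reach health endpoint after $max_attempts attempts (${delay}s apart)" >&2
  return 1
}

# Assumed probe: succeeds only when /health answers with HTTP 200.
probe_influxdb() {
  [ "$(curl -k -s -o /dev/null -w '%{http_code}' http://localhost:8086/health)" = "200" ]
}
```

Calling `wait_for_health probe_influxdb 100 1` would allow roughly 100 seconds for startup, compared with roughly 10 seconds for an attempt count of 10.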

amfasis commented 2 years ago

It turned out my database folder had become corrupted, which is what made loading so slow. I have now reinstalled InfluxDB and restored the backup, and loading on my Raspberry Pi now takes well under 10 seconds.

prutschman commented 2 years ago

In my case increasing the number of attempts didn't help. It simply takes longer than 10 seconds for my database instance to start, so it times out each time.
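When startup is simply slow, it can help to measure how long the instance actually needs before picking a limit. The following is a rough sketch under the same assumptions as the report (the /health endpoint on localhost:8086); time_until_ready and probe_influxdb are hypothetical names, not part of any InfluxDB tooling.

```shell
#!/bin/sh
# Hypothetical helper for measuring startup time; not part of InfluxDB.

# time_until_ready PROBE_CMD LIMIT_SECONDS
# Polls PROBE_CMD once per second. Prints the number of whole seconds
# waited before the first success, or returns 1 once LIMIT_SECONDS pass.
time_until_ready() {
  probe=$1
  limit=$2
  waited=0
  until $probe; do
    if [ "$waited" -ge "$limit" ]; then
      echo "not ready after ${limit}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$waited"
}

# Assumed probe: succeeds only when /health answers with HTTP 200.
probe_influxdb() {
  [ "$(curl -k -s -o /dev/null -w '%{http_code}' http://localhost:8086/health)" = "200" ]
}
```

Running `time_until_ready probe_influxdb 600` right after starting influxd by hand would print the approximate number of seconds the instance took to become ready, which gives a lower bound for any attempt count or timeout.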