barrown opened this issue 1 year ago (status: Open)
Jan 23 17:51:16 hass influxd[741]: ts=2023-01-23T17:51:16.668081Z lvl=info msg="http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 1s" log_id=0fZwjjd0000 service=http
Jan 23 17:51:17 hass influxd[741]: ts=2023-01-23T17:51:17.668457Z lvl=info msg="http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 1s" log_id=0fZwjjd0000 service=http
Jan 23 17:51:18 hass influxd[741]: ts=2023-01-23T17:51:18.669742Z lvl=info msg="http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 5ms" log_id=0fZwjjd0000 service=http
Jan 23 17:51:18 hass influxd[741]: ts=2023-01-23T17:51:18.673745Z lvl=debug msg="user find by ID" log_id=0fZwjjd0000 store=new took=0.239ms
Jan 23 17:51:18 hass influxd[741]: ts=2023-01-23T17:51:18.675510Z lvl=info msg="http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 10ms" log_id=0fZwjjd0000 service=http
Jan 23 17:51:18 hass influxd[741]: ts=2023-01-23T17:51:18.676462Z lvl=debug msg=Request log_id=0fZwjjd0000 service=http method=GET host=localhost:8086 path=/api/v2/backup/kv query= proto=HTTP/1.1 status_code=500 response_size=68 content_length=0 referrer= remote=[::1]:38928 user_agent=Go-http-client took=5.561ms error="internal error" error_code="internal error" body=
Jan 23 17:51:18 hass influxd[741]: ts=2023-01-23T17:51:18.687324Z lvl=info msg="http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 20ms" log_id=0fZwjjd0000 service=http
Jan 23 17:51:18 hass influxd[741]: ts=2023-01-23T17:51:18.690145Z lvl=debug msg="is onboarding" log_id=0fZwjjd0000 handler=onboard took=0.313ms
Jan 23 17:51:18 hass influxd[741]: ts=2023-01-23T17:51:18.690259Z lvl=debug msg="Onboarding eligibility check finished" log_id=0fZwjjd0000 result=false
Jan 23 17:51:18 hass influxd[741]: ts=2023-01-23T17:51:18.690853Z lvl=debug msg=Request log_id=0fZwjjd0000 service=http method=GET host=localhost:8086 path=/api/v2/setup query= proto=HTTP/1.1 status_code=200 response_size=21 content_length=0 referrer= remote=[::1]:38928 user_agent=influx took=1.325ms body=
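To see how often the accept errors occur relative to the failing requests, the journal lines can be summarized with a small script. The following is only a sketch: it assumes the part after the syslog prefix is plain key=value (logfmt-style) text, as in the excerpt above, and the helper names `parse_logfmt` and `summarize` are made up for illustration.

```python
import shlex
from collections import Counter


def parse_logfmt(line):
    """Return the key=value fields of one influxd journal line as a dict.

    shlex keeps quoted values such as msg="..." together; tokens without
    an "=" (the syslog prefix) are ignored.
    """
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes, e.g. a line truncated by the pager.
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep and key:
            fields[key] = value
    return fields


def summarize(lines):
    """Count accept errors and HTTP 5xx responses per request path."""
    summary = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if "too many open files" in fields.get("msg", ""):
            summary["accept_errors"] += 1
        if fields.get("msg") == "Request" and fields.get("status_code", "").startswith("5"):
            summary["5xx " + fields.get("path", "?")] += 1
    return summary
```

For example, saving the output of journalctl -u influxdb.service to a file and passing its lines to `summarize` gives one counter for the accept errors and one per failing path.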
Because of the "too many open files" error, I increased the open-file limit. Now I can reach the web interface and view some of the data, but I still cannot perform a backup.
Output from ulimit -a:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 6362
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1000000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 95
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 6362
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
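One thing that may be worth checking: ulimit -a reports the limits of the shell it is run in, which are not necessarily the limits of the running influxd process. If influxd is started as a systemd service, the unit's LimitNOFILE= setting applies instead of the shell's ulimit. On Linux, the effective limits of a process can be read from /proc/<pid>/limits and its open descriptors counted under /proc/<pid>/fd. The sketch below does that; the function names are hypothetical, and reading another user's process normally requires root.

```python
import os
import re
from pathlib import Path


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are separated by runs of two or more spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits


def process_limits(pid):
    """Read and parse the limits of a running process (Linux only)."""
    return parse_limits(Path(f"/proc/{pid}/limits").read_text())


def open_fd_count(pid):
    """Number of file descriptors the process currently has open."""
    return len(os.listdir(f"/proc/{pid}/fd"))


# Example (the PID is taken from the journal prefix, e.g. influxd[3191]):
#   soft, hard = process_limits(3191)["Max open files"]
#   print(open_fd_count(3191), "open of", soft)
```

If the soft limit reported here is much lower than the 1000000 shown above, the increased limit is not reaching the service.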
How many shards (TSM files) do you have? You can run influxd inspect report-tsm and it should give you a summary.
When you say you cannot perform a backup, what do you mean exactly?
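On versions where report-tsm is not available, a rough file count can still be obtained by walking the engine directory. This is a minimal sketch, not a replacement for report-tsm: it only counts files ending in .tsm and groups them by the directory they sit in, on the assumption that each such directory corresponds to one shard. The function name `count_tsm_files` is made up for illustration.

```python
from collections import Counter
from pathlib import Path


def count_tsm_files(engine_path):
    """Count *.tsm files under engine_path, grouped by containing directory."""
    counts = Counter()
    for tsm_file in Path(engine_path).rglob("*.tsm"):
        counts[tsm_file.parent] += 1
    return counts


# Example, using the --engine-path value from the reported config:
#   counts = count_tsm_files("/ssd/influx/engine")
#   print(len(counts), "directories,", sum(counts.values()), "TSM files")
```

A total close to or above the process's open-file limit would be consistent with the accept errors in the logs.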
Thanks for your reply!
By backup I mean running "influx backup" with my token, which results in:
2023-01-24T10:53:04.520822Z info Backing up KV store {"log_id": "0f_rPOY0000", "path": "/ssd/influx/backups/backup_2023-01-24_10-53/20230124T105304Z.bolt"}
Error: Failed to download KV backup: An internal error has occurred.
"journalctl -u influxdb.service" reveals:
Jan 24 10:49:48 hass influxd[3191]: ts=2023-01-24T10:49:48.051374Z lvl=info msg="http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 5ms" log_id=0f_qxX80000 service=http
Jan 24 10:49:48 hass influxd[3191]: ts=2023-01-24T10:49:48.053074Z lvl=debug msg="user find by ID" log_id=0f_qxX80000 store=new took=0.134ms
Jan 24 10:49:48 hass influxd[3191]: ts=2023-01-24T10:49:48.054286Z lvl=debug msg=Request log_id=0f_qxX80000 service=http method=GET host=localhost:8086 path=/api/v2/backup/kv query= proto=HTTP/1.1 status_code=500 response_size=68 content_length=
Jan 24 10:49:48 hass influxd[3191]: ts=2023-01-24T10:49:48.056743Z lvl=info msg="http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 10ms" log_id=0f_qxX80000 service=http
Jan 24 10:49:48 hass influxd[3191]: ts=2023-01-24T10:49:48.057355Z lvl=debug msg="is onboarding" log_id=0f_qxX80000 handler=onboard took=0.280ms
Jan 24 10:49:48 hass influxd[3191]: ts=2023-01-24T10:49:48.057439Z lvl=debug msg="Onboarding eligibility check finished" log_id=0f_qxX80000 result=false
Jan 24 10:49:48 hass influxd[3191]: ts=2023-01-24T10:49:48.057970Z lvl=debug msg=Request log_id=0f_qxX80000 service=http method=GET host=localhost:8086 path=/api/v2/setup query= proto=HTTP/1.1 status_code=200 response_size=21 content_length=0 re
Jan 24 10:49:48 hass influxd[3191]: ts=2023-01-24T10:49:48.067161Z lvl=info msg="http: Accept error: accept tcp [::]:8086: accept4: too many open files; retrying in 20ms" log_id=0f_qxX80000 service=http
In the end I managed to run "influxd inspect export-lp" with a date range of one month, so at least I have my data safely out now.
"influxd inspect report-tsm" only became available in 2.1 and I'm on 2.0. I have been trying to upgrade for a while now, but I could never get the backup and restore to a newer version to work; see #23212.
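For anyone needing to repeat the export one month at a time, the date windows and command lines can be generated rather than typed by hand. The sketch below only builds the argument lists and does not run anything. The flag names (--bucket-id, --engine-path, --output-path, --start, --end) are what I recall from the export-lp documentation and should be checked against influxd inspect export-lp --help on the installed version; the bucket ID and output directory are placeholders, and `month_windows` and `export_command` are hypothetical helper names.

```python
from datetime import datetime, timezone


def month_windows(start, end):
    """Yield (window_start, window_end) pairs, one per calendar month, covering start..end."""
    current = start
    while current < end:
        # Midnight on the first day of the following month.
        if current.month == 12:
            nxt = current.replace(year=current.year + 1, month=1, day=1,
                                  hour=0, minute=0, second=0, microsecond=0)
        else:
            nxt = current.replace(month=current.month + 1, day=1,
                                  hour=0, minute=0, second=0, microsecond=0)
        yield current, min(nxt, end)
        current = nxt


def export_command(window_start, window_end, bucket_id, engine_path, out_dir):
    """Build (but do not run) an export-lp command for one time window."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC3339 timestamps; datetimes are assumed to be UTC
    out_file = f"{out_dir}/export_{window_start:%Y-%m}.lp"
    return [
        "influxd", "inspect", "export-lp",
        "--bucket-id", bucket_id,        # placeholder: the ID of the bucket to export
        "--engine-path", engine_path,
        "--output-path", out_file,
        "--start", window_start.strftime(fmt),
        "--end", window_end.strftime(fmt),
    ]


# Example:
#   start = datetime(2022, 11, 1, tzinfo=timezone.utc)
#   end = datetime(2023, 1, 24, tzinfo=timezone.utc)
#   for a, b in month_windows(start, end):
#       print(" ".join(export_command(a, b, "<bucket-id>", "/ssd/influx/engine", "/ssd/influx/export")))
```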
I have had InfluxDB 2.0.6 running quite happily on a Raspberry Pi for nearly 2 years, but this morning I noticed it had stopped responding. Even after a reboot it is not happy. I can't get any connection via the web interface (it just shows the swirling circle), nor via the HTTP API, nor via the influx CLI (e.g. influx bucket list just hangs).
Can anyone advise any more steps to try? influxd seems to start up fine, without errors.
Environment info:
Linux 5.10.17-v8+ aarch64
InfluxDB 2.0.6 (git: 4db98b4c9a) build_date: 2021-04-29T16:48:12Z
Database files are all on an SSD drive
Config: /usr/local/bin/influxd --bolt-path=/ssd/influx/influxd.bolt --engine-path=/ssd/influx/engine --reporting-disabled --storage-retention-check-interval=24h --log-level=debug
Logs: