kizniche / Mycodo

An environmental monitoring and regulation system
http://kylegabriel.com/projects/
GNU General Public License v3.0

Influxdb timeout preventing daemon from starting #1275

Closed Bingo-Curios closed 6 months ago

Bingo-Curios commented 1 year ago

I created a brand-new deployment on a fresh SD card and ran the install using the curl script. The installation goes through fine until the end, where it tries to restart the daemon and gets stuck. This is on a Raspberry Pi Zero. I then moved the same SD card to a Raspberry Pi 4, removed my Mycodo folder and did another fresh install. Everything works as it should. Next I put the SD card back into the Raspberry Pi Zero and the daemon does not start. The logs show a message that something has terminated the daemon. I tried several times to start the daemon without much luck, but when I run the daemon in debug mode on the Raspberry Pi Zero, everything is fine again. Not sure what is happening; likely some timing issue with the slower processor. Any advice on how to fix this?

kizniche commented 1 year ago

The title should only be a brief explanation of the issue. Do not put more than a few words in it.

Please attach your Mycodo/install/setup.log.

Bingo-Curios commented 1 year ago

ok ... let me do this again, the last log I have is the successful log. I will attempt it again on the zero and capture the log.

Bingo-Curios commented 1 year ago

Here is the setup.log of a fresh install. It looks like it was timing out on connecting to the db. This was done on the Raspberry Pi Zero.

setup.log

kizniche commented 1 year ago

This does not look like the initial install because influxdb was already installed. Is this the case?

#### Ensuring compatible version of influxdb 1.x is installed ####
Correct version of InfluxDB currently installed

Bingo-Curios commented 1 year ago

Yes, because I deleted the successful install from the Pi 4 and reinstalled. If you want a clean one, I can rebuild all the way from the OS, meaning I can start with a clean SD card.

kizniche commented 1 year ago

You can't "delete an install", you deleted a directory and reinitiated an install that merely skipped many of the install processes that had already been completed. Without the log from the initial install, it's impossible to know what, if anything, went wrong during the initial install.

Bingo-Curios commented 1 year ago

now doing a full install.. just rebuilt my SD card with the OS and going on to install Mycodo

kizniche commented 1 year ago

Which OS?

Bingo-Curios commented 1 year ago

Raspberry Pi OS Lite 32-bit

Bingo-Curios commented 1 year ago

Here is the new setup.log file after a full fresh install. Sequence: fresh SD card; deploy standard Raspberry Pi OS Lite 32-bit using Raspberry Pi Imager; put it into a Pi Zero; log in and run:

sudo apt-get update
sudo apt-get upgrade
curl -L https://kizniche.github.io/Mycodo/install | bash

setup.log

kizniche commented 1 year ago

No idea what could be wrong. Do you get any errors attempting to start the daemon manually? (Note: do not run the daemon in a production environment this way).

sudo ~/Mycodo/env/bin/python ~/Mycodo/mycodo/mycodo_daemon.py

Bingo-Curios commented 1 year ago

The daemon starts okay when started manually. Here are a few other observations, if they help:

If I restart the system, the daemon will not start. It will start if started manually as you mention above. This is on the Pi Zero.

If I take this same SD card, which was installed on the Raspberry Pi Zero and has trouble there, to a Raspberry Pi 4, the daemon starts normally on startup!

Similarly, I have set up an SD card completely on a Pi 4 and seen it work fine, including starting the daemon on startup. When I take this card to the Pi Zero, the OS boots okay as expected, but the Mycodo daemon does not start on system boot.

Let me know if any other logs will help.

vuilfanie commented 1 year ago

I have the same error on a fresh install, also on a Raspberry Pi Zero with the same OS: Raspberry Pi OS Lite 32-bit.

setup.log

I get no errors when running:

sudo ~/Mycodo/env/bin/python ~/Mycodo/mycodo/mycodo_daemon.py

kizniche commented 1 year ago

What is the status?

sudo service mycodo status

vuilfanie commented 1 year ago

gg@hector:~ $ sudo service mycodo status
● mycodo.service - Mycodo server
     Loaded: loaded (/home/gg/Mycodo/install/mycodo.service; enabled; vendor preset: enabled)
     Active: activating (start-pre) since Wed 2023-02-15 13:23:14 SAST; 2h 19min ago

kizniche commented 1 year ago

Looks like it's stuck on ExecStartPre

https://github.com/kizniche/Mycodo/blob/247adbfd3948a0e999e92091689b58dbfd3091f3/install/mycodo.service#L12

Which checks if influxdb is accessible

https://github.com/kizniche/Mycodo/blob/247adbfd3948a0e999e92091689b58dbfd3091f3/install/influxdb_wait_until_running.sh#L5

What is the status of influxdb?

sudo service influxdb status
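
For readers unfamiliar with this kind of pre-start check, here is a minimal sketch of a "wait until the database answers" loop. It is not the content of influxdb_wait_until_running.sh; the function name wait_until, the attempt count and the delay are all made up for illustration. The health URL in the comment is the one that appears in the InfluxDB status output later in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mycodo or InfluxDB):
# retry a probe command until it succeeds or the attempts run out.
# Usage: wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the probe quietly; success means the service answered.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        echo "Not ready after $attempt attempts..." >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1   # gave up: the probe never succeeded
}

# Example probe (assumes InfluxDB 1.8 listening on localhost:8086):
#   wait_until 60 1 curl -sf http://localhost:8086/health
```

If a loop like this runs as ExecStartPre and the database never answers, the unit stays in "activating (start-pre)", which is the state shown above.
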
vuilfanie commented 1 year ago

aah the good old influxdb loop

gg@hector:~ $ sudo service influxdb status
● influxdb.service - InfluxDB is an open-source, distributed, time series database
     Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
     Active: activating (start) since Wed 2023-02-15 15:46:32 SAST; 56s ago
       Docs: https://docs.influxdata.com/influxdb/
  Cntrl PID: 18243 (influxd-systemd)
      Tasks: 7 (limit: 415)
        CPU: 43.976s
     CGroup: /system.slice/influxdb.service
             ├─18243 /bin/bash -e /usr/lib/influxdb/scripts/influxd-systemd-start.sh
             ├─18248 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
             └─18677 curl -k -s -o /dev/null http://localhost:8086/health -w %{http_code}

Feb 15 15:47:15 hector influxd-systemd-start.sh[18243]: InfluxDB API unavailable after 31 attempts...
Feb 15 15:47:16 hector influxd-systemd-start.sh[18243]: InfluxDB API unavailable after 32 attempts...
Feb 15 15:47:17 hector influxd-systemd-start.sh[18243]: InfluxDB API unavailable after 33 attempts...
Feb 15 15:47:19 hector influxd-systemd-start.sh[18243]: InfluxDB API unavailable after 34 attempts...
Feb 15 15:47:20 hector influxd-systemd-start.sh[18243]: InfluxDB API unavailable after 35 attempts...
Feb 15 15:47:21 hector influxd-systemd-start.sh[18243]: InfluxDB API unavailable after 36 attempts...
Feb 15 15:47:23 hector influxd-systemd-start.sh[18243]: InfluxDB API unavailable after 37 attempts...
Feb 15 15:47:24 hector influxd-systemd-start.sh[18243]: InfluxDB API unavailable after 38 attempts...
Feb 15 15:47:25 hector influxd-systemd-start.sh[18243]: InfluxDB API unavailable after 39 attempts...
Feb 15 15:47:27 hector influxd-systemd-start.sh[18243]: InfluxDB API unavailable after 40 attempts...

kizniche commented 1 year ago

Perhaps try increasing the timeout in the service file, as this comment suggests.

vuilfanie commented 1 year ago

So create this file, /etc/systemd/system/influxdb.service.d/override.conf, and add:

[Service]
TimeoutStartSec=45m

?

kizniche commented 1 year ago

I've never used an override, I've always just edited the service file itself. You can see in the output you pasted the location of the file. Don't forget to issue the daemon-reload command. Then you can issue sudo service influxdb restart

vuilfanie commented 1 year ago

ok, and then just add

[Service]
TimeoutStartSec=45m

?

kizniche commented 1 year ago

The Service section should already exist, so all you need to do is add TimeoutStartSec
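
The two approaches discussed above differ in one detail: when editing the packaged service file directly, only the TimeoutStartSec line goes into the existing [Service] section, whereas a drop-in override.conf is a separate file and needs its own [Service] header. Below is a minimal sketch of the drop-in route, using the path and the 45m value mentioned in this thread. The helper name write_timeout_override is hypothetical, and the commented commands assume a root shell on the Pi.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that sets only TimeoutStartSec.
# Usage: write_timeout_override DROPIN_DIR TIMEOUT
write_timeout_override() {
    dropin_dir=$1
    timeout=$2
    mkdir -p "$dropin_dir" || return 1
    # A drop-in is a standalone file, so it carries its own [Service] header.
    printf '[Service]\nTimeoutStartSec=%s\n' "$timeout" > "$dropin_dir/override.conf"
}

# On the Pi (as root), followed by the reload that makes systemd re-read units:
#   write_timeout_override /etc/systemd/system/influxdb.service.d 45m
#   systemctl daemon-reload
#   systemctl restart influxdb
```

Whichever file is edited, systemd keeps using the old settings until systemctl daemon-reload has been run.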

kizniche commented 1 year ago

Also see: https://github.com/kizniche/Mycodo/issues/1202, https://github.com/kizniche/Mycodo/issues/1191

vuilfanie commented 1 year ago

ok think I'm in the wrong file as I only see this

sudo vi influxd.service

# If you modify this, please also make sure to edit init.sh

[Unit]
Description=InfluxDB is an open-source, distributed, time series database
Documentation=https://docs.influxdata.com/influxdb/
After=network-online.target

[Service]
User=influxdb
Group=influxdb
LimitNOFILE=65536
EnvironmentFile=-/etc/default/influxdb
ExecStart=/usr/lib/influxdb/scripts/influxd-systemd-start.sh
KillMode=control-group
Restart=on-failure
Type=forking
PIDFile=/var/lib/influxdb/influxd.pid

[Install]
WantedBy=multi-user.target
Alias=influxd.service
~

vuilfanie commented 1 year ago

after changing sleep to 5

gg@hector:/etc/systemd/system $ sudo service influxdb restart
Job for influxdb.service failed because a timeout was exceeded.
See "systemctl status influxdb.service" and "journalctl -xe" for details.
gg@hector:/etc/systemd/system $ systemctl status influxdb.service
● influxdb.service - InfluxDB is an open-source, distributed, time series database
     Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
     Active: activating (start) since Wed 2023-02-15 17:11:46 SAST; 43s ago
       Docs: https://docs.influxdata.com/influxdb/
  Cntrl PID: 23815 (influxd-systemd)
      Tasks: 7 (limit: 415)
        CPU: 34.472s
     CGroup: /system.slice/influxdb.service
             ├─23815 /bin/bash -e /usr/lib/influxdb/scripts/influxd-systemd-start.sh
             ├─23816 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
             └─24069 sleep 5

Feb 15 17:11:46 hector influxd-systemd-start.sh[23816]: ts=2023-02-15T15:11:46.603550Z lvl=info msg="Go runtime" log_id=0g1PxPbl000 version=go1.13.8 maxprocs=1
Feb 15 17:11:46 hector influxd-systemd-start.sh[23832]: Merging with configuration at: /etc/influxdb/influxdb.conf
Feb 15 17:11:47 hector influxd-systemd-start.sh[23815]: InfluxDB API unavailable after 1 attempts...
Feb 15 17:11:53 hector influxd-systemd-start.sh[23815]: InfluxDB API unavailable after 2 attempts...
Feb 15 17:11:58 hector influxd-systemd-start.sh[23815]: InfluxDB API unavailable after 3 attempts...
Feb 15 17:12:04 hector influxd-systemd-start.sh[23815]: InfluxDB API unavailable after 4 attempts...
Feb 15 17:12:09 hector influxd-systemd-start.sh[23815]: InfluxDB API unavailable after 5 attempts...
Feb 15 17:12:14 hector influxd-systemd-start.sh[23815]: InfluxDB API unavailable after 6 attempts...
Feb 15 17:12:20 hector influxd-systemd-start.sh[23815]: InfluxDB API unavailable after 7 attempts...
Feb 15 17:12:25 hector influxd-systemd-start.sh[23815]: InfluxDB API unavailable after 8 attempts...

kizniche commented 1 year ago

Did you run the daemon-reload command so the changes take effect?

vuilfanie commented 1 year ago

ok think I'm in the wrong file as I only see this

sudo vi influxd.service

# If you modify this, please also make sure to edit init.sh

[Unit]
Description=InfluxDB is an open-source, distributed, time series database
Documentation=https://docs.influxdata.com/influxdb/
After=network-online.target

[Service]
User=influxdb
Group=influxdb
LimitNOFILE=65536
EnvironmentFile=-/etc/default/influxdb
ExecStart=/usr/lib/influxdb/scripts/influxd-systemd-start.sh
KillMode=control-group
Restart=on-failure
Type=forking
PIDFile=/var/lib/influxdb/influxd.pid

[Install]
WantedBy=multi-user.target
Alias=influxd.service
~

Yes, this is the content of /etc/systemd/system/influxd.service; there is no mention of a timeout.

After changing sleep to 5 I ran: sudo service influxdb restart

The output shows sleep is 5, so it should have taken effect?

kizniche commented 1 year ago

I asked about whether you issued the daemon-reload command. Also, how long have you given influxdb to start? Your log indicates it has only been attempting to start for less than a minute.

vuilfanie commented 1 year ago

Can you give me the daemon-reload command please? It's been trying to start ever since I made the change, which was 2 hours ago.

gets to about 12 attempts and then says this

gg@hector:/etc/systemd/system $ sudo service influxdb status
● influxdb.service - InfluxDB is an open-source, distributed, time series database
     Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
     Active: activating (start) since Wed 2023-02-15 19:33:33 SAST; 1min 5s ago
       Docs: https://docs.influxdata.com/influxdb/
  Cntrl PID: 13641 (influxd-systemd)
      Tasks: 7 (limit: 415)
        CPU: 47.951s
     CGroup: /system.slice/influxdb.service
             ├─13641 /bin/bash -e /usr/lib/influxdb/scripts/influxd-systemd-start.sh
             ├─13643 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
             └─14047 sleep 5

Feb 15 19:33:45 hector influxd-systemd-start.sh[13641]: InfluxDB API unavailable after 3 attempts...
Feb 15 19:33:50 hector influxd-systemd-start.sh[13641]: InfluxDB API unavailable after 4 attempts...
Feb 15 19:33:56 hector influxd-systemd-start.sh[13641]: InfluxDB API unavailable after 5 attempts...
Feb 15 19:34:01 hector influxd-systemd-start.sh[13641]: InfluxDB API unavailable after 6 attempts...
Feb 15 19:34:07 hector influxd-systemd-start.sh[13641]: InfluxDB API unavailable after 7 attempts...
Feb 15 19:34:13 hector influxd-systemd-start.sh[13641]: InfluxDB API unavailable after 8 attempts...
Feb 15 19:34:18 hector influxd-systemd-start.sh[13641]: InfluxDB API unavailable after 9 attempts...
Feb 15 19:34:23 hector influxd-systemd-start.sh[13641]: InfluxDB API unavailable after 10 attempts...
Feb 15 19:34:28 hector influxd-systemd-start.sh[13641]: InfluxDB API unavailable after 11 attempts...
Feb 15 19:34:34 hector influxd-systemd-start.sh[13641]: InfluxDB API unavailable after 12 attempts...
gg@hector:/etc/systemd/system $ sudo service influxdb status
● influxdb.service - InfluxDB is an open-source, distributed, time series database
     Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
     Active: activating (start) since Wed 2023-02-15 19:35:03 SAST; 15s ago
       Docs: https://docs.influxdata.com/influxdb/
  Cntrl PID: 14297 (influxd-systemd)
      Tasks: 7 (limit: 415)
        CPU: 11.585s
     CGroup: /system.slice/influxdb.service
             ├─14297 /bin/bash -e /usr/lib/influxdb/scripts/influxd-systemd-start.sh
             ├─14298 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
             └─14353 sleep 5

Feb 15 19:35:03 hector systemd[1]: Starting InfluxDB is an open-source, distributed, time series database...
Feb 15 19:35:04 hector influxd-systemd-start.sh[14300]: Merging with configuration at: /etc/influxdb/influxdb.conf
Feb 15 19:35:04 hector influxd-systemd-start.sh[14298]: ts=2023-02-15T17:35:04.282733Z lvl=info msg="InfluxDB starting" log_id=0g1Y9AMW000 version=1.8.10 branch=1.8 commit=688e697c51fd
Feb 15 19:35:04 hector influxd-systemd-start.sh[14298]: ts=2023-02-15T17:35:04.287104Z lvl=info msg="Go runtime" log_id=0g1Y9AMW000 version=go1.13.8 maxprocs=1
Feb 15 19:35:04 hector influxd-systemd-start.sh[14316]: Merging with configuration at: /etc/influxdb/influxdb.conf
Feb 15 19:35:05 hector influxd-systemd-start.sh[14297]: InfluxDB API unavailable after 1 attempts...
Feb 15 19:35:10 hector influxd-systemd-start.sh[14297]: InfluxDB API unavailable after 2 attempts...
Feb 15 19:35:15 hector influxd-systemd-start.sh[14297]: InfluxDB API unavailable after 3 attempts...
gg@hector:/etc/systemd/system $
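
The two "Active: activating (start) since" timestamps in the output above are worth comparing: the gap between them shows how long systemd let one start attempt run before giving up and starting over. A small sketch of that arithmetic follows; the helper names to_seconds and elapsed_seconds are made up for illustration, and the calculation assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# Hypothetical helpers: seconds between two same-day HH:MM:SS timestamps.
to_seconds() {
    # awk reads each field as a decimal number, so "08" is not treated as octal.
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

elapsed_seconds() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Start times of the two consecutive attempts shown in the status output.
elapsed_seconds 19:33:33 19:35:03   # prints 90
```

Ninety seconds is systemd's stock default start timeout (DefaultTimeoutStartSec), assuming it has not been changed on this Pi. That would be consistent with a TimeoutStartSec setting not being in effect yet, and with the "timeout was exceeded" message from the earlier restart.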

kizniche commented 1 year ago

See https://github.com/kizniche/Mycodo/issues/1275#issuecomment-1431402078

vuilfanie commented 1 year ago

done, still the same

gg@hector:/etc/systemd/system $ sudo service influxdb status
● influxdb.service - InfluxDB is an open-source, distributed, time series database
     Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
     Active: activating (start) since Wed 2023-02-15 19:56:51 SAST; 1min 24s ago
       Docs: https://docs.influxdata.com/influxdb/
  Cntrl PID: 23064 (influxd-systemd)
      Tasks: 7 (limit: 415)
        CPU: 58.309s
     CGroup: /system.slice/influxdb.service
             ├─23064 /bin/bash -e /usr/lib/influxdb/scripts/influxd-systemd-start.sh
             ├─23065 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
             └─23795 curl -k -s -o /dev/null http://localhost:8086/health -w %{http_code}

Feb 15 19:57:21 hector influxd-systemd-start.sh[23064]: InfluxDB API unavailable after 6 attempts...
Feb 15 19:57:26 hector influxd-systemd-start.sh[23064]: InfluxDB API unavailable after 7 attempts...
Feb 15 19:57:32 hector influxd-systemd-start.sh[23064]: InfluxDB API unavailable after 8 attempts...
Feb 15 19:57:37 hector influxd-systemd-start.sh[23064]: InfluxDB API unavailable after 9 attempts...
Feb 15 19:57:43 hector influxd-systemd-start.sh[23064]: InfluxDB API unavailable after 10 attempts...
Feb 15 19:57:48 hector influxd-systemd-start.sh[23064]: InfluxDB API unavailable after 11 attempts...
Feb 15 19:57:54 hector influxd-systemd-start.sh[23064]: InfluxDB API unavailable after 12 attempts...
Feb 15 19:57:59 hector influxd-systemd-start.sh[23064]: InfluxDB API unavailable after 13 attempts...
Feb 15 19:58:05 hector influxd-systemd-start.sh[23064]: InfluxDB API unavailable after 14 attempts...
Feb 15 19:58:10 hector influxd-systemd-start.sh[23064]: InfluxDB API unavailable after 15 attempts...
gg@hector:/etc/systemd/system $ sudo service influxdb status
● influxdb.service - InfluxDB is an open-source, distributed, time series database
     Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
     Active: activating (start) since Wed 2023-02-15 19:58:22 SAST; 1s ago
       Docs: https://docs.influxdata.com/influxdb/
  Cntrl PID: 23872 (influxd-systemd)
      Tasks: 7 (limit: 415)
        CPU: 1.280s
     CGroup: /system.slice/influxdb.service
             ├─23872 /bin/bash -e /usr/lib/influxdb/scripts/influxd-systemd-start.sh
             ├─23873 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
             └─23906 sleep 5

Feb 15 19:58:22 hector systemd[1]: Starting InfluxDB is an open-source, distributed, time series database...
Feb 15 19:58:22 hector influxd-systemd-start.sh[23875]: Merging with configuration at: /etc/influxdb/influxdb.conf
Feb 15 19:58:22 hector influxd-systemd-start.sh[23873]: ts=2023-02-15T17:58:22.749372Z lvl=info msg="InfluxDB starting" log_id=0g1ZUX7G000 version=1.8.10 branch=1.8 commit=688e697c51fd
Feb 15 19:58:22 hector influxd-systemd-start.sh[23873]: ts=2023-02-15T17:58:22.757403Z lvl=info msg="Go runtime" log_id=0g1ZUX7G000 version=go1.13.8 maxprocs=1
Feb 15 19:58:23 hector influxd-systemd-start.sh[23891]: Merging with configuration at: /etc/influxdb/influxdb.conf
Feb 15 19:58:23 hector influxd-systemd-start.sh[23872]: InfluxDB API unavailable after 1 attempts...
gg@hector:/etc/systemd/system $

kizniche commented 1 year ago

In all the service files you pasted, I don't see the TimeoutStartSec added.
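
One way to check this without reading the files by eye is to search the unit text that systemd actually loads. The sketch below is only an illustration: the helper name has_start_timeout is hypothetical, and the commented commands assume they are run on the Pi itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the unit text on stdin sets TimeoutStartSec.
has_start_timeout() {
    grep -q '^[[:space:]]*TimeoutStartSec='
}

# "systemctl cat" prints the unit file together with any drop-ins:
#   systemctl cat influxdb | has_start_timeout && echo "timeout is set"
# The value systemd is actually using can also be queried directly:
#   systemctl show influxdb -p TimeoutStartUSec
```

If the line is present in the file but systemctl show still reports the old value, the daemon-reload step is the likely missing piece.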

Bingo-Curios commented 1 year ago

So, I came to the same conclusion that this has to do with influxdb. On searching around last night, I found that commenting out the line Type=forking in the startup file /etc/systemd/system/multi-user.target.wants/influxdb.service fixed it. I am not yet sure of any other side effects, but influxdb starts and so does the mycodo daemon. The app works but it is awfully slow. I guess that is to be expected from the RPi Zero.
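
For anyone wanting to preview that change before touching the real file, here is a minimal sketch that prints a unit file with the Type=forking line commented out and leaves the original untouched. The helper name comment_out_forking is hypothetical. As background, and not something confirmed in this thread: when a service with an ExecStart line has no Type= setting, systemd falls back to Type=simple and treats the service as started as soon as the process is launched, so it no longer waits for the start script to finish. Side effects of running InfluxDB this way are not covered here.

```shell
#!/bin/sh
# Hypothetical helper: print a unit file with its Type=forking line commented out.
# The file itself is not modified; the result goes to standard output.
comment_out_forking() {
    sed 's/^Type=forking$/#&/' "$1"
}

# Preview on the Pi (path taken from the status output in this thread):
#   comment_out_forking /lib/systemd/system/influxdb.service
```

After actually editing a unit file, systemctl daemon-reload is needed here as well.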

vuilfanie commented 1 year ago

added TimeoutStartSec, reloaded daemon, influxdb is now running but unable to access UI. Can you share the other service names that need to be running?

kizniche commented 7 months ago

This issue has been mentioned on Radical DIY Forum. There might be relevant details there:

https://forum.radicaldiy.com/t/error-mycodo-influx-unknown-influxdb-version/1773/5

danOS144 commented 6 months ago

This issue has been mentioned on Radical DIY Forum. There might be relevant details there:

https://forum.radicaldiy.com/t/error-mycodo-influx-unknown-influxdb-version/1773/5

That thread refers back to this one, so the reference is recursive.

danOS144 commented 6 months ago

Still not solved: the daemon can't find or doesn't connect to InfluxDB because of an unknown InfluxDB version.