Closed: mstormi closed this 2 years ago
Are we manually creating the log directory for Grafana as part of its setup? I think that might be part of the issue, as we have had to do that with other programs.
Dunno, never touched that code. The last changes were by @holgerfriedrich, who is not very zram-aware, so I would guess "no".
To be honest, I don't know what the cause could be. I would have to install and debug it to figure it out. Is it because of the buster/bullseye changes?
Either way, it would make sense to put that into permissions_corrections() in system.bash.
Do you also think we should call that after every (or every possibly affected) 3rd-party install? I think we should add a parameter to that routine so we can call it to fix things only for, say, Grafana => #1561
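A minimal sketch of what such a parameterised routine could look like. The names fix_log_dir and fix_log_dirs, the LOG_ROOT override and the default service list are assumptions made up for illustration; this is not the actual permissions_corrections() code in system.bash.

```shell
# Hypothetical helper, not the real permissions_corrections().
# LOG_ROOT defaults to /var/log and can be overridden for a dry run.
fix_log_dir() {
  dir="${LOG_ROOT:-/var/log}/$1"
  mkdir -p "$dir" || return 1
  chmod 755 "$dir" || return 1
  # Hand the directory to the service user only when running as root and a
  # user of that name exists (assumed here to match the directory name).
  if [ "$(id -u)" -eq 0 ] && id -u "$1" > /dev/null 2>&1; then
    chown "$1" "$dir" || return 1
  fi
  return 0
}

# With arguments: fix only the named services. Without: fix the default set.
fix_log_dirs() {
  if [ "$#" -eq 0 ]; then
    set -- grafana mosquitto openhab
  fi
  for service in "$@"; do
    fix_log_dir "$service" || return 1
  done
}
```

Calling `fix_log_dirs grafana` would then touch only the Grafana log directory, while a bare `fix_log_dirs` would walk the whole (assumed) list.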
> is it because of the buster/bullseye changes?
Unlikely, as I don't think any of them affect this, and it happens on existing buster systems, too.
OK, just wanted to make sure. I haven't really followed Grafana usage as I don't use it personally. Does the problem still occur with zram disabled?
Neither do I. Yes: after I disabled zram and reinstalled Grafana, it runs.
So, it looks like a zram issue again, correct?
It's unclear. I re-enabled zram, reinstalled Grafana and it still works. I tried some more but cannot reproduce it again. Can you try to reproduce it, too? Possibly the existence of /var/log/grafana is key. If I understood it right, the OP's problem only shows up after a reboot.
I'm inclined to believe my failed install was just a bad coincidence.
What I see now is what the OP reports: Grafana does not start after a reboot. systemctl restart grafana-server.service helped. So it is likely another startup-order thing. Where do we currently configure the services that need to wait to start until zram is up?
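One way to check the startup order is to look at the After= property that systemd reports for the unit. A small sketch follows; the helper name waits_for_zram is made up here, and it only inspects the text passed to it.

```shell
# Hypothetical helper: succeeds if the given "After=..." line, as printed by
# `systemctl show -p After <unit>`, lists zram-config.service.
waits_for_zram() {
  case " ${1#After=} " in
    *" zram-config.service "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example call on a live system (not executed here):
#   waits_for_zram "$(systemctl show -p After grafana-server.service)" \
#     && echo "ordered after zram" || echo "not ordered after zram"
```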
That might be our solution. Try adding
if ! zram_dependency install grafana-server; then return 1; fi
to the install code; see openhab.bash line 108 for an example.
It's already in (influxdb+grafana.bash L270).
Yeah, IDK, you got me there. I wonder, is there another service besides grafana-server that we need to add that we aren't seeing?
EDIT:
L270 is just if ! zram_dependency install grafana; then return 1; fi
when it probably should be if ! zram_dependency install grafana-server; then return 1; fi
I'm issuing a patch and we'll see if that resolves the issue.
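Why the exact unit name matters can be shown with a systemd drop-in. The sketch below is an assumption for illustration only: order_after_zram and UNIT_DIR are hypothetical names, and this is not how zram_dependency is implemented. It assumes grafana-server.service is the unit that actually runs Grafana, as the systemctl commands in this thread suggest.

```shell
# Hypothetical sketch, not the real zram_dependency(). Writes a drop-in that
# orders <name>.service after zram-config.service.
# UNIT_DIR defaults to /etc/systemd/system; override it to try this safely.
order_after_zram() {
  dropin_dir="${UNIT_DIR:-/etc/systemd/system}/$1.service.d"
  mkdir -p "$dropin_dir" || return 1
  printf '[Unit]\nAfter=zram-config.service\n' > "$dropin_dir/zram.conf"
}

# Example call on a live system (not executed here):
#   order_after_zram grafana-server && systemctl daemon-reload
```

A drop-in directory is matched by unit name, so ordering written for "grafana" lands in grafana.service.d and would not be picked up for grafana-server.service.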
It's working for me now, so I'm closing. Reopen if it doesn't work for you.
Well spotted. I've always seen grafana.service offered as an argument to systemctl, but querying it never worked.
Unfortunately, even with your patch, Grafana remains down after boot.
systemctl start grafana-server.service
revives it, but it does not start automatically, and reproducibly so.
No idea for now.
[11:56:12] root@devpi:/home/openhabian# journalctl -u grafana-server.service
-- Logs begin at Thu 2021-08-12 11:31:21 CEST, end at Thu 2021-08-12 11:55:01 CEST. --
Aug 12 11:46:13 devpi systemd[1]: Started Grafana instance.
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Starting Grafana" logger=server version=8.1.1 commit=90c87a52>
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/c>
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana>
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Config overridden from command line" logger=settings arg="def>
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Config overridden from command line" logger=settings arg="def>
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Config overridden from command line" logger=settings arg="def>
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Config overridden from command line" logger=settings arg="def>
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Path Home" logger=settings path=/usr/share/grafana
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Path Data" logger=settings path=/var/lib/grafana
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Path Logs" logger=settings path=/var/log/grafana
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Path Plugins" logger=settings path=/var/lib/grafana/plugins
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Path Provisioning" logger=settings path=/etc/grafana/provisio>
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="App mode production" logger=settings
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Writing PID file" logger=server path=/var/run/grafana/grafana>
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Connecting to DB" logger=sqlstore dbtype=sqlite3
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Starting DB migrations" logger=migrator
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="migrations completed" logger=migrator performed=0 skipped=330>
Aug 12 11:46:19 devpi grafana-server[1061]: t=2021-08-12T11:46:19+0200 lvl=info msg="Starting plugin search" logger=plugins
For the user linked in post #1 it's even worse: neither mosquitto nor Grafana starts.
I can confirm that Mosquitto has worked fine on reboot for me for months now. I use it on my personal machine and it has never shown any issues.
That is really weird. It's almost like Grafana is having issues with systemctl ordering. I'm going to make a couple more changes to the code and see if I can get it to work for me.
@mstormi I just installed on a new image that I flashed from a manual build on the latest commit I made, and Grafana is working fine across reboots. Do you still have the same issue? I used this run for my image.
Yes, I do. I used 1.6.5, but it should be the same, shouldn't it?
It should, yes, but maybe you should try the image I used, just in case it makes the difference. It was working as expected for me after a fresh install and multiple reboots on the newer image.
No, it did not make a difference. Grafana is shown as active for a second or two when manually started, then stops (maybe you got caught by this?).
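A service that looks active right after a start and then dies is easy to miss with a single status query. A rough sketch of a two-step check follows; stays_up is a hypothetical helper name, and the 10-second delay in the example is an arbitrary choice.

```shell
# Hypothetical helper: run a status check, wait, then run it again.
# Succeeds only if the check passes both times.
stays_up() {
  delay="$1"
  shift
  "$@" || return 1
  sleep "$delay"
  "$@"
}

# Example call on a live system (not executed here):
#   systemctl start grafana-server.service
#   stays_up 10 systemctl is-active --quiet grafana-server.service \
#     || journalctl -u grafana-server.service -n 20
```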
The reason is that /var/log/grafana does not exist and cannot be created.
Worse, even openHAB is affected because /var/log/openhab is gone, too.
We have a massive zram problem here! It looks like changes to /var/log are not synced on reboot, at least if they are made directly after installation.
[11:49:20] root@openhabian:/home/openhabian# df /var/log
Filesystem 1K-blocks Used Available Use% Mounted on
overlay3 429960 972 396732 1% /var/log
[11:49:25] root@openhabian:/home/openhabian# ls /opt/zram/log.bind/
auth.log btmp daemon.log debug kern.log lastlog messages private syslog user.log wtmp
[11:50:37] root@openhabian:/home/openhabian# ls -l /opt/zram/log.bind/
total 140
-rw-rw---- 1 root adm 212 Aug 13 11:38 auth.log
-rw-rw---- 1 root utmp 0 Aug 13 11:38 btmp
-rw-rw---- 1 root adm 12887 Aug 13 11:38 daemon.log
-rw-rw---- 1 root adm 1366 Aug 13 11:38 debug
-rw-rw---- 1 root adm 30584 Aug 13 11:38 kern.log
-rw-rw-r-- 1 root utmp 0 Aug 13 11:38 lastlog
-rw-rw---- 1 root adm 28472 Aug 13 11:38 messages
drwx-wx--- 2 root root 4096 Aug 13 11:38 private
-rw-rw---- 1 root adm 44278 Aug 13 11:38 syslog
-rw-rw---- 1 root adm 396 Aug 13 11:38 user.log
-rw-rw-r-- 1 root utmp 384 Aug 13 11:38 wtmp
[11:50:41] root@openhabian:/home/openhabian# ls -l /opt/zram/zram3/upper/
total 264
-rw-rw---- 1 root adm 1162 Aug 13 11:45 auth.log
-rw-rw---- 1 root adm 55118 Aug 13 11:50 daemon.log
-rw-rw---- 1 root adm 1418 Aug 13 11:47 debug
-rw-rw---- 1 root adm 33830 Aug 13 11:49 kern.log
-rw-rw-r-- 1 root utmp 292292 Aug 13 11:39 lastlog
-rw-rw---- 1 root adm 31293 Aug 13 11:48 messages
-rw-rw---- 1 root adm 90984 Aug 13 11:50 syslog
-rw-rw-r-- 1 root utmp 1920 Aug 13 11:39 wtmp
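Neither listing above contains a grafana or openhab directory. The same check can be scripted so it is easy to repeat after each reboot. This is only a sketch: missing_dirs is a hypothetical helper, and the service names in the example calls are assumptions taken from this thread.

```shell
# Hypothetical helper: print every name that is not a directory under $1.
missing_dirs() {
  root="$1"
  shift
  for name in "$@"; do
    [ -d "$root/$name" ] || printf '%s\n' "$name"
  done
  return 0
}

# Example calls on a live system (not executed here):
#   missing_dirs /var/log grafana mosquitto openhab
#   missing_dirs /opt/zram/log.bind grafana mosquitto openhab
#   missing_dirs /opt/zram/zram3/upper grafana mosquitto openhab
```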
I'm very confused; I don't observe any of this behavior on my end. Let me test again. Also, do you observe it on your production machine?
Figured out my discrepancy: for some reason zram wasn't on the installation I did yesterday, so of course it was working flawlessly. Retesting with it installed now.
Interestingly, if zram is installed after everything is set up it seems to be working properly.
EDIT: Observing broken behavior with overlays merge script right now, investigating.
> if zram is installed after everything is set up it seems to be working properly.
Of course, /var/log/{grafana,mosquitto,whatever} will exist on the lower filesystem then. If zram is installed before those 3rd-party tools, the directories get created in the upper filesystem only and need to be synced/merged on zram stop. If that fails, they are missing after boot, causing the tools to fail.
Yes, there is a bug in the zram-config code right now that may very well be what is causing the problem. I'll let you know when I attempt a fix so you can see if it works for you too.
@mstormi I hope I have fixed the bug. Uninstall zram, reinstall it, and see if the issue is resolved; running a fresh setup should work too.
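The upper/lower relationship can be illustrated with two plain directories. The following is a deliberately simplified simulation, not what zram-config actually does; merge_upper_into_lower is a hypothetical name, and a plain copy ignores whiteouts and other overlayfs details.

```shell
# Simplified simulation, NOT the zram-config merge: it ignores whiteouts
# (deletions recorded in the upper layer) and other overlayfs details.
merge_upper_into_lower() {
  upper="$1"
  lower="$2"
  [ -d "$upper" ] && [ -d "$lower" ] || return 1
  # Copy everything from the upper layer over the lower one,
  # preserving modes and timestamps.
  cp -a "$upper/." "$lower/"
}
```

If a step like this is skipped or fails at shutdown, a directory that only ever existed in the upper layer (for example a freshly created grafana log directory) is simply absent from the lower layer at the next boot.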
Yes, the first attempt (reinstall) was successful. Installing another 1.6.5 image right now.
> Installing another 1.6.5 image right now.
Failed :(
It ran after install, but /var/log/grafana was missing after reboot. It worked after I manually created it.
That's odd. Is it repeatable?
Will try. Will the latest zram code be downloaded for sure, or is it the code from the image that gets installed?
Hmm, that's a good question. I'm not sure; I think it should download the latest code, but let me check.
It should download the latest code, but if you suspect it does not, double-check, and if it doesn't I'll look into fixing that behavior.
No, it didn't obtain the latest version!
[11:21:58] root@openhab-test:/boot# wget https://raw.githubusercontent.com/ecdye/zram-config/main/zram-config
--2021-08-15 11:22:02-- https://raw.githubusercontent.com/ecdye/zram-config/main/zram-config
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9366 (9.1K) [text/plain]
Saving to: ‘zram-config’
zram-config 100%[====================================================================================================>] 9.15K --.-KB/s in 0.003s
2021-08-15 11:22:02 (2.72 MB/s) - ‘zram-config’ saved [9366/9366]
[11:22:02] root@openhab-test:/boot# ls -l /usr/local/sbin/zram-config /boot/zram-config /opt/zram/zram-config/zram-config
-rwxr-xr-x 1 root root 9366 Aug 15 11:22 /boot/zram-config
-rwxrwxr-x 1 root root 11234 Aug 15 00:20 /opt/zram/zram-config/zram-config
-rwxr-xr-x 1 root root 11234 Aug 15 00:20 /usr/local/sbin/zram-config
[11:22:10] root@openhab-test:/boot#
Did you ever test with 1.6.5 or only with dev builds?
Here's the log excerpt from first-boot.log.
I didn't spot anything right away. I checked whether 'install' overwrites the binary, and it seems to.
+ install_zram_code /opt/zram
+ local zramGit=https://github.com/ecdye/zram-config
++ timestamp
++ date +%F_%T_%Z
+ echo -n '2021-08-15_00:20:12_CEST [openHABian] Installing zram code... '
2021-08-15_00:20:12_CEST [openHABian] Installing zram code... + cond_redirect mkdir -p /opt/zram
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ mkdir -p /opt/zram \033[39;49;00m'
$ mkdir -p /opt/zram
+ mkdir -p /opt/zram
+ return 0
+ [[ -d /opt/zram/zram-config ]]
+ cond_redirect update_git_repo /opt/zram/zram-config openHAB
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ update_git_repo /opt/zram/zram-config openHAB \033[39;49;00m'
$ update_git_repo /opt/zram/zram-config openHAB
+ update_git_repo /opt/zram/zram-config openHAB
+ local branch
+ local path
+ branch=openHAB
+ path=/opt/zram/zram-config
++ timestamp
++ date +%F_%T_%Z
++ basename /opt/zram/zram-config
+ echo -n '2021-08-15_00:20:12_CEST [openHABian] Updating zram-config, openHAB branch from git... '
2021-08-15_00:20:12_CEST [openHABian] Updating zram-config, openHAB branch from git... + cond_redirect git -C /opt/zram/zram-config fetch origin
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ git -C /opt/zram/zram-config fetch origin \033[39;49;00m'
$ git -C /opt/zram/zram-config fetch origin
+ git -C /opt/zram/zram-config fetch origin
From https://github.com/ecdye/zram-config
+ 77808ea...a2110ef openHAB -> origin/openHAB (forced update)
047fd98..203c82b main -> origin/main
+ 3cc2728...d50eec8 openHAB2 -> origin/openHAB2 (forced update)
* [new tag] v1.2.7 -> v1.2.7
* [new tag] v1.2.5 -> v1.2.5
* [new tag] v1.2.6 -> v1.2.6
+ return 0
+ cond_redirect git -C /opt/zram/zram-config fetch --tags --force --prune
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ git -C /opt/zram/zram-config fetch --tags --force --prune \033[39;49;00m'
$ git -C /opt/zram/zram-config fetch --tags --force --prune
+ git -C /opt/zram/zram-config fetch --tags --force --prune
+ return 0
+ cond_redirect git -C /opt/zram/zram-config reset --hard origin/openHAB
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ git -C /opt/zram/zram-config reset --hard origin/openHAB \033[39;49;00m'
$ git -C /opt/zram/zram-config reset --hard origin/openHAB
+ git -C /opt/zram/zram-config reset --hard origin/openHAB
HEAD is now at a2110ef Add openHAB specific changes
+ return 0
+ cond_redirect git -C /opt/zram/zram-config clean --force -x -d
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ git -C /opt/zram/zram-config clean --force -x -d \033[39;49;00m'
$ git -C /opt/zram/zram-config clean --force -x -d
+ git -C /opt/zram/zram-config clean --force -x -d
+ return 0
+ cond_redirect git -C /opt/zram/zram-config checkout openHAB
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ git -C /opt/zram/zram-config checkout openHAB \033[39;49;00m'
$ git -C /opt/zram/zram-config checkout openHAB
+ git -C /opt/zram/zram-config checkout openHAB
Already on 'openHAB'
Your branch is up to date with 'origin/openHAB'.
+ return 0
+ echo OK
OK
+ return 0
+ echo OK
OK
++ timestamp
++ date +%F_%T_%Z
+ echo -n '2021-08-15_00:20:14_CEST [openHABian] Setting up OverlayFS... '
2021-08-15_00:20:14_CEST [openHABian] Setting up OverlayFS... + cond_redirect make --always-make --directory=/opt/zram/zram-config/overlayfs-tools
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ make --always-make --directory=/opt/zram/zram-config/overlayfs-tools \033[39;49;00m'
$ make --always-make --directory=/opt/zram/zram-config/overlayfs-tools
+ make --always-make --directory=/opt/zram/zram-config/overlayfs-tools
make: Entering directory '/opt/zram/zram-config/overlayfs-tools'
gcc -Wall -std=c99 -c main.c
gcc -Wall -std=c99 -c logic.c
gcc -Wall -std=c99 -c sh.c
gcc -lm main.o logic.o sh.o -o overlay
make: Leaving directory '/opt/zram/zram-config/overlayfs-tools'
+ return 0
+ cond_redirect mkdir -p /usr/local/lib/zram-config/
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ mkdir -p /usr/local/lib/zram-config/ \033[39;49;00m'
$ mkdir -p /usr/local/lib/zram-config/
+ mkdir -p /usr/local/lib/zram-config/
+ return 0
+ cond_redirect install -m 755 /opt/zram/zram-config/overlayfs-tools/overlay /usr/local/lib/zram-config/overlay
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ install -m 755 /opt/zram/zram-config/overlayfs-tools/overlay /usr/local/lib/zram-config/overlay \033[39;49;00m'
$ install -m 755 /opt/zram/zram-config/overlayfs-tools/overlay /usr/local/lib/zram-config/overlay
+ install -m 755 /opt/zram/zram-config/overlayfs-tools/overlay /usr/local/lib/zram-config/overlay
+ return 0
+ echo OK
OK
++ timestamp
++ date +%F_%T_%Z
+ echo -n '2021-08-15_00:20:17_CEST [openHABian] Setting up zram... '
2021-08-15_00:20:17_CEST [openHABian] Setting up zram... + cond_redirect install -m 755 /opt/zram/zram-config/zram-config /usr/local/sbin
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ install -m 755 /opt/zram/zram-config/zram-config /usr/local/sbin \033[39;49;00m'
$ install -m 755 /opt/zram/zram-config/zram-config /usr/local/sbin
+ install -m 755 /opt/zram/zram-config/zram-config /usr/local/sbin
+ return 0
+ cond_redirect install -m 644 /opt/openhabian/includes/ztab /etc/ztab
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ install -m 644 /opt/openhabian/includes/ztab /etc/ztab \033[39;49;00m'
$ install -m 644 /opt/openhabian/includes/ztab /etc/ztab
+ install -m 644 /opt/openhabian/includes/ztab /etc/ztab
+ return 0
+ cond_redirect mkdir -p /usr/local/share/zram-config/log
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ mkdir -p /usr/local/share/zram-config/log \033[39;49;00m'
$ mkdir -p /usr/local/share/zram-config/log
+ mkdir -p /usr/local/share/zram-config/log
+ return 0
+ cond_redirect install -m 644 /opt/zram/zram-config/zram-config.logrotate /etc/logrotate.d/zram-config
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ install -m 644 /opt/zram/zram-config/zram-config.logrotate /etc/logrotate.d/zram-config \033[39;49;00m'
$ install -m 644 /opt/zram/zram-config/zram-config.logrotate /etc/logrotate.d/zram-config
+ install -m 644 /opt/zram/zram-config/zram-config.logrotate /etc/logrotate.d/zram-config
+ return 0
+ echo OK
OK
+ echo ReadWritePaths=/usr/local/share/zram-config/log
+ [[ -f /etc/systemd/system/find3server.service ]]
+ [[ -f /lib/systemd/system/influxdb.service ]]
+ openhab_is_installed
+ openhab2_is_installed
++ dpkg -s openhab2
++ grep Status
++ cut '-d ' -f2
+ [[ '' == \i\n\s\t\a\l\l ]]
+ return 1
+ openhab3_is_installed
++ dpkg -s openhab
++ grep Status
++ cut '-d ' -f2
+ [[ install == \i\n\s\t\a\l\l ]]
+ return 0
+ return 0
++ timestamp
++ date +%F_%T_%Z
+ echo -n '2021-08-15_00:20:18_CEST [openHABian] Setting up zram service... '
2021-08-15_00:20:18_CEST [openHABian] Setting up zram service... + cond_redirect install -m 644 /opt/zram/zram-config/zram-config.service /etc/systemd/system/zram-config.service
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ install -m 644 /opt/zram/zram-config/zram-config.service /etc/systemd/system/zram-config.service \033[39;49;00m'
$ install -m 644 /opt/zram/zram-config/zram-config.service /etc/systemd/system/zram-config.service
+ install -m 644 /opt/zram/zram-config/zram-config.service /etc/systemd/system/zram-config.service
+ return 0
+ cond_redirect systemctl -q daemon-reload
+ running_in_docker
+ [[ -n '' ]]
+ grep -qs 'docker\|lxc' /proc/1/cgroup
+ [[ -f /.dockerenv ]]
+ return 1
+ running_on_github
+ [[ -n '' ]]
+ return 1
+ cond_redirect systemctl mask unattended-upgrades.service
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ systemctl mask unattended-upgrades.service \033[39;49;00m'
$ systemctl mask unattended-upgrades.service
+ systemctl mask unattended-upgrades.service
Created symlink /etc/systemd/system/unattended-upgrades.service → /dev/null.
+ return 0
+ cond_redirect systemctl enable --now zram-config.service
+ [[ -n '' ]]
+ echo -e '\n\033[90;01m$ systemctl enable --now zram-config.service \033[39;49;00m'
$ systemctl enable --now zram-config.service
+ systemctl enable --now zram-config.service
Created symlink /etc/systemd/system/basic.target.wants/zram-config.service → /etc/systemd/system/zram-config.service.
+ return 0
+ echo OK
OK
+ return 0
No, that did update it. You were comparing to the main branch, not the openHAB branch of zram-config.
I never tested on 1.6.5, only on the dev image.
OK, so it was updated correctly. Where else should we look, then? Please try reproducing it yourself with 1.6.5. I used it because that is what people out there use.
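To avoid comparing against the wrong branch, the installed script can be compared with the local checkout it was installed from rather than with a fresh download. A small sketch, with installed_matches_checkout as a hypothetical helper name and the paths taken from the listing earlier in this thread:

```shell
# Hypothetical helper: succeeds if the installed script is byte-identical to
# the zram-config file in the git checkout directory given as $1.
installed_matches_checkout() {
  cmp -s "$1/zram-config" "$2"
}

# Example calls on a live system (not executed here):
#   git -C /opt/zram/zram-config rev-parse --abbrev-ref HEAD   # which branch?
#   installed_matches_checkout /opt/zram/zram-config /usr/local/sbin/zram-config
```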
I'll test 1.6.5 and report back.
@mstormi OK, I think it is fixed for good now. I've tested multiple times on different flashes.
I found and fixed one more bug in the zram-config code that was causing the weird behavior when adding a device while zram was already running.
Please test again on your end and hopefully report back with good news.
Ok, that bug-on-adding would be an explanation why it happened so all of a sudden (or rather went unspotted for so long). Or did you change anything lately before those reports started?
Yes LGTM now. Thanks for taking care.
> Ok, that bug-on-adding would be an explanation why it happened so all of a sudden (or rather went unspotted for so long). Or did you change anything lately before those reports started?
I am not aware of a change that I made that would have caused that recently. It was just a very specific set of circumstances that it would happen under.
See https://community.openhab.org/t/oh3-1-fresh-installation-issues-mosquitto-grafana-permissions-how-to-check-if-rest-is-ok/125334/8
Indeed I see this on a fresh install:
and