mdegat01 / addon-loki

Loki for Home Assistant
MIT License

Loki addon crashes after update #143

Closed mazzy89 closed 2 years ago

mazzy89 commented 2 years ago

Describe the bug

After the recent update (1.9.2), Loki refuses to start.

To Reproduce

Expected behavior A clear and concise description of what you expected to happen.

Logs

# docker logs 477a9e748e52 -f
[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] permissions: applying...
[cont-finish.d] executing container finish scripts...
[cont-finish.d] 99-message.sh: executing...
/var/run/s6/etc/cont-finish.d/99-message.sh: line 6: S6_STAGE2_EXITED: unbound variable
[cont-finish.d] 99-message.sh: exited 1.
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.

Environment (please complete the following information):

Additional context

The Loki container's CPU usage is high:

CONTAINER ID   NAME                                        CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
a56d9128106d   addon_39bd2704_loki                         86.19%    2.324MiB / 7.628GiB   0.03%     45.8kB / 1.13kB   0B / 307kB        19

mdegat01 commented 2 years ago

@mazzy89 Hmm, I haven't seen that error and will take a look. Could you check one thing for me, though: does your supervisor log have anything like this?

WARNING (MainThread) [supervisor.misc.tasks] Watchdog missing application response from 39bd2704_loki

When I updated, I found that startup was taking too long. So long, in fact, that watchdog was killing Loki before it could get up and running, since watchdog pings it every 2 minutes.
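For anyone checking the same thing, here is a minimal sketch of a filter for that warning. The function name `watchdog_warnings` is made up for illustration, and the usage line assumes the Home Assistant CLI (`ha supervisor logs`) is available where you run it; any other source of supervisor log text works too, since the function only reads stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the watchdog warnings for one add-on slug.
# Reads log text on stdin, so it does not care where the log comes from.
watchdog_warnings() {
  local slug="$1"
  # -F matches the message literally instead of as a regular expression.
  grep -F "Watchdog missing application response from ${slug}"
}

# Example (assumes the Home Assistant CLI is installed):
#   ha supervisor logs | watchdog_warnings 39bd2704_loki
```

If this prints nothing, watchdog is probably not what is stopping the addon.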

I realized this was because my log index was enormous: apparently something in the Loki v2.4.0 update broke the retention policy I had put in place (the days_to_keep setting in the addon), and I ended up with something like 2.2GB of logs. I'm not entirely sure what happened, but switching to Loki's new compactor for handling retention seems to have fixed it in the beta build. I was planning to push that out today if I encountered no new issues.

This may not be your issue but I figured it would be good to ask since it happened to me.

mdegat01 commented 2 years ago

OK, correction: I do see that "unbound variable" error in my logs from yesterday, when watchdog was harassing Loki's startup. I think you might be having the same issue. If so, the solution for now is to turn off watchdog, but I do have an update that I can push out now which should put the retention policy back in place. It should clean up the size within a few hours. Let me know if you're seeing the same thing.
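As background on the "unbound variable" message itself: bash prints it when a script running with `set -u` (nounset) expands a variable that was never set, which fits a finish script running before the variable it reads was ever set. The sketch below only illustrates that shell behavior. It is not the contents of the real 99-message.sh, and both the function names and the fallback value `unset` are made up for illustration.

```shell
#!/usr/bin/env bash
# Illustration only; this is NOT the addon's actual finish script.

# Guarded read: "${VAR:-fallback}" substitutes the fallback when VAR is
# unset or empty, so the expansion succeeds even with nounset active.
read_guarded() {
  ( set -u; printf '%s\n' "${S6_STAGE2_EXITED:-unset}" )
}

# Unguarded read: with nounset active and VAR unset, bash aborts the
# subshell with "S6_STAGE2_EXITED: unbound variable" and a non-zero status.
read_unguarded() {
  ( set -u; printf '%s\n' "${S6_STAGE2_EXITED}" ) 2>/dev/null
}
```

With the variable unset, `read_guarded` prints the fallback and succeeds, while `read_unguarded` fails the same way the log line above does.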

mazzy89 commented 2 years ago

Hi @mdegat01, sorry for the late reply.

Watchdog is indeed activated in my case. I'm going to disable it, try to run Loki again, and see if that helps.

mazzy89 commented 2 years ago

I can confirm that disabling the watchdog helped, and Loki is up again.

mdegat01 commented 2 years ago

@mazzy89 Great! If your situation was like mine, then after you leave it running for 6 hours or so it should clean up the overly large log index, and you should be able to turn watchdog back on after that. However, if you have days_to_keep set really high, the index may still be so large that startup takes more than 2 minutes even after cleanup, in which case you will have to leave watchdog off.

Sorry about that, I have minimal say in how watchdog works. I tell it what URL to ping for status for this particular application, but there's currently no way to control the interval or tell it when startup is occurring.
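One rough way to judge whether it is safe to turn watchdog back on is to time a restart against the 2-minute window. The sketch below retries a probe command until it succeeds or a timeout expires. The function name `wait_until_ready` is made up for illustration, and the example probe assumes Loki's `/ready` endpoint is reachable at localhost:3100, which may not match how the addon exposes it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a probe until it succeeds or the timeout expires.
# Prints the elapsed whole seconds on success; returns 1 on timeout.
# Usage: wait_until_ready <timeout_s> <interval_s> <probe command...>
wait_until_ready() {
  local timeout="$1" interval="$2"
  shift 2
  local start=$SECONDS   # bash counter of seconds since the shell started
  while (( SECONDS - start < timeout )); do
    if "$@" >/dev/null 2>&1; then
      echo $(( SECONDS - start ))
      return 0
    fi
    sleep "$interval"
  done
  return 1
}

# Example (host and port are assumptions, adjust to your setup):
#   wait_until_ready 120 5 curl -fsS http://localhost:3100/ready
```

If the example returns a number comfortably under 120, startup fits within the watchdog interval mentioned above; if it times out, leaving watchdog off is the safer choice.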

github-actions[bot] commented 2 years ago

There hasn't been any activity on this issue recently, so we clean up some of the older and inactive issues. Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by leaving a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thanks!