openenergymonitor / emonpi

Raspberry Pi Based Energy Monitor. Hardware, Firmware & related software for the PI.
https://guide.openenergymonitor.org/setup
270 stars 113 forks source link

Python systemd servicerunner #66

Closed glynhudson closed 6 years ago

glynhudson commented 6 years ago

Continue development from

https://github.com/openenergymonitor/emonpi/pull/65

glynhudson commented 6 years ago

Would anyone like to test and review @Paul-Reed @borpin ?

See the installed instructions readme

glynhudson commented 6 years ago

Top is showing a redis load of about 5%. Wondering if we should increase the sleep to 1s? What do you think @greeebs ?

https://github.com/openenergymonitor/emonpi/pull/66/files#diff-dcab05d939ed068946bc6f7a0154b33eR42

greeebs commented 6 years ago

I just rebooted my pi (using the beta image) and it didn't all start cleanly... Looks like redis-server started and then crashed and the dependency in the unit file wasn't enough to help.

It's way past midnight here so I'll take another look tomorrow at why it isn't starting cleanly. Might need more dependencies added.

greeebs commented 6 years ago

Top is showing a redis load of about 5%. Wondering if we should increase the sleep to 1s? What do you think @greeebs ?

Hmm... its 0.3% on my Pi3...

With that said though, the only thing that sleep time impacts is the speed at which the log window starts displaying output after you click the "emon[Pi|Base] Update" button. 1s is probably a perfectly acceptable delay for that!

borpin commented 6 years ago

@glynhudson

Would anyone like to test and review @Paul-Reed @borpin ?

I'm running a cutdown test emoncms version on a VM ATM. What functionality actually uses the feed runner? If so I'll give it a go.

borpin commented 6 years ago

I did some digging on systemd dependencies here when looking at mosquitto startup problems. The reference included in that post, suggests that an After statement does not mean it will wait until 'after' the other service (which is a bit daft IMO).

I think therefore you also need a Requires as "This directive lists any units upon which this unit essentially depends."

borpin commented 6 years ago

Another point of note. This is in the emonpi repository. As it is not specific to the emonpi (the install / update may be), should it be in the main repo.

borpin commented 6 years ago

Another thought, does the location of the PID & log files need to be specified on a RW partition?

Paul-Reed commented 6 years ago

Would anyone like to test and review @Paul-Reed @ borpin ?

I don't have a emonpi, and on my self-build system I use my own (node-red scripted) backup system, so I wouldn't really be able to fully test.

Paul

greeebs commented 6 years ago

@borpin

I'm running a cutdown test version ATM. What functionality actually uses the feed runner? If so I'll give it a go.

service-runner triggers the update shell script when you click on the "emon[Pi|Base] Update" button on the Administration page of emoncms. It seems like there may be other cases where it is used as well but I'm not sure what they are.

Another thought, does the location of the PID & log files need to be specified on a RW partition?

PIDFile= is only required if the service uses the SysV init style of forking, otherwise it is handled by systemd (as far as I can tell!). It currently logs to the journal which is on a tmpfs when used on the read-only filesystem

borpin commented 6 years ago

service-runner triggers the update shell script when you click on the "emon[Pi|Base] Update" button

I did a quick check and it seems a number of modules in the main Emoncms repo use the service-runner trigger.

@greeebs did you get it to start cleanly?

glynhudson commented 6 years ago

I've just tried running the systemd service on an old emonPi (with the RW file system).

I've symlinked in the service-runner.service file and I can successfully run / stop and view status using

sudo systemctl start service-runner.service
sudo systemctl status service-runner.service

However, if i try and enable the service to run at startup using

sudo systemctl enable service-runner.service

I get the message:

Failed to execute operation: No such file or directory

I've varified that /etc/systemd/system/service-runner.service does indeed exist. I wonder what this issue could be. I'll try and debug further when I'm back in the office next week.

Paul-Reed commented 6 years ago
sudo systemctl start service-runner.service
sudo systemctl status service-runner.service

Can't you just have;

sudo systemctl start service-runner
sudo systemctl status service-runner

Haven't tested, but I assume systemd is clever enough to recognise the command without needing the .service suffix.

Paul

Paul-Reed commented 6 years ago

Also, I'm not that familiar with systemd intricacies, but shouldn't service-runner.service be placed in /lib/systemd/system/ ? As it's being run as multi-user, running sudo systemctl enable service-runner should then automatically create a symlink in /etc/systemd/system/multi-user.target.wants/ which then starts the service as part of systemd. The symlink that's created should resemble; service-runner.service -> /lib/systemd/system/service-runner.service

I'm not in a position to test, but this is my thoughts.

greeebs commented 6 years ago

Thanks @Paul-Reed, I've done some more reading and you're correct, the unit files should go into /lib/systemd/system. The way to find out where they should get put is to run pkg-config systemd --variable=systemdsystemunitdir. Also see the manpage for systemd.unit(5). In fact, if I had the symlinked file into /etc/systemd/system from /home/pi/emonpi and I run this:

pi@emonpi(rw):~$ sudo systemctl disable service-runner
Removed /etc/systemd/system/service-runner.service.
Removed /etc/systemd/system/multi-user.target.wants/service-runner.service.

My symlink gets removed!

so I tried testing Paul's suggestion and this is what I get:

pi@emonpi(rw):emonpi$ sudo ln -s /home/pi/emonpi/service-runner.service /lib/systemd/system
pi@emonpi(rw):emonpi$ sudo systemctl enable service-runner
Created symlink /etc/systemd/system/multi-user.target.wants/service-runner.service → /home/pi/emonpi/service-runner.service.
Created symlink /etc/systemd/system/service-runner.service → /home/pi/emonpi/service-runner.service.
pi@emonpi(rw):emonpi$ sudo systemctl disable service-runner
Removed /etc/systemd/system/service-runner.service.
Removed /etc/systemd/system/multi-user.target.wants/service-runner.service.
pi@emonpi(rw):emonpi$ sudo systemctl enable service-runner
Created symlink /etc/systemd/system/multi-user.target.wants/service-runner.service → /home/pi/emonpi/service-runner.service.
Created symlink /etc/systemd/system/service-runner.service → /home/pi/emonpi/service-runner.service.

Which looks like exactly what we want.

Haven't tested, but I assume systemd is clever enough to recognise the command without needing the .service suffix.

That is indeed correct and confirmed on my image.

@borpin I've been reading through the intricacies of After= etc. and I think we actually want BindsTo= as well as After=... or possibly even PartOf=

I tested both BindsTo and PartOf and both of them make it worse :) At the moment on this image, redis-server fails to start for the first few minutes (trying to start and then failing) because there is no redis directory in /var/log. Something triggers the creation of the various directories under /var/log after which redis-server finally starts. Both BindsTo and PartOf causes the service-runner service to hit the restart-limit because redis-server is constantly trying to start.

I ended up changing service-runner.py so that it waits for redis to be available (trying every 5 seconds) instead of just crashing. It all seems to be working now.

greeebs commented 6 years ago

I've tried to push a commit to the python-systemd-servicerunner branch here but unsurprisingly I don't have access. @glynhudson, I've attached the updated versions here... updated_service-runner_files.zip

greeebs commented 6 years ago

@borpin posed this question which doesn't seem to have been addressed:

Another point of note. This is in the emonpi repository. As it is not specific to the emonpi (the install / update may be), should it be in the main repo.

As it seems like it is used by a number of Modules within EmonCMS, it does feel like its in the wrong place being here... @glynhudson , @TrystanLea, your thoughts?

pb66 commented 6 years ago

@borpin posed this question which doesn't seem to have been addressed:

Another point of note. This is in the emonpi repository. As it is not specific to the emonpi (the install / update may be), should it be in the main repo.

As it seems like it is used by a number of Modules within EmonCMS, it does feel like its in the wrong place being here... @glynhudson , @TrystanLea, your thoughts?

Absolutely, I've suggested this a few times too, more recently, when the change to Redis was announced and then again when it cropped up as an issue more than once when the sync module use started to increase.

At that time @TrystanLea did agree that the emoncms repo was the sensible place for it.

Paul-Reed commented 6 years ago

@borpin posed this question which doesn't seem to have been addressed:

Another point of note. This is in the emonpi repository. As it is not specific to the emonpi (the install / update may be), should it be in the main repo.

As it seems like it is used by a number of Modules within EmonCMS, it does feel like its in the wrong place being here... @glynhudson , @TrystanLea, your thoughts?

Yes agree. Definitely should be in the main emoncms repo. Had it been in there, I would have been more inclined load it up, test & develop, instead of just commenting on the code by reading through it. It should be in the scripts folder, sitting alongside the mqtt_input.service mqtt service file.

Paul

borpin commented 6 years ago

@greeebs - some additional thoughts which might (or might not) help :smile:

I tested both BindsTo and PartOf and both of them make it worse :) At the moment on this image, redis-server fails to start for the first few minutes (trying to start and then failing) because there is no redis directory in /var/log.

There was a solution proposed, that I repeated here https://community.openenergymonitor.org/t/mosquitto-startup-issues/7386 during a similar startup problem with mosquitto.

Did you try the requires= statement?

I also found elsewhere a possible need for the directive After=syslog.target (and possibly a requires as well) on a guess that it is syslog that creates the log directories? If not it may be a problem with logrotate?

More digging on the related redis not starting issue .https://serverfault.com/questions/893066/redis-not-starting-with-systemctl - is there a redis.cof entry that needs to be updated?

greeebs commented 6 years ago

@borpin - yes, I tried Requires=, but while reading the man pages though it became clear that BindsTo= and Partof= are basically more stringent Requires= directives... If redis-server started properly, I could have used PartOf= which would cause service-runner to be stopped if you ever stopped redis-server (for example) - making it much neater. As it is, I ended up using Wants=

I think the issue with redis-server starting up is to do with the "read-only" EmonSD image needing to create /var/log on a tmpfs every boot and then populating it with required directories. Thanks for the ideas on logrotate and redis.conf, I'll have a bit of a poke around over the next day or so to see if I can figure it out.

Either way, the service-runner service will work as it is in the zipped versions in my https://github.com/openenergymonitor/emonpi/pull/66#issuecomment-416117888 above.

I have just realised I will have to revisit the python script again to handle the case of redis-server being stopped after service-runner has started. At the moment that will break it until service-runner is restarted manually.

greeebs commented 6 years ago

Thanks for the pointer to that mosquitto thread @borpin, that pointed me to rc.local which is where the log directories get created:

if ( mount | grep "on /var/log "| grep -q "^tmpfs " )
then
  for i in "redis" "apache2" "mysql" "openhab" "logrotate" "mosquitto" "supervisor"; do mkdir /var/log/"$i"; done
...

  # Start / Restart services,they should run happy now log dir's are created
  sleep 3
  service mysql restart
  service redis-server restart
...

So it is definitely due to the "low-write" f/s changes required for the SD Card. I think I'll just stick to handling a "missing" (not yet started or subsequently stopped/restarted) redis-server within the service-runner.py script as it seems the tidiest solution.

greeebs commented 6 years ago

I've updated the python script so that handles a loss of connection to redis-server gracefully. This is what the output looks like immediately after booting up, followed by stopping redis-server and restarting it shortly afterwards:

pi@emonpi(ro):~$ sudo journalctl -u service-runner
-- Logs begin at Tue 2018-08-28 01:13:00 UTC, end at Tue 2018-08-28 01:20:23 UTC. --
Aug 28 01:13:14 emonpi systemd[1]: Started Emoncms service-runner Input Script.
Aug 28 01:13:20 emonpi service-runner[748]: Starting service-runner
Aug 28 01:13:20 emonpi service-runner[748]: Unable to connect to redis-server, sleeping for 30s
Aug 28 01:13:50 emonpi service-runner[748]: Unable to connect to redis-server, sleeping for 30s
Aug 28 01:14:20 emonpi service-runner[748]: Unable to connect to redis-server, sleeping for 30s
Aug 28 01:14:57 emonpi service-runner[748]: Connected to redis-server
pi@emonpi(ro):~$ sudo systemctl stop redis-server
pi@emonpi(ro):~$ sudo journalctl -u service-runner
-- Logs begin at Tue 2018-08-28 01:13:00 UTC, end at Tue 2018-08-28 01:20:56 UTC. --
Aug 28 01:13:14 emonpi systemd[1]: Started Emoncms service-runner Input Script.
Aug 28 01:13:20 emonpi service-runner[748]: Starting service-runner
Aug 28 01:13:20 emonpi service-runner[748]: Unable to connect to redis-server, sleeping for 30s
Aug 28 01:13:50 emonpi service-runner[748]: Unable to connect to redis-server, sleeping for 30s
Aug 28 01:14:20 emonpi service-runner[748]: Unable to connect to redis-server, sleeping for 30s
Aug 28 01:14:57 emonpi service-runner[748]: Connected to redis-server
Aug 28 01:20:44 emonpi service-runner[748]: Connection to redis-server lost, attempting to reconnec
Aug 28 01:20:44 emonpi service-runner[748]: Unable to connect to redis-server, sleeping for 30s
pi@emonpi(ro):~$ sudo systemctl start redis-server
pi@emonpi(ro):~$ sudo journalctl -u service-runner
-- Logs begin at Tue 2018-08-28 01:13:00 UTC, end at Tue 2018-08-28 01:23:13 UTC. --
Aug 28 01:13:14 emonpi systemd[1]: Started Emoncms service-runner Input Script.
Aug 28 01:13:20 emonpi service-runner[748]: Starting service-runner
Aug 28 01:13:20 emonpi service-runner[748]: Unable to connect to redis-server, sleeping for 30s
Aug 28 01:13:50 emonpi service-runner[748]: Unable to connect to redis-server, sleeping for 30s
Aug 28 01:14:20 emonpi service-runner[748]: Unable to connect to redis-server, sleeping for 30s
Aug 28 01:14:57 emonpi service-runner[748]: Connected to redis-server
Aug 28 01:20:44 emonpi service-runner[748]: Connection to redis-server lost, attempting to reconnec
Aug 28 01:20:44 emonpi service-runner[748]: Unable to connect to redis-server, sleeping for 30s
Aug 28 01:21:14 emonpi service-runner[748]: Connected to redis-server
pi@emonpi(ro):~$

I'll wait until @glynhudson has responded on the best way to get my changes into this PR before I post the latest version somewhere.

I've also added 2>&1 to the end of the cmdline that is read in from redis so that both stdout and sterr go into the provided logfile. Previously stderr was going into the journal which wasn't very helpful (or tidy!)

greeebs commented 6 years ago

Does anyone know why this touch was in the original bash script? https://github.com/openenergymonitor/emonpi/blob/37a2e1c173cc0f9966be170c9c675f892c568022/service-runner#L25-L27 It seems that if the cmdline ran, it would have created an empty log file anyway, and if it couldn't do that for whatever reason, touching that logfile would make it look like the script ran correctly with no output, giving an incorrect indication of success. My python version also includes this functionality but I wonder if it should.

greeebs commented 6 years ago

@glynhudson, I just submitted a PR (#67) to the python-systemd-servicerunner branch with my changes... I figure it was the easiest way to submit those changes I've described above, assuming you can merge that PR in first and then merge this one into master when its considered "ready".