maxchk / puppet-varnish

Puppet module to install, configure and manage Varnish
Apache License 2.0

Puppet attempts to restart running varnish service #145

Open kwisatz opened 7 years ago

kwisatz commented 7 years ago

OS: Debian 8.9
Puppet: 3.8.7

After an upgrade of various Puppet modules, Puppet now tries to restart the varnish service even though it is already running. This only happens for the varnish service, so I'm pretty sure it's related to this module rather than to Puppet in general:

Aug 08 16:49:20 loadbalancer puppet-agent[16858]: Could not start Service[varnish]: Execution of '/etc/init.d/varnish start' returned 1: Starting HTTP accelerator: varnishd failed!
Aug 08 16:49:20 loadbalancer puppet-agent[16858]: bind(): Address already in use
Aug 08 16:49:20 loadbalancer puppet-agent[16858]: bind(): Address already in use
Aug 08 16:49:20 loadbalancer puppet-agent[16858]: Error: Failed to open (any) accept sockets.
Aug 08 16:49:20 loadbalancer puppet-agent[16858]: Wrapped exception:
Aug 08 16:49:20 loadbalancer puppet-agent[16858]: Execution of '/etc/init.d/varnish start' returned 1: Starting HTTP accelerator: varnishd failed!
Aug 08 16:49:20 loadbalancer puppet-agent[16858]: bind(): Address already in use
Aug 08 16:49:20 loadbalancer puppet-agent[16858]: bind(): Address already in use
Aug 08 16:49:20 loadbalancer puppet-agent[16858]: Error: Failed to open (any) accept sockets.
Aug 08 16:49:20 loadbalancer puppet-agent[16858]: (/Stage[main]/Varnish::Service/Service[varnish]/ensure) change from stopped to running failed: Could not start Service[varnish]: Execution of '/etc/init.d/varnish start' returned 1: Starting HTTP accelerator: varnishd failed!
Aug 08 16:49:20 loadbalancer puppet-agent[16858]: (/Stage[main]/Varnish::Service/Service[varnish]/ensure) bind(): Address already in use
Aug 08 16:49:20 loadbalancer puppet-agent[16858]: (/Stage[main]/Varnish::Service/Service[varnish]/ensure) bind(): Address already in use
Aug 08 16:49:20 loadbalancer puppet-agent[16858]: (/Stage[main]/Varnish::Service/Service[varnish]/ensure) Error: Failed to open (any) accept sockets.

Puppet incorrectly detects varnish as not running. Looking at the service output, we can see that it is running:

root@loadbalancer:~# service varnish status
● varnish.service - Varnish Cache, a high-performance HTTP accelerator
   Loaded: loaded (/etc/systemd/system/varnish.service; enabled)
   Active: active (running) since Tue 2017-08-08 13:08:16 CEST; 4h 9min ago
 Main PID: 30600 (varnishd)
   CGroup: /system.slice/varnish.service
           ├─30600 /usr/sbin/varnishd -a :6081 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m -p ht...
           └─30601 /usr/sbin/varnishd -a :6081 -T localhost:6082 -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m -p ht...

The most confusing part is that the error only occurs when Puppet runs as the agent daemon, not when I run the agent manually.

Edit: I modified DAEMON_OPTS to make Puppet's log output more verbose, and I discovered that it does indeed call the status action on the init.d script:

Aug 08 17:48:12 loadbalancer puppet-agent[28735]: Executing '/etc/init.d/nginx status'
Aug 08 17:48:12 loadbalancer puppet-agent[28735]: Executing '/etc/init.d/varnish status'
Aug 08 17:48:12 loadbalancer puppet-agent[28735]: Executing '/etc/init.d/varnish start'
Aug 08 17:48:12 loadbalancer puppet-agent[28735]: Could not start Service[varnish]: Execution of '/etc/init.d/varnish start' returned 1: Starting HTTP accelerator: varnishd failed!

So Puppet must be misinterpreting the result it gets from the status call…

I also made sure the status call actually exits with code 0. It does, and according to the documentation of the service resource, that is what Puppet looks for.
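
(For what it's worth, the service resource also accepts an explicit status command. The following is only a diagnostic sketch using a resource collector override, not something this module declares; the status command shown is just one possible choice:)

# Diagnostic sketch only: make Puppet run an explicit status command for the
# module's Service['varnish'] instead of relying on the provider's own detection.
# hasstatus and status are standard parameters of the built-in service type.
Service <| title == 'varnish' |> {
  hasstatus => true,
  status    => '/usr/sbin/service varnish status',
}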

cyberscribe commented 7 years ago

Have you checked the size of /var/lib/varnish? The module default of 128M was (legitimately) crashing varnish for me repeatedly. If yours is fine and running, then this is indeed just a signalling issue.

kwisatz commented 7 years ago

@cyberscribe I've checked, because we've had the issue of it filling up and preventing varnish from restarting in the past. But this is a completely different issue: varnish is running just fine, it's just that somehow (and I just can't get to the bottom of this) the Puppet service resource is unable to determine that from its checks.

It also seems that the status check (or rather, the ensure => running check) is only made during the periodic agent runs, not when Puppet is run manually.

This happens on all machines that use varnish, so it's not just a glitch somewhere. And the varnish Puppet module hasn't changed, so I really am dumbfounded.

But I posted this here so that anyone else with the same problem might find it and share their findings.

cyberscribe commented 7 years ago

OK, definitely sounds like a separate/signalling issue. Just thought I'd check, as the shmlog tmpfs filling up was a baffling one for me for a while!

kwisatz commented 7 years ago

OK, I just figured out that it must be a problem with, or a bug in, the puppet package distributed by Puppet Labs. If I install 3.7.2 from the Debian repos, it works. My guess is that the version in the Debian repos uses a different method of checking service status… but I haven't dug deeper yet.

root@loadbalancer:~# apt-get install puppet=3.7.2-4+deb8u1 puppet-common=3.7.2-4+deb8u1
[…]
The following NEW packages will be installed:
  ruby-safe-yaml
The following packages will be upgraded:
  puppet puppet-common
2 upgraded, 1 newly installed, 0 to remove and 47 not upgraded.
Need to get 0 B/1,053 kB of archives.
After this operation, 2,318 kB disk space will be freed.
Do you want to continue? [Y/n] 
[…]
Unpacking puppet (3.7.2-4+deb8u1) over (3.7.2-1puppetlabs1) ...
Preparing to unpack .../puppet-common_3.7.2-4+deb8u1_all.deb ...
Unpacking puppet-common (3.7.2-4+deb8u1) over (3.7.2-1puppetlabs1) ...
func0der commented 7 years ago

@kwisatz So this is not a bug in this module?

kwisatz commented 7 years ago

@func0der Well, it only occurs with this module for me, but that might be related to the varnish systemd service definition on Debian rather than to this module's code.

I'm sorry I didn't have the time to dig deeper into this, e.g. checking what the differences are between the Puppet Labs and Debian builds of the 3.7.2 release of Puppet.

EmersonPrado commented 7 years ago

My 2 cents: varnish::service doesn't know you're using systemd. The module uses the service_provider fact from stdlib, which doesn't return 'systemd' for Debian 8, and then populates the variable varnish::params::systemd with true or false according to that fact.
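
Roughly speaking, the logic described above boils down to something like this (a sketch based on that description, not the module's verbatim code):

# Sketch: derive a boolean systemd flag from stdlib's service_provider fact.
# Variable name follows the description above; illustrative only.
$systemd = $::service_provider ? {
  'systemd' => true,
  default   => false,
}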

Out of curiosity, I grabbed one of my Puppet test setups (the one with the most distinct VMs) and tested the fact and variable above. They are indeed incorrect for Debian 8:

==> debian-82: Notice: Scope(Class[main]): Service provider: debian
==> debian-82: Notice: Scope(Class[main]): systemd: false
==> debian-75: Notice: Scope(Class[main]): Service provider: debian
==> debian-75: Notice: Scope(Class[main]): systemd: false
==> debian-60: Notice: Scope(Class[main]): Service provider: debian
==> debian-60: Notice: Scope(Class[main]): systemd: false
==> redhat-70: Notice: Scope(Class[main]): Service provider: systemd
==> redhat-70: Notice: Scope(Class[main]): systemd: true
==> redhat-65: Notice: Scope(Class[main]): Service provider: redhat
==> redhat-65: Notice: Scope(Class[main]): systemd: false
==> redhat-56: Notice: Scope(Class[main]): Service provider: redhat
==> redhat-56: Notice: Scope(Class[main]): systemd: false
==> centos-59: Notice: Scope(Class[main]): Service provider: redhat
==> centos-59: Notice: Scope(Class[main]): systemd: false
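
(For reference, notices like the ones above can be produced with a trivial test manifest along these lines; this is a sketch, with the class path taken from the message above:)

# Minimal test manifest (sketch): print the stdlib fact and the module's derived flag.
include varnish::params

notice("Service provider: ${::service_provider}")
notice("systemd: ${::varnish::params::systemd}")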

BTW, I am sure the 'debian-82' VM uses systemd:

$ vagrant ssh -c 'ps -p 1' debian-82
  PID TTY          TIME CMD
    1 ?        00:00:00 systemd

So the issue is indeed in stdlib. We could work around it by creating a custom fact in the varnish module, but I prefer fixing things where the problem is.

EmersonPrado commented 7 years ago

In fact, I'm not quite sure stdlib's service_provider custom fact should return the init system. The description just says it "Returns the default provider Puppet uses to manage services on this system". The code mentioned in the previous message doesn't show any intention of detecting the init system. So the assertion about fixing stdlib might just be plain wrong.

I tried to find a module on the Forge that detects the init system. As strange as it seems to me, I found none. All modules that claim to do so either guess it from the distro and OS major version, or detect the installed init systems and guess which one is in use. So the only option left is writing a custom fact (or a separate module for that).
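
For illustration, a minimal custom fact along those lines could simply inspect PID 1 (the fact name and approach here are assumptions, not the module's actual code):

# Illustrative Facter fact: report the init system by looking at what runs as PID 1.
Facter.add(:init_system) do
  setcode do
    begin
      File.read('/proc/1/comm').strip   # e.g. 'systemd' or 'init'
    rescue SystemCallError
      nil
    end
  end
end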

EmersonPrado commented 7 years ago

Added PR 159 implementing the init system check with a more reliable custom fact.