mrkamel / heartbeat

Use Heartbeat to monitor your Hetzner Failover IP and automatically switch to another server.

[feature request] do periodic API checks #12

Closed tpo closed 3 years ago

tpo commented 3 years ago

I suggest doing an initial Hetzner Robot API check when heartbeat is started.

The reason: assume your VIP is pinging just fine. You don't want to discover that your API access is misconfigured only once the VIP stops responding and you need to fail over.

So I suggest to do this:

  def monitor
    current_target # access the API as a check
    loop do
      res = check

      return if only_once

      res ? sleep(interval) : sleep(300)
    end
  end

A stronger version would be to check the API periodically, which I think I'd prefer and would propose as the default behavior:


  def periodic_api_check
    # start at noon if the clock has not been initialized yet
    @api_check_clock_position ||= 0
    # force API access if clock at noon
    current_target if @api_check_clock_position % api_check_clock_positions == 0
    # move the clock
    @api_check_clock_position = (@api_check_clock_position + 1) % api_check_clock_positions
  end

  def monitor
    loop do

      periodic_api_check
      res = check

      return if only_once

      res ? sleep(interval) : sleep(300)
    end
  end

With this behavior it'd probably be cleaner to move the periodic_api_check into the check method itself.
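To illustrate the clock logic on its own, here is a minimal sketch that is independent of heartbeat. `PeriodicTrigger` and `tick` are hypothetical names, not part of heartbeat, and the block passed to `tick` stands in for the `current_target` call.

```ruby
# Hypothetical helper (not heartbeat code): fires on every Nth tick,
# starting with the very first one.
class PeriodicTrigger
  def initialize(positions)
    raise ArgumentError, "positions must be positive" unless positions.positive?

    @positions = positions
    @position = 0 # the clock starts at noon
  end

  # Yields when the clock is at noon, then moves the clock one step.
  # Returns true if the block was run on this tick.
  def tick
    fired = @position.zero?
    yield if fired
    @position = (@position + 1) % @positions
    fired
  end
end

# Example: with 3 positions the block runs on ticks 1, 4, 7, ...
trigger = PeriodicTrigger.new(3)
6.times { |i| trigger.tick { puts "API check on tick #{i + 1}" } }
```

Because the first tick fires, this also covers the initial check at startup suggested above.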

tpo commented 3 years ago

I'm withdrawing this feature request, for two reasons:

  1. I wanted this feature because I want to run heartbeat from within a pod in Kubernetes. Pods can die and nodes can get recycled, so the source IP address from which the Hetzner Robot API is accessed might change. Heartbeat might then be merrily pinging the failover IP, only to find out when it should trigger the IP switch that it has no access to the API, because the request no longer comes from the IP that was registered with Hetzner. This is what I want to prevent by checking up front whether API access works. However, I can already test that by running heartbeat once with a config like this:
/heartbeat # cat config/heartbeat.invalid.yml 
base_url: https://robot-ws.your-server.de

basic_auth:
  username: valid_username
  password: valid_password
failover_ip: 0.0.0.0
ping_ip: 0.0.0.1 # invalid IP address!!!
ips:
  - ping: 1.1.1.1
    target: 1.1.1.1
  - ping: 2.2.2.2
    target: 2.2.2.2
interval: 1
timeout: 1
tries: 1
dry: true
only_once: true

and then grep heartbeat's output for the failure:

/heartbeat # HEARTBEAT_LOG=STDOUT bin/heartbeat --config config/heartbeat.invalid.yml | tee  /dev/fd/2 | grep "Unable to retrieve the active server ip" && echo && echo "ERROR: Can't access API!"
I, [2021-06-11T07:17:06.472422 #138]  INFO -- : Reading configuration from config/heartbeat.invalid.yml.
ping: sendto: Invalid argument
I, [2021-06-11T07:17:06.477612 #138]  INFO -- : ping 1/1 of 0.0.0.1 failed
I, [2021-06-11T07:17:07.474778 #138]  INFO -- : 0.0.0.1 is down
E, [2021-06-11T07:17:12.503123 #138] ERROR -- : Unable to retrieve the active server ip for 0.0.0.0 from https://robot-ws.your-server.de/failover/0.0.0.0
E, [2021-06-11T07:17:12.503333 #138] ERROR -- : Response from Hetzner Robot API was: 
I, [2021-06-11T07:17:12.503468 #138]  INFO -- : Not responsible for IP

ERROR: Can't access API!
  2. Keep heartbeat simple and focused, and don't add an additional "Hetzner Robot API checker" purpose to it.

Actually, the above config snippet could be added to example/ and mentioned in the README.
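If the check should run from Ruby instead of a shell pipeline, the grep step could look roughly like the sketch below. The helper name `api_check_failed?` is hypothetical. The marker string is copied from the log output above, and the sketch assumes that this message stays stable across heartbeat versions.

```ruby
# Marker text taken from the error line in the log output above.
API_ERROR_MARKER = "Unable to retrieve the active server ip"

# Hypothetical helper: returns true if the captured heartbeat output
# contains the API error message.
def api_check_failed?(output)
  output.each_line.any? { |line| line.include?(API_ERROR_MARKER) }
end

# Possible use, assuming the command and config path shown above:
#   output = `HEARTBEAT_LOG=STDOUT bin/heartbeat --config config/heartbeat.invalid.yml 2>&1`
#   abort "ERROR: Can't access API!" if api_check_failed?(output)
```

Exiting with a non-zero status, as `abort` does, lets a caller such as a startup script treat the failed check as an error.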