megaease / easeprobe

A simple, standalone, and lightweight tool that can do health/status checking, written in Go.
Apache License 2.0

Notify with failure parameter more than 1. #523

Closed paulcynic closed 4 months ago

paulcynic commented 4 months ago


Describe the bug

I have the following config for my prober:

tcp:
  - name: my python server
    host: 127.0.0.1:8001

settings:
  pid: /tmp/easeprobe.pid
  name: MyService
  timeformat: "2004-01-01 00:00:00"
  sla:
    data: "-"
  probe:
    timeout: 2s
    interval: 15s
    failure: 2
    success: 1

notify:
  teams:
    - name: "Teams notify"
      webhook: "http://localhost:8000/notify"

Pay attention to the settings.probe.failure parameter. When failure is 1 (the default value), everything works. But if I change it to 2 (or 3, 4, and so on), notifications are not sent. Here is an example of the logs:

INFO[2024-05-21T17:18:45+03:00] Channel: __EaseProbe_Channel__
INFO[2024-05-21T17:18:45+03:00]    Probers:
INFO[2024-05-21T17:18:45+03:00]     - tcp: my python server
INFO[2024-05-21T17:18:45+03:00]    Notifiers:
INFO[2024-05-21T17:18:45+03:00]      - teams: Teams notify
INFO[2024-05-21T17:18:45+03:00] Ready to monitor(tcp): my python server - 127.0.0.1:8001
INFO[2024-05-21T17:18:45+03:00] Scheduling daily SLA reports at 00:00 UTC time...
INFO[2024-05-21T17:18:45+03:00] The SLA report will be schedule at 22000-05-05 00:00:00
ERRO[2024-05-21T17:18:45+03:00] [tcp / my python server] error: dial tcp 127.0.0.1:8001: connect: connection refused
INFO[2024-05-21T17:18:45+03:00] [tcp / my python server] - Status unchanged [init]! Threshold is not reached for failure [1/2].
ERRO[2024-05-21T17:19:00+03:00] [tcp / my python server] error: dial tcp 127.0.0.1:8001: connect: connection refused
INFO[2024-05-21T17:19:00+03:00] [tcp / my python server] - Status is DOWN! Threshold reached for failure [2/2]
ERRO[2024-05-21T17:19:15+03:00] [tcp / my python server] error: dial tcp 127.0.0.1:8001: connect: connection refused

And that's all: the log shows "Threshold reached for failure [2/2]", but no notification is sent.
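To make the expectation concrete, here is a minimal sketch in Go of the threshold behavior I would expect. It is not EaseProbe's actual implementation. The Counter type, its Observe method, and the Status constants are hypothetical names made up for this example. The only assumption is that a status change should be reported once the number of consecutive failures reaches the failure threshold, whatever that threshold is.

```go
package main

import "fmt"

// Status is a hypothetical probe status used only in this sketch.
type Status string

const (
	StatusInit Status = "init"
	StatusUp   Status = "up"
	StatusDown Status = "down"
)

// Counter tracks consecutive probe results against the thresholds
// (mirroring settings.probe.failure and settings.probe.success).
type Counter struct {
	Failure int    // consecutive failures needed to go down
	Success int    // consecutive successes needed to go up
	Current Status // last confirmed status
	fails   int
	oks     int
}

// Observe records one probe result and reports whether the confirmed
// status changed, which is when a notification would be expected.
func (c *Counter) Observe(ok bool) (changed bool) {
	if ok {
		c.oks++
		c.fails = 0
		if c.oks >= c.Success && c.Current != StatusUp {
			c.Current = StatusUp
			return true
		}
		return false
	}
	c.fails++
	c.oks = 0
	if c.fails >= c.Failure && c.Current != StatusDown {
		c.Current = StatusDown
		return true
	}
	return false
}

func main() {
	c := &Counter{Failure: 2, Success: 1, Current: StatusInit}
	// Three failed probes in a row, as in the logs above.
	for i, ok := range []bool{false, false, false} {
		if c.Observe(ok) {
			fmt.Printf("probe %d: status changed to %s, notify\n", i+1, c.Current)
		} else {
			fmt.Printf("probe %d: status unchanged (%s)\n", i+1, c.Current)
		}
	}
}
```

With failure: 2, the second failed probe is the one that should report the change from init to down, and that is the point where I expect the notification to go out.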

To Reproduce

Steps to reproduce the behavior:

  1. Go to 'easeprobe/cmd'
  2. Create a config.yaml like the one in my case
  3. Run the command: go run ./easeprobe -f config.yaml
  4. See that no notification is sent after the failure threshold is reached

Expected behavior

What do I expect?

  1. In config.yaml, set settings.probe.failure: 1
  2. Run the prober
  3. Get logs like this:
    INFO[2024-05-21T17:36:17+03:00] Channel: __EaseProbe_Channel__
    INFO[2024-05-21T17:36:17+03:00]    Probers:
    INFO[2024-05-21T17:36:17+03:00]     - tcp: my python server
    INFO[2024-05-21T17:36:17+03:00]    Notifiers:
    INFO[2024-05-21T17:36:17+03:00]      - teams: Teams notify
    INFO[2024-05-21T17:36:17+03:00] Ready to monitor(tcp): my python server - 127.0.0.1:8001
    INFO[2024-05-21T17:36:17+03:00] Scheduling daily SLA reports at 00:00 UTC time...
    INFO[2024-05-21T17:36:17+03:00] The SLA report will be schedule at 22005-05-05 00:00:00
    ERRO[2024-05-21T17:36:17+03:00] [tcp / my python server] error: dial tcp 127.0.0.1:8001: connect: connection refused
    INFO[2024-05-21T17:36:17+03:00] [tcp / my python server] - Status is DOWN! Threshold reached for failure [1/1]
    INFO[2024-05-21T17:36:17+03:00] [channel / __EaseProbe_Channel__]: my python server (127.0.0.1:8001) - Status changed [init] ==> [down], sending notification...
    INFO[2024-05-21T17:36:17+03:00] [teams / Teams notify / Notification] - my python server Failure - successfully sent!

    I want to get the "successfully sent!" notification not only with failure = 1, but with other values too (for example 2, 3, 4, and so on).

suchen-sci commented 4 months ago

Hi, thanks for your report. We will try to fix this as soon as possible.

paulcynic commented 4 months ago

Oh, that was fast! I just checked this merge and everything works. Good job, thanks!