Add support for zones - Githubissues

Copis commented 4 years ago

Is your feature request related to a problem? Please describe. We have a master zone and some satellite zones behind a VPN or firewall. In those cases the master cannot receive traps.

Describe the solution you'd like. It would be great to be able to receive these SNMP traps on a satellite endpoint and send the status to the master.

Describe alternatives you've considered. Forward SNMP traps from the satellite to the master.

patrickpr commented 4 years ago

Hi,

It's a good feature; I will start working on it for the next version.

Are you able to test this? My lab environment does not include a master/satellite setup.

robdevops commented 4 years ago

I am about to do a multi-zone build, and will be able to test this in the coming days/weeks.

Copis commented 4 years ago

I can test this scenario in my development environment with one master and one satellite, but I think it would be better to test in an HA environment with two masters and two satellites, if possible.

patrickpr commented 4 years ago

Update: I'm currently building the test environment for this.

patrickpr commented 4 years ago

@Copis: the satellite architecture is a work in progress.

Test environment : two masters in HA and two satellites in HA.

Traps can be received by:

master (if there is an HA master, using a VRRP (keepalived) IP)
satellite (if there is an HA satellite, using VRRP too).

The satellite receives and processes traps using the configuration provided by the masters and:

updates the database using a simple API provided by the trapdirector module on the masters.
sends passive service check results to satellites (or to the master, this isn't decided yet).

For now, there is no zone for trap rules : they are global.

I assume :

1) satellites can have access to master (and masterHA) on :

Icinga API port (5665 by default)
Icingaweb2 HTTP port (443) (Satellites will use a specific Icingaweb2 user)

2) Master and master HA both have access to the trapdirector database.

3) Latency between master(s) and sat(s) is low (<500ms)
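
As a rough sketch of the passive check result step under the assumptions above: Icinga 2 exposes a `process-check-result` action on its API port (5665 by default), so a satellite could build a request like the one below. This is not taken from trapdirector itself; the host names, the service name `snmp-trap`, and the API user `trapdirector-sat` are hypothetical, and the request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_check_result_request(api_host, host_name, service_name, exit_status,
                               plugin_output, user, password, port=5665):
    """Build (but do not send) a POST to Icinga 2's process-check-result action."""
    url = f"https://{api_host}:{port}/v1/actions/process-check-result"
    payload = {
        "type": "Service",
        # Assumes the names contain no double quotes.
        "filter": f'host.name=="{host_name}" && service.name=="{service_name}"',
        "exit_status": exit_status,  # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
        "plugin_output": plugin_output,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical values, for illustration only.
req = build_check_result_request(
    "master1.example.org", "switch01", "snmp-trap", 2,
    "linkDown received on port 12", "trapdirector-sat", "secret")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would actually submit it; whether the target should be a satellite or the master is exactly the open question above.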

I'm open to comments and suggestions!

Copis commented 4 years ago

One problem I see is that some scenarios cannot use VRRP, for example active-passive or active-active data centers (CPD) without extended VLANs. In that case there is no possible implementation.

patrickpr commented 4 years ago

Opened a topic here to talk about it : https://community.icinga.com/t/trapdirector-ha-feature/5439

p4k8 commented 4 years ago

So here are some thoughts about it:

As long as all instances of trap director talk to the same DB, it shouldn't matter how many there are.
Traps can be forwarded from any node that can receive them to any snmptrapd on a trapdirector node. This enables chaining them through firewalls to the nodes where they can be processed properly.
When trapdirector processes a trap, it sends the result to the API of a satellite or master. Why not both, in a configurable order? If you send the result to a satellite and you don't like the response, or it's unreachable, you resend it to the master or another satellite.
In this scenario you'd have to worry about deduplicating traps if you do HA by sending traps to all existing trapdirector instances, which don't know about each other but share a DB. Maybe there's even some cheap way to discard duplicates that is better than a DB lookup over the last 5 seconds' worth of traps to see whether a trap was already processed by fellow trapdirectors.
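
To make the deduplication idea a bit more concrete, here is a minimal sketch, assuming a trap can be identified by its sender, trap OID, and varbinds. The `TrapDeduplicator` class and the fingerprint fields are hypothetical, not trapdirector code. The in-memory dict only catches repeats seen by one process; for instances that don't know about each other it stands in for whatever shared store they would really use (for example a unique key in the common DB).

```python
import hashlib
import time


class TrapDeduplicator:
    """Remember trap fingerprints for a short window and flag repeats."""

    def __init__(self, window_seconds=5.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock
        self._seen = {}  # fingerprint -> time it was first seen

    @staticmethod
    def fingerprint(source_ip, trap_oid, varbinds):
        # Hypothetical trap identity: sender, trap OID, and sorted varbinds.
        canonical = repr((source_ip, trap_oid, sorted(varbinds)))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def is_duplicate(self, source_ip, trap_oid, varbinds):
        now = self._clock()
        # Forget fingerprints that have fallen out of the window.
        self._seen = {key: seen_at for key, seen_at in self._seen.items()
                      if now - seen_at < self.window_seconds}
        key = self.fingerprint(source_ip, trap_oid, varbinds)
        if key in self._seen:
            return True
        self._seen[key] = now
        return False


dedup = TrapDeduplicator(window_seconds=5.0)
trap = ("10.0.0.7", "1.3.6.1.6.3.1.1.5.3", [("1.3.6.1.2.1.2.2.1.1", "12")])
print(dedup.is_duplicate(*trap))  # first copy of the trap
print(dedup.is_duplicate(*trap))  # second copy within the window
```

The 5-second window mirrors the figure mentioned above; the right value would depend on how long forwarded copies can be delayed.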

patrickpr commented 4 years ago

1. As long as all instances of trap director talk to the same DB, it shouldn't matter how many there are.

Correct, but a DB connection may be impossible on distant sites.

2. Traps can be forwarded from any node that can receive them to any snmptrapd on a trapdirector node. This enables chaining them through firewalls to the nodes where they can be processed properly.

Some kind of trap routing? Not very easy to implement!

3. When trapdirector processes a trap, it sends the result to the API of a satellite or master. Why not both, in a configurable order? If you send the result to a satellite and you don't like the response, or it's unreachable, you resend it to the master or another satellite.

Yes : satellite then master or master only (maybe set this by zones ?)

4. In this scenario you'd have to worry about deduplicating traps if you do HA by sending traps to all existing trapdirector instances, which don't know about each other but share a DB. Maybe there's even some cheap way to discard duplicates that is better than a DB lookup over the last 5 seconds' worth of traps to see whether a trap was already processed by fellow trapdirectors.

There is a special 'waiting' status in the DB that was implemented for this kind of thing.

p4k8 commented 4 years ago

DB connection may be impossible on distant sites

So that's why it might be a sound idea not to put any trapdirectors on distant sites. Like:

DB <--> trapdirector <-- snmptrapd on trapdirector host <-- firewalls/networks/whatever <-- snmptrapd with forward directive on remote site

"HA" in this part is achieved by forwarding traps from the remote host to several trapdirector destinations simultaneously; each of the trapdirectors would then have a list of API endpoints to send the check result to. That means each trap arrives at least once, and at most as many times as there are snmptrapd forward destinations. That's solved by deduping, I guess.

Some kind of trap routing

More like just adding forward default <address> to snmptrapd.conf, pointing at the snmptrapd on the proper trapdirector node.
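
For illustration, a small helper that renders those `forward default` lines for a remote site's snmptrapd.conf, one per trapdirector destination. The function name and host names are hypothetical, and the sketch assumes plain host names or IPv4 addresses on the standard trap port 162 (IPv6 destinations would need a transport prefix and brackets).

```python
def render_forward_lines(destinations, port=162):
    """Return snmptrapd.conf 'forward default' lines, one per destination."""
    return [f"forward default {host}:{port}" for host in destinations]


# Hypothetical trapdirector hosts; listing two gives the "HA by fan-out" setup.
lines = render_forward_lines(
    ["trapdirector1.example.org", "trapdirector2.example.org"])
print("\n".join(lines))
```

With more than one destination listed, every trap is forwarded to each of them, which is why the receiving side then has to discard duplicates.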

maybe set this by zones

Not sure if it actually has to be zone-aware to work properly as long as the endpoint addresses are listed in the correct order.
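
A minimal sketch of that ordered-endpoint idea, with no zone awareness at all: walk the list and stop at the first endpoint that accepts the check result. `send_with_failover`, the `send` callback, and the endpoint names are hypothetical; `send` is assumed to return True on success and to raise `OSError` (for example `ConnectionError`) when the endpoint is unreachable.

```python
def send_with_failover(endpoints, send):
    """Try each endpoint in order; return the first one that accepts the result."""
    errors = []
    for endpoint in endpoints:
        try:
            if send(endpoint):
                return endpoint
            errors.append((endpoint, "rejected"))
        except OSError as exc:  # unreachable, timeout, connection refused, ...
            errors.append((endpoint, str(exc)))
    raise RuntimeError(f"all endpoints failed: {errors}")


def fake_send(endpoint):
    # Stand-in for a real API call: sat1 is down, sat2 rejects, master1 accepts.
    if endpoint == "sat1":
        raise ConnectionError("unreachable")
    return endpoint == "master1"


print(send_with_failover(["sat1", "sat2", "master1"], fake_send))
```

Listing a satellite first and the master last gives "satellite then master"; listing only the master gives "master only".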

patrickpr / trapdirector

Add support for zones #32