unioslo / zabbix-auto-config

MIT License

If a modifier fails the host should be discarded #58

Open paalbra opened 1 year ago

paalbra commented 1 year ago

If a modifier fails, the processing will just continue without those modifications:

https://github.com/unioslo/zabbix-auto-config/blob/2b45f1cb7da0d46b8b218005ebbf751cb17f8793/zabbix_auto_config/processing.py#L302-L311

This could lead to a weird state/unknown problems (very dependent on the modifier, obviously).

I think the host currently being modified should be discarded from the current update run (much like a failing collector), since we do not know what happened or what the host should really look like. In other words: keep the host as it is in the database.
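A minimal sketch of that behaviour, not the project's actual code. It assumes hosts are plain dicts and that each modifier is a callable taking a host and returning it; `apply_modifiers` is a hypothetical name. Each host is modified on a copy, and if any modifier raises, the host is left out of the result so the caller can keep whatever is already stored for it.

```python
import copy
import logging


def apply_modifiers(hosts, modifiers):
    """Return modified copies of the hosts for which every modifier succeeded.

    Hypothetical helper: hosts are dicts, modifiers are callables (host -> host).
    """
    updated = []
    for host in hosts:
        # Work on a copy so a half-applied modification never leaks out.
        candidate = copy.deepcopy(host)
        try:
            for modify in modifiers:
                candidate = modify(candidate)
        except Exception:
            # Discard this host from the current run only; nothing is deleted.
            logging.exception(
                "Modifier failed for %s; skipping host in this run",
                host.get("hostname"),
            )
            continue
        updated.append(candidate)
    return updated
```

With this shape, a host missing from the returned list simply means "do not touch the stored version this time".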

This issue is also mentioned in #57

paalbra commented 1 year ago

Example by changing what is in the current README:

https://github.com/unioslo/zabbix-auto-config/blob/2b45f1cb7da0d46b8b218005ebbf751cb17f8793/README.md?plain=1#L84-L89

Just add something like this to the modify function (it needs import random at the top of the module):

if random.randint(0, 1):
    raise Exception("FAILED "*50)

Then you will see Template-barry having a 50% chance of being linked to bar.example.com. It would be better to discard the host 50% of the time, and always keep the linked template.

pederhan commented 1 year ago

What if there is a syntax error or something else that causes the modifier to fail 100% of the time? You would effectively remove all hosts, wouldn't you? Wouldn't this cause a non-trivial strain on the database from constantly adding and removing hosts, while also never actually recovering from the error? Or do you mean that the host modifier process should simply stop applying modifiers to that host, and otherwise continue?

Either way, if a failing modifier should cause a host to be discarded, I think there also needs to be some way to detect a situation in which a modifier fails repeatedly: see my comment on #57. Although that comment was related to source collectors, I think we can apply the same concept to host modifiers.

paalbra commented 1 year ago

Oh, I might not have been clear.

By "discarded" I don't at all mean delete. I just mean that it should be discarded from the current update run. E.g., in my example, the host bar.example.com should be kept as is, so it does not lose the barry property and therefore does not lose Template-barry.

This would mean not updating the database with the failed modification.

EDIT: I tried to edit the issue post to be more precise.

paalbra commented 1 year ago

And if something fails 100% of the time I still think it is fair to discard all updates. You should seriously look into something that fails 100% of the time. That's no good.

pederhan commented 1 year ago

Yes, but your suggestion does not provide a failure path for bad modifiers. Yes, it prevents unpredictable state stemming from a modifier that fails, but the problematic modifier itself will never be removed, and it will generate potentially thousands of log entries per minute as long as it's active.
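One possible shape for such a failure path, as a rough sketch only. The class name, the threshold, and the idea of counting consecutive failures per modifier are all assumptions for illustration, not something the project implements or that #57 specifies.

```python
from collections import defaultdict


class ModifierFailureTracker:
    """Hypothetical tracker: counts consecutive failures per modifier name."""

    def __init__(self, max_consecutive_failures=3):
        self.max_consecutive_failures = max_consecutive_failures
        self._failures = defaultdict(int)

    def record_success(self, name):
        # Any success resets the streak for that modifier.
        self._failures[name] = 0

    def record_failure(self, name):
        self._failures[name] += 1

    def is_disabled(self, name):
        # A modifier is considered disabled once its streak hits the threshold.
        return self._failures[name] >= self.max_consecutive_failures
```

The processing loop could check `is_disabled(name)` before calling a modifier and log a single error when a modifier first crosses the threshold, instead of one traceback per host. Whether a disabled modifier should then block all updates or just be skipped is a separate design decision.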

paalbra commented 1 year ago

That's true. They will just keep on failing.

This is mostly a hypothetical worst case scenario though. Generally this shouldn't happen, but if it does I think discarding is the best solution.

I don't see thousands of log entries as an issue compared to erroneously modifying thousands of hosts, wouldn't you agree?