vapor-ware / synse-server

An HTTP API for monitoring and controlling physical and virtual devices
https://synse.readthedocs.io/en/latest/server/intro/
GNU General Public License v3.0

A new approach for plugin refresh #371

Closed: edaniszewski closed this issue 4 years ago

edaniszewski commented 4 years ago

The current approach to plugin refresh is fairly naive. Synse notes that a plugin failed a request and marks it inactive. Later (either on a timer or when forced by a user), it attempts to reconnect or re-issue a command to the plugin to re-establish that it is "active".

I believe the premise of using active/inactive is still useful, as it does help to keep response times low for various requests, but the approach for having plugins be refreshed seems like it could use improvement.

My idea is that instead of retrying all plugins at once on an interval, we retry them individually with backoff in a background task. Some details on this approach:

This relates to:

lazypower commented 4 years ago

I think this is a good first pass at resolving the issue. Collecting the additional data would also give us metrics to key off of for range-based queries/alerts, such as multiple plugins disconnecting in < 5m, or disconnect/reconnect counts inflating. Things like that, which would prompt us to take a look.

As it stands right now, it's very much a silent failure and relies on humans to notice the plugin misbehaving. Step one feels like adding observability to it, and then going from there with a decision better informed by the data.
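To make the "multiple plugins disconnecting in < 5m" idea concrete, here is a rough sketch of the kind of bookkeeping that could back such an alert. Everything in it (`PluginEventLog`, `record_disconnect`, `plugins_disconnected_within`) is a hypothetical name for illustration, not something that exists in Synse, and in practice this data would more likely be exported to a metrics system than kept in memory.

```python
import time
from collections import deque
from typing import Deque, Dict, Optional, Set, Tuple


class PluginEventLog:
    """Hypothetical in-memory record of plugin disconnects and reconnects."""

    def __init__(self) -> None:
        # (timestamp, plugin_id) pairs, oldest first.
        self._disconnects: Deque[Tuple[float, str]] = deque()
        self.disconnect_counts: Dict[str, int] = {}
        self.reconnect_counts: Dict[str, int] = {}

    def record_disconnect(self, plugin_id: str, now: Optional[float] = None) -> None:
        ts = time.monotonic() if now is None else now
        self._disconnects.append((ts, plugin_id))
        self.disconnect_counts[plugin_id] = self.disconnect_counts.get(plugin_id, 0) + 1

    def record_reconnect(self, plugin_id: str) -> None:
        self.reconnect_counts[plugin_id] = self.reconnect_counts.get(plugin_id, 0) + 1

    def plugins_disconnected_within(
        self, window_s: float, now: Optional[float] = None
    ) -> Set[str]:
        """Return the plugins that disconnected in the last `window_s` seconds."""
        ts_now = time.monotonic() if now is None else now
        return {pid for ts, pid in self._disconnects if ts_now - ts <= window_s}


# Example: alert if more than one plugin dropped within the last 5 minutes.
log = PluginEventLog()
log.record_disconnect("plugin-a", now=0.0)
log.record_disconnect("plugin-b", now=200.0)
log.record_disconnect("plugin-c", now=400.0)
recent = log.plugins_disconnected_within(300.0, now=450.0)
should_alert = len(recent) > 1
```

The explicit `now` argument is only there so the window logic can be exercised without waiting on a real clock.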

MatthewHink commented 4 years ago

This makes sense to me. Perhaps consider a hard upper limit on the exponential backoff, since the delay can grow rapidly.
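As a minimal sketch of what per-plugin retry with a capped exponential backoff could look like (not the actual Synse implementation): `backoff_delay`, `retry_until_active`, and the default values for `base`, `factor`, `cap`, and `max_attempts` are all assumptions made for illustration, and `check` stands in for whatever call would re-establish that a plugin is active.

```python
import asyncio
from typing import Awaitable, Callable


def backoff_delay(
    attempt: int, base: float = 1.0, factor: float = 2.0, cap: float = 300.0
) -> float:
    """Seconds to wait before retry number `attempt` (0-indexed), clamped to `cap`."""
    try:
        delay = base * factor ** attempt
    except OverflowError:
        # Very large attempt counts overflow a float; the cap applies anyway.
        delay = cap
    return min(cap, delay)


async def retry_until_active(
    check: Callable[[], Awaitable[bool]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    max_attempts: int = 10,
    **backoff: float,
) -> bool:
    """Retry one plugin on its own schedule; return True once `check` succeeds."""
    for attempt in range(max_attempts):
        if await check():
            return True
        await sleep(backoff_delay(attempt, **backoff))
    return False


# With a 30 second cap, the delays level off instead of growing without bound.
delays = [backoff_delay(n, base=1.0, factor=2.0, cap=30.0) for n in range(7)]
# delays == [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Each inactive plugin could get its own task running `retry_until_active` (for example via `asyncio.create_task`), so one slow plugin does not hold up retries for the others. Adding random jitter to the delay is a common refinement that is left out here for brevity.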

edaniszewski commented 4 years ago

For sure.

I'll try to get started on this today. It is a fairly sizable change, so it'll take a few days probably, but I think in the end it'll be worth it both for performance and observability.