intelsdi-x / snap

The open telemetry framework
http://snap-telemetry.io
Apache License 2.0
1.8k stars, 295 forks

Utilizing notifications for OpenStack plugins #1341

Open dulek opened 7 years ago

dulek commented 7 years ago

(not a Snap expert here)

Currently, the OpenStack plugins poll OpenStack APIs to get the required metrics. Because of various API limitations in different services, this can be very inefficient and puts unnecessary load on the API services. Instead, the plugins should listen to notifications issued by OpenStack services and calculate metrics from them. For examples, see Nova's documentation [1].

[1] http://docs.openstack.org/developer/nova/notifications.html

mbbroberg commented 7 years ago

Great feedback @dulek. If I'm reading this correctly (I'm not a Nova user), we would need to consume events as they occur, which would require an event endpoint we don't yet have in Snap. The previous RFC is #136 but we've talked about redesigning it. I'll mark this as on-hold since it's blocked without this additional functionality.

dulek commented 7 years ago

@mjbrender: That's right, Nova (and other OpenStack services) emits notifications (e.g. VM lifecycle events) through AMQP. I'm not saying that everything Snap collects is emitted as a notification, but it would be great to take a look and see whether some of it could be moved to a push model. For example, being able to correctly update the current VMs' metrics based on received notifications should significantly lower the required number of 'nova list' calls.
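As a rough sketch of the push model described above, the snippet below keeps an in-memory view of existing VMs and updates it from notification dictionaries instead of re-listing instances on every collection. The `event_type` values and the payload keys (`instance_id`, `tenant_id`) are assumptions based on Nova's legacy notification format; `VMTracker` and its methods are hypothetical names and not part of Snap or Nova.

```python
from collections import Counter


class VMTracker:
    """Hypothetical tracker that derives VM counts from Nova notifications."""

    def __init__(self):
        # instance_id -> tenant_id for every VM believed to exist.
        self.instances = {}

    def handle(self, notification):
        """Apply one notification dict; unknown event types are ignored."""
        event_type = notification.get("event_type", "")
        payload = notification.get("payload", {})
        instance_id = payload.get("instance_id")
        if instance_id is None:
            return
        # Assumed event names; check them against the deployed Nova version.
        if event_type == "compute.instance.create.end":
            self.instances[instance_id] = payload.get("tenant_id")
        elif event_type == "compute.instance.delete.end":
            self.instances.pop(instance_id, None)

    def vms_per_tenant(self):
        """Return a Counter mapping tenant_id to the number of tracked VMs."""
        return Counter(self.instances.values())
```

A tracker like this would still need one initial listing to seed its state, and probably an occasional full re-sync in case a notification is lost.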

And if something is a bottleneck for a Snap plugin and is missing from notifications, we can try to add it in OpenStack upstream. This probably raises the question of plugins being compatible with different OpenStack versions, as notifications are versioned much more loosely than REST APIs. I guess that's a question for later.

lynxbat commented 7 years ago

An AMQP collector mapped to an OSLO processor would work great here /cc @jcooklin
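To illustrate what the processing step might involve, here is a minimal sketch that unwraps a message body as it could arrive from AMQP. It assumes the oslo.messaging 2.0 envelope, in which the actual notification is a JSON string stored under the `oslo.message` key; `unwrap_oslo_message` is a hypothetical helper, not an existing Snap or oslo function.

```python
import json


def unwrap_oslo_message(body):
    """Return the notification dict carried by an AMQP message body.

    Assumes the oslo.messaging 2.0 envelope, where the real message is a
    JSON string under the "oslo.message" key. Bodies without the envelope
    are returned as parsed.
    """
    # Accept raw JSON (str or bytes) as well as an already-decoded dict.
    data = json.loads(body) if isinstance(body, (str, bytes)) else body
    if isinstance(data, dict) and "oslo.message" in data:
        return json.loads(data["oslo.message"])
    return data
```

The returned dict would then carry fields such as `event_type` and `payload` that a plugin can turn into metrics.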

mbbroberg commented 7 years ago

@dulek I had a good conversation with @jcooklin who said we could consider buffering events in the current collector model, assuming we think through persisting the queue between pushes to the endpoint. I don't love that idea but it could work. Sounds like @lynxbat has an idea too :). Thanks again for opening this one - we'll keep you posted 👌

jcooklin commented 7 years ago

Removing the 'on-hold' label since this can be done without requiring Snap workflows to be triggered by events. In other words, it can be delivered with an interval schedule.
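One way to picture the interval-based approach is a buffer that a listener fills as notifications arrive and that the collector empties once per scheduled run. The sketch below is only an assumption about how that could look; `NotificationBuffer`, `push`, and `drain` are hypothetical names, and the buffer lives in memory only, so it does not address persisting the queue across restarts.

```python
import threading
from collections import deque


class NotificationBuffer:
    """Hypothetical buffer between an AMQP listener and an interval-driven collector."""

    def __init__(self, maxlen=10000):
        # A bounded deque drops the oldest events if nothing collects for too long.
        self._events = deque(maxlen=maxlen)
        self._lock = threading.Lock()

    def push(self, notification):
        """Called by the listener thread for each received notification."""
        with self._lock:
            self._events.append(notification)

    def drain(self):
        """Called once per collection interval; returns and clears buffered events."""
        with self._lock:
            events = list(self._events)
            self._events.clear()
        return events
```

Each scheduled collection would call `drain()` and compute metrics from whatever arrived since the previous run, so the workflow itself never has to be triggered by an event.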