gridcentric / canary

OpenStack Nova performance stats collection framework (based on collectd).
http://docs.gridcentric.com/canary

Bug: Excessive bandwidth usage when viewing graphs for non-active instances #9

Open · rui-lin opened this issue 10 years ago

rui-lin commented 10 years ago

Neither the libvirt plugin nor the vms-collectd-plugins collect (or can collect) data on non-active instances. The current behaviour is to treat the data for these periods as null and ignore those values, so only the last valid data from when the instance was active is displayed.
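The current client-side behaviour can be sketched roughly as follows. This is a minimal illustration, assuming each returned entry has the `[timestamp, consolidation_function, value]` shape shown in the example below, with `null` arriving as `None`; the function name `last_valid_point` is hypothetical and not part of Canary.

```python
from typing import List, Optional, Tuple

# Hypothetical entry shape, modelled on [1377195290, "AVERAGE", null]:
# (timestamp, consolidation function, value or None).
Entry = Tuple[int, str, Optional[float]]


def last_valid_point(entries: List[Entry]) -> Optional[Entry]:
    """Return the most recent entry whose value is not None.

    Null values (periods when the instance was not active) are skipped,
    so the graph effectively stops at the last sample taken while the
    instance was running. Returns None if no valid entry exists.
    """
    for entry in reversed(entries):
        if entry[2] is not None:
            return entry
    return None
```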

The problem arises when instances have been shut down for a long time. Since the canary REST API only supports querying metrics with a "from_time", large numbers of entries such as [1377195290, "AVERAGE", null] are returned, on the order of a few MB for each day the VM was shut down. Requesting this per metric, per update interval (usually a few seconds), generates a lot of network traffic: with only a few graphs displayed, it can easily reach gigabytes per hour. This bandwidth usage is excessive and slows the page down dramatically.
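A back-of-envelope estimate shows how the traffic scales. This is a sketch only: every parameter (bytes per serialized entry, sample interval, number of displayed metrics, polling interval) is an assumption to be filled in, not a value measured from Canary, and the function name `estimated_bytes_per_hour` is hypothetical.

```python
def estimated_bytes_per_hour(
    bytes_per_entry: int,
    sample_interval_s: int,
    shutdown_days: float,
    metrics: int,
    poll_interval_s: int,
) -> float:
    """Rough traffic estimate for repeatedly polling metrics from an old from_time.

    All inputs are illustrative assumptions, not measurements.
    """
    # Each request returns one (mostly null) entry per sample interval
    # covering the whole period since the instance was shut down.
    entries_per_request = shutdown_days * 86400 / sample_interval_s
    bytes_per_request = entries_per_request * bytes_per_entry
    # Every displayed metric is re-requested once per polling interval.
    requests_per_hour = metrics * 3600 / poll_interval_s
    return bytes_per_request * requests_per_hour
```

For example, assuming ~30 bytes per entry, 10 s samples, one day of downtime, 4 graphs and a 5 s refresh, `estimated_bytes_per_hour(30, 10, 1, 4, 5)` gives 746,496,000 bytes, i.e. roughly 0.75 GB per hour, and the figure grows linearly with each additional day the VM stays down.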

Proposed solutions:

EDIT:

This issue also exists when viewing metrics that are no longer updated (e.g. networks or disks that have been deleted but still have metrics from when they existed). Unlike shut-down instances, which may be restarted, these devices may never come back, so displaying 0 may be misleading. The options therefore boil down to:

amscanne commented 10 years ago

Perhaps the correct solution here is to filter out the null entries in the backend, and have the frontend interpret missing entries appropriately?
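A minimal sketch of that suggestion, assuming the same `(timestamp, consolidation_function, value)` entry shape and a fixed, known sample step. Both function names are hypothetical. The backend drops null entries so they never cross the wire, and the frontend re-inserts an explicit gap marker wherever timestamps jump by more than one step, so missing data is drawn as a break in the line rather than as 0.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, str, Optional[float]]


def drop_null_entries(entries: List[Entry]) -> List[Entry]:
    """Backend side: remove entries whose value is None before responding."""
    return [e for e in entries if e[2] is not None]


def with_gaps(entries: List[Entry], step: int) -> List[Entry]:
    """Frontend side: mark missing samples as gaps (None), never as 0.

    Assumes entries are sorted by timestamp and sampled every `step` seconds.
    """
    result: List[Entry] = []
    for entry in entries:
        if result:
            prev_ts, cf, _ = result[-1]
            # A jump larger than one step means samples were filtered out;
            # a single None marker is enough to break the line in a chart.
            if entry[0] - prev_ts > step:
                result.append((prev_ts + step, cf, None))
        result.append(entry)
    return result
```

With this split, a long-stopped instance or a deleted device costs almost nothing to transfer. A trailing gap (after the last valid sample) would still need the frontend to compare the last timestamp against the current time.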