xapi-project / xen-api

The Xapi Project's XenAPI Server
http://xenproject.org/developers/teams/xapi.html

guest_metrics_last_updated not updated correctly #4237

Open olivierlambert opened 4 years ago

olivierlambert commented 4 years ago

Hello everyone,

Here is a report on something that might be interesting.

Context :bulb:

I was investigating the possibility of building VMware-like app/VM watchdog capabilities. With Xen Orchestra, we could use some XAPI values to take action when needed. Because we are able to replicate VMs, we could decide to start some of them if the original VM is frozen or dead (remember XO is "on top of XAPI", meaning we can make decisions globally across multiple pools, which is great for leveraging your whole XAPI-enabled host infrastructure).

Thanks to the XAPI doc (I can never emphasize this enough: this is our bible here! Thanks for it! :+1: ) I found interesting fields in the VM_guest_metrics class:

Perfect tool for the job! :smile:
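For reference, here is a minimal sketch of how a client could read the two fields discussed in this report (live and last_updated). It assumes the XenAPI Python bindings' call shape, session.xenapi.<Class>.<method>; the helper name read_guest_freshness is made up for illustration and is not part of any library.

```python
# XAPI's null reference, returned when a VM has no guest metrics object.
NULL_REF = "OpaqueRef:NULL"


def read_guest_freshness(session, vm_uuid):
    """Return the `live` and `last_updated` guest-metrics fields, or None.

    `session` is assumed to be a logged-in XenAPI session, or any object
    exposing the same `session.xenapi.<Class>.<method>` call shape.
    `read_guest_freshness` itself is a hypothetical helper.
    """
    vm_ref = session.xenapi.VM.get_by_uuid(vm_uuid)
    metrics_ref = session.xenapi.VM.get_guest_metrics(vm_ref)
    if metrics_ref == NULL_REF:
        # No guest agent has reported for this VM yet.
        return None
    record = session.xenapi.VM_guest_metrics.get_record(metrics_ref)
    return {"live": record["live"], "last_updated": record["last_updated"]}
```

Because the session is passed in, the helper can be exercised with a stand-in object before pointing it at a real pool.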

The problem :bug:

So I started to investigate when live would become false, by freezing the VM (e.g. with a simple xl pause $DOMID). Weirdly enough, nothing happened: it always stayed at true.

But that wasn't all: last_updated returned an old value (a few minutes, hours or even days old for some VMs), which is not what it's advertised to do. So I decided to compare directly by reading xenstore myself.

So for example:

# xenstore-ls /local/domain/<dom ID> | grep updated
 updated = "Thu Oct 22 19:44:54 2020"

I ran the command multiple times and, as expected, the value was updated roughly every minute :+1:

But in XAPI, nothing changed:

# xe vm-param-get uuid=<VM UUID> param-name=guest-metrics-last-updated 
20201022T15:43:24Z
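The two values above use different formats, so comparing them by eye is error-prone. Below is a small sketch that parses both and computes the gap. One assumption to flag: the xenstore string carries no timezone, so the offset has to be supplied by the caller (the default of 0 is a placeholder, not a claim about the actual setup). The function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def parse_xenstore_updated(value, utc_offset_hours=0):
    """Parse a ctime-style string such as 'Thu Oct 22 19:44:54 2020'.

    The string has no timezone; `utc_offset_hours` is an assumption the
    caller must supply.
    """
    naive = datetime.strptime(value, "%a %b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


def parse_xapi_timestamp(value):
    """Parse the compact ISO 8601 UTC form printed by xe, e.g. '20201022T15:43:24Z'."""
    return datetime.strptime(value, "%Y%m%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def lag_seconds(xenstore_value, xapi_value, utc_offset_hours=0):
    """Seconds by which the XAPI value trails the xenstore value."""
    newer = parse_xenstore_updated(xenstore_value, utc_offset_hours)
    older = parse_xapi_timestamp(xapi_value)
    return (newer - older).total_seconds()


if __name__ == "__main__":
    print(lag_seconds("Thu Oct 22 19:44:54 2020", "20201022T15:43:24Z"))
```

Whatever the real offset is, a gap that keeps growing between successive runs is the symptom described here.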

I double- and triple-checked: the value was indeed refreshed in xenstore, but not in XAPI. So there are 2 problems:

Tests :test_tube:

I ran the tests on 8.1 and 8.2, with the same outcome. I have the feeling it's affecting master too.

robhoes commented 4 years ago

Looking at https://github.com/xapi-project/xen-api/commit/4d1b51c23a06005f1b050046db7f94da86fed85e, it appears that the VM_guest_metrics.live field has effectively become obsolete (always true). We should update the API docs to reflect that.

I would expect last_updated to be set to the time when at least one of the VM_guest_metrics fields has been updated (e.g. PV_drivers_version, os_version, networks, ...). So that depends on the fields that xapi reflects. It is not the same as updated in xenstore. Try writing to one of the other keys.
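To illustrate the expectation described above, here is a toy model, not xapi's actual code: last_updated moves only when a mirrored field changes value, so a write to an unmirrored key such as updated leaves it untouched. The field list is a guess taken from the examples in this comment, and the class name is hypothetical.

```python
# Hypothetical subset of fields mirrored from xenstore, taken from the
# examples named in this comment; the real list lives in xapi.
MIRRORED_FIELDS = ("PV_drivers_version", "os_version", "networks")


class GuestMetricsModel:
    """Toy model of the expected last_updated semantics."""

    def __init__(self, now):
        self.fields = {name: None for name in MIRRORED_FIELDS}
        self.last_updated = now

    def observe(self, key, value, now):
        """Record a xenstore write; return True if last_updated moved."""
        if key not in self.fields:
            return False  # unmirrored key, e.g. "updated"
        if self.fields[key] == value:
            return False  # same value rewritten: nothing changed
        self.fields[key] = value
        self.last_updated = now
        return True


if __name__ == "__main__":
    model = GuestMetricsModel(now=0)
    model.observe("updated", "Thu Oct 22 19:44:54 2020", now=60)
    print(model.last_updated)  # still the initial timestamp
```

Under this model, a guest agent that rewrites identical values every minute would never advance last_updated.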

So there isn't really a "heartbeat" that you can get through the guest metrics. It's not really a good mechanism for that sort of thing; due to the overhead, it doesn't scale well. RRDs may be more useful for what you are trying to do.

olivierlambert commented 4 years ago

@robhoes thanks for the answer. However:

Also, having a freshness indicator from the guest would allow application monitoring (your app writes into xenstore every minute, alongside the guest agent). This way, you have complete control over what's going on from the API point of view (for example, reporting that the app isn't sending a heartbeat while the VM still is might not trigger the same action as the VM not sending anything at all).

That's why having live data from xenstore available through XAPI is interesting for "higher level" applications.
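To make the distinction concrete, here is a minimal sketch of the decision a higher-level tool could take from two heartbeat timestamps, one from the guest agent and one from the application. Everything in it is an assumption for illustration: the names, the verdicts, and the 180-second threshold (three missed one-minute beats).

```python
from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"
    APP_STALLED = "app stalled, guest alive"
    GUEST_SILENT = "guest silent"


def classify(now, guest_heartbeat, app_heartbeat, max_age=180.0):
    """Classify a VM from two heartbeat timestamps (epoch seconds).

    `max_age` is an arbitrary threshold chosen for this sketch.
    """
    guest_fresh = (now - guest_heartbeat) <= max_age
    app_fresh = (now - app_heartbeat) <= max_age
    if not guest_fresh:
        # Nothing from the guest at all: candidate for starting a replica.
        return Verdict.GUEST_SILENT
    if not app_fresh:
        # Guest agent still reports, so the VM runs but the app does not.
        return Verdict.APP_STALLED
    return Verdict.HEALTHY


if __name__ == "__main__":
    print(classify(now=1000.0, guest_heartbeat=990.0, app_heartbeat=500.0))
```

The guest check comes first because a silent guest makes the application heartbeat meaningless either way.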

robhoes commented 4 years ago

RRD data sources can come from xenstore. This is, for example, how the memory RRDs work. So if a guest agent is writing heartbeats to xenstore, then an RRD could be created to watch this. Doing this through the xapi database does not scale, and this is why RRDs were invented in the first place.

olivierlambert commented 4 years ago

That would be a fair solution indeed :+1: Do you have any doc/tutorial on how to build such a thing?

Speaking of RRDs, this would be a good occasion to move away from XML to something cheaper to parse (in terms of CPU cost). But that's another topic :smile: