xapi-project / xen-api

The Xapi Project's XenAPI Server
http://xenproject.org/developers/teams/xapi.html
Other
344 stars 283 forks source link

CA-394343: After clock jump the xapi assumed the host is HOST_OFFLINE #5700

Open minglumlu opened 1 month ago

minglumlu commented 1 month ago

Prior to this commit, the xapi on the coordinator host records the 'Unix.gettimeofday' as the timestamps of the heartbeat with other pool supporter hosts. When the system clock is updated with a huge jump forward, the timestamps would be far back into the past. This would cause the xapi assumes that the hosts are offline as long time no heartbeats.

In this commit, the timestamps are changed to get from a monotonic clock. In this way, the system clock changes will not impact the heartbeats' timestamps any more.

Additionally, Host_metrics.last_updated is only set when the object is created. It's useless in check_host_liveness at all.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #5700 +/- ## ======================================== + Coverage 44.8% 45.5% +0.6% ======================================== Files 16 13 -3 Lines 2210 1624 -586 ======================================== - Hits 992 739 -253 + Misses 1218 885 -333 ``` [see 8 files with indirect coverage changes](https://app.codecov.io/gh/xapi-project/xen-api/pull/5700/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=xapi-project) | [Flag](https://app.codecov.io/gh/xapi-project/xen-api/pull/5700/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=xapi-project) | Coverage Δ | | |---|---|---| | [python2.7](https://app.codecov.io/gh/xapi-project/xen-api/pull/5700/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=xapi-project) | `45.5% <ø> (ø)` | | | [python3.11](https://app.codecov.io/gh/xapi-project/xen-api/pull/5700/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=xapi-project) | `?` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=xapi-project#carryforward-flags-in-the-pull-request-comment) to find out more.
lindig commented 1 month ago

Robustness against clock changes is desirable but I question that it can be expected for large clock changes. That's like designing for power outages - it needs to be an explicit goal right from the start.

minglumlu commented 4 weeks ago

Robustness against clock changes is desirable but I question that it can be expected for large clock changes. That's like designing for power outages - it needs to be an explicit goal right from the start.

Yes. The robustness is the major concern of this change. The automation testing relies on the large clock changes heavily, although there may not be common cases in production. But I think improving the robustness in xapi would resolve the potential issues in either case.

coveralls commented 2 weeks ago

Coverage Status

coverage: 44.887%. remained the same when pulling 96035373b3312e40a4251a9e76e1a86d0121574f on minglumlu:private/mingl/CA-394343 into e61e0acc3d04653ce203db316769c66377897869 on xapi-project:master.

minglumlu commented 1 week ago

Rebased with the master branch to resolve the conflict.

gangj commented 5 days ago
Additionally, Host_metrics.last_updated is only set when the object is
created. It's useless in check_host_liveness at all.

I guess it was updated periodically with some time interval in some old version xapi which did not tickle_heartbeat yet, so it was considered as the heartbeat for the new added heartbeat mechanism, add then it was removed in the new heartbeat version. As all hosts will tickle_heartbeat now, it is useless now.