vatesfr / xen-orchestra

The global orchestration solution to manage and backup XCP-ng and XenServer.
https://xen-orchestra.com

Backup job failed "NO_HOSTS_AVAILABLE()" #7387

Closed hanneschu1z closed 3 months ago

hanneschu1z commented 8 months ago

Are you using XOA or XO from the sources?

XO from the sources

Which release channel?

latest

Provide your commit number

eedac

Describe the bug

When I run a health check through a backup schedule with a remote NAS as the destination, the health check fails on startup with "NO_HOSTS_AVAILABLE()". When I run the health check manually on the remote, there is no error.

Error message

NO_HOSTS_AVAILABLE()

To reproduce

  1. Go to Backup
  2. Edit a Backup Job
  3. Edit Schedules
  4. Check "Health Check"
  5. Select a remote NAS as destination
  6. Run Backup Job

Attachments: 2024-02-16T23_00_00.010Z - backup NG.json, Screenshot 2024-02-17 183539

Expected behavior

The backup job starts up the VM and the health check is successful.

Screenshots

Screenshot 2024-02-17 183723

Node

18.17.1

Hypervisor

XCP-ng 8.2.1

Additional context

No response

olivierlambert commented 8 months ago

Hi,

NO_HOSTS_AVAILABLE is an XAPI message telling you that no host able to boot the VM could be found when we tried to boot it for the health check, for example because there is not enough free memory or not enough vCPUs available (or for another reason, see [1]).

@fbeauchamp after catching a NO_HOSTS_AVAILABLE error, we should probably call VM.assert_can_boot_here on every host connected to the SR that holds the health check VM's disk, so we can report exactly why it could not boot [1]
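
A minimal sketch of that idea, written in Python for brevity rather than in Xen Orchestra's own JavaScript. VM.start and VM.assert_can_boot_here are real XAPI calls, but everything else is an assumption made for illustration: the `call` wrapper (which is expected to handle the session and raise `XapiError` on failure), the `XapiError` class, and the function names are hypothetical and are not Xen Orchestra code. The caller is assumed to have already worked out which hosts can see the SR.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence


class XapiError(Exception):
    """Hypothetical error type: an XAPI error code plus its parameters."""

    def __init__(self, code: str, params: Sequence[str] = ()):
        super().__init__(code, *params)
        self.code = code
        self.params = list(params)


# Hypothetical transport: call(method, *params) performs one XAPI call
# (session handling included) and raises XapiError when XAPI reports an error.
XapiCall = Callable[..., object]


def explain_no_hosts_available(
    call: XapiCall, vm_ref: str, host_refs: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Ask each candidate host why it cannot boot the VM.

    Returns a mapping host_ref -> reason string, or None when the host
    reports that it could boot the VM.
    """
    reasons: Dict[str, Optional[str]] = {}
    for host_ref in host_refs:
        try:
            call("VM.assert_can_boot_here", vm_ref, host_ref)
        except XapiError as err:
            reasons[host_ref] = f"{err.code}({', '.join(map(str, err.params))})"
        else:
            reasons[host_ref] = None
    return reasons


def start_or_explain(
    call: XapiCall, vm_ref: str, host_refs: Iterable[str]
) -> Dict[str, Optional[str]]:
    """Start the VM; on NO_HOSTS_AVAILABLE, collect per-host reasons.

    Returns an empty dict when the VM started. Any other XAPI error is
    re-raised unchanged.
    """
    try:
        # VM.start(vm, start_paused, force)
        call("VM.start", vm_ref, False, False)
    except XapiError as err:
        if err.code != "NO_HOSTS_AVAILABLE":
            raise
        return explain_no_hosts_available(call, vm_ref, host_refs)
    return {}
```

With something like this, the health check log could show a per-host reason such as HOST_NOT_ENOUGH_FREE_MEMORY next to the generic NO_HOSTS_AVAILABLE.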

[1] List of reasons:

hanneschu1z commented 8 months ago

Hey, thanks a lot for your response. I wasn't aware that NO_HOSTS_AVAILABLE can mean such things. So I have found the cause of my problem: my XCP-ng server is at its memory limit. Is there a way of using swap for the health checks? I know it's slower, but it would be an alternative if you're not able to upgrade your RAM.

olivierlambert commented 3 months ago

I'm afraid you will need actual RAM to do it. A health check boots a VM after all, and you need available memory for that. But it's an interesting thing to think about: could we tell the health check to use less memory? The risk is reducing the memory to the point where the system can't boot, which would make it look like a backup/restore problem when it's really a memory issue. There's no simple answer.