vatesfr / xen-orchestra

The global orchestration solution to manage and backup XCP-ng and XenServer.
https://xen-orchestra.com

XO no longer reconnects automatically to XCP server after a failed connection (timeout) #6265

Open bogdantomasciuc opened 2 years ago

bogdantomasciuc commented 2 years ago

XOA or XO from the sources? XO from the sources, commit 379e4

Describe the bug: XO no longer tries to reconnect to an XCP server after a failed connection (timeout).

To Reproduce:

  1. Go to Settings -> Servers
  2. Add a new XCP server
  3. Block access to that server with a firewall rule, or unplug the cable, for more than 5 minutes
  4. See that the connection is marked with an error icon in Settings -> Servers
  5. Reconnect the cable / disable the blocking rule
  6. See that the connection to the server is still marked as unavailable. I left it like that for hours and it did not revert to available.

Expected behavior: The connection used to be retried every minute, but that no longer seems to happen. I have been seeing this for quite some time, at least 3 months, but I thought it was due to my setup. I have reinstalled XO since and it behaves the same way.

Screenshots:

Connection marked as failed even though the server is available now: [image: FailedServerConnection]

Error message: [image: FailedServerConnectionErr]

Proof that the server is available: [image: FailedServerConnectionActuallyAvailable]

Other information: If I click the "Enabled" button to disable the connection and then click it again to re-enable it, everything starts working again. It should recover automatically, though, without someone cycling the connection manually. Restarting the XO VM or the related services has the same effect.

If you reached this line: Thank you! :)

julien-f commented 2 years ago

Thank you for your detailed report, we'll investigate :)

bogdantomasciuc commented 2 years ago

Thanx!

julien-f commented 2 years ago

I cannot reproduce on my side.

I've used sudo iptables -I OUTPUT -d <address> -j DROP to block access to my host, and XO correctly detects the disconnection and removes the objects from the UI.

Then I removed the rule (sudo iptables -D OUTPUT -d <address> -j DROP), and after a few minutes, the objects reappeared in the UI.

bogdantomasciuc commented 2 years ago

That is curious. I will make more tests and come back with the results.

bogdantomasciuc commented 2 years ago

OK, I managed to replicate it again. Run sudo iptables -I OUTPUT -d <address> -j DROP, then go to the Settings page and toggle the Enable/Disable button. When you enable the connection, XO will try to connect for some time and you will see the spinner animating. Leave it like that for a few minutes until the attention icon appears; if you click it, you see "connect ETIMEDOUT [...]". Afterwards, delete the blocking rule and leave it alone. XO will not reconnect by itself.

In our case nobody toggles the connection during the night when the link goes down, yet somehow we end up in the same state. The steps above are just a way to mimic the problem.
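To confirm that the host really is reachable again while XO still shows the error, a small TCP probe can help. This is a minimal sketch and not part of XO: the function name is_reachable is hypothetical, and port 443 is an assumption about where the host's API listens.

```python
import socket


def is_reachable(host: str, port: int = 443, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Port 443 is an assumed default; adjust it to your setup.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    import sys

    # Usage: python probe.py <address>
    address = sys.argv[1]
    print(f"{address} reachable: {is_reachable(address)}")
```

If the probe reports the host as reachable while Settings -> Servers still shows the connection as failed, the problem is on the retry side rather than the network side.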

julien-f commented 2 years ago

It's possible that if XO cannot connect at the moment the connection is enabled, it will not keep retrying.

It will only retry if the connection is lost, not when the host is not available initially.
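The behavior the reporter expects (retrying even when the very first attempt fails) could look roughly like the sketch below. It is illustrative only and is not XO's actual code: connect_with_retry, its parameters, and the 60-second default (taken from the "every 1 minute" mentioned in the report) are all assumptions.

```python
import time
from typing import Callable


def connect_with_retry(
    connect: Callable[[], None],
    max_attempts: int = 5,
    delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call connect() until it succeeds and return the number of attempts.

    Unlike a "retry only after an established connection is lost" policy,
    this also retries when the initial attempt fails (e.g. with a timeout).
    The last error is re-raised once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            connect()
            return attempt
        except OSError:  # TimeoutError and ConnectionError are subclasses
            if attempt == max_attempts:
                raise
            sleep(delay_s)  # wait before the next attempt
    raise AssertionError("unreachable")
```

The sleep function is injectable so the retry policy can be exercised without actually waiting a minute between attempts.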

bogdantomasciuc commented 1 year ago

This issue happened again. The VPN tunnel went down, and even though the tunnel reconnected at some point, the connection to the server stayed down over the weekend. We disabled and re-enabled the connection manually on Monday to get it back. XO VM details are: [image]

olivierlambert commented 1 year ago

We should try to reproduce (spike then).

olivierlambert commented 1 year ago

A rewrite of the xenapi lib is planned to improve that. Work has been started by @julien-f.

julien-f commented 1 year ago

See #6947

bogdantomasciuc commented 1 year ago

Great news! Looking forward to testing! 🥳