vatesfr / xen-orchestra

The global orchestration solution to manage and backup XCP-ng and XenServer.
https://xen-orchestra.com

XenOrchestra becomes unresponsive after a few days. #2820

Closed: chrispetsos closed this issue 5 years ago

chrispetsos commented 6 years ago

Context

Expected behavior

The server keeps responding after several days of uptime.

Current behavior

After a few (2-3) days of uptime, the server becomes unresponsive. I cannot open it in a browser; I only see the little loading circle, but that is most probably cached. We initially had 2 GB of RAM for the server and it filled up after a couple of days. Thinking that could be the cause, we increased RAM to 4 GB, but the issue remains. Note that memory usage is steadily increasing; however, the server becomes unresponsive before memory is exhausted.
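One simple way to watch that growth over time is a periodic cron sample like the sketch below; the process match pattern and log path are only assumptions, not part of the original report.

# Hypothetical cron snippet: append a timestamped RSS sample (in KiB) of the
# xo-server node process to a log file; run it e.g. every 15 minutes.
echo "$(date -Is) $(ps -o rss= -p "$(pgrep -f xo-server | head -n1)")" >> /var/log/xo-server-rss.log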

Any advice?

olivierlambert commented 6 years ago

We need more context: how big is your infrastructure, which features do you use (backups? ACLs?), etc.

chrispetsos commented 6 years ago

We have 22 XenServer hosts, hosting 244 VMs. We don't use any special features currently; we are only evaluating it as a replacement for XenCenter. The strange thing is that even after the memory increase to 4 GB, memory utilization stops at 2 GB.

olivierlambert commented 6 years ago

Hard to find the culprit on something installed from the sources (could be the OS, the storage underneath etc.). Do you experience the same issue with XOA in trial?

Also, check the xo-server output: you should have lines with "xo:perfs blocked for XXms". What's the average value for those ms?
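As an aside, one way to compute that average from the xo-server output is a small grep/awk pipeline like the one below; the log path is an assumption and depends on how xo-server is run (a forever log file, syslog, journalctl output, ...).

# Hypothetical: extract the millisecond values from "xo:perfs blocked for XXms"
# lines and print their average. Point it at wherever xo-server output ends up.
grep -o 'blocked for [0-9]*ms' /var/log/syslog \
  | grep -oE '[0-9]+' \
  | awk '{ sum += $1; n++ } END { if (n) print sum / n }'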

chrispetsos commented 6 years ago

We haven't tried XOA yet and don't believe we will in the near future... As for the xo:perfs values, the average is 838.7742 ms. Can you tell anything from this? I've also read in other issues that the culprit could be the memory allocated to Node.js, which I think is capped at 2 GB by default, and that this is what makes it blow up. Do you believe it would help if I increased that to 4 GB? Of course, this doesn't address the memory allocation steadily and gradually increasing on an otherwise idle server.

olivierlambert commented 6 years ago

That sounds really high. Try raising the Node RAM value to 4 GiB.
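For a quick manual test, the limit is controlled by Node's --max_old_space_size flag; a minimal sketch, assuming an installation from sources under /opt/xen-orchestra as in the rest of this thread:

# Hypothetical direct run of xo-server with a 4 GiB V8 old-space limit,
# useful for testing before changing the service definition.
cd /opt/xen-orchestra/packages/xo-server/bin
node --max_old_space_size=4096 xo-server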

etlweather commented 6 years ago

Don't know if that can help, but I have had similar issues where sometimes it just hangs. Just happened now and here is some data I collected. It may or may not be the same problem @chrispetsos is experiencing, but I think putting this data here might help both cases (instead of creating a new issue).

[screenshot: htop output]

[screenshot: syslog]

Unfortunately, syslog does not tell us which host causes xo:perfs to lag so much... I had suspicions, as my monitoring system reported high CPU usage on dom0 of two XenServer hosts. I restarted the toolstack on those two hosts (they are running an old 6.0.2) and voila: XOA became responsive again, CPU usage on the XOA VM dropped to nothing, and syslog is now quiet.
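For reference, restarting the toolstack is done on the host itself, not from XO; a minimal sketch:

# Run in dom0 on each affected XenServer host. xe-toolstack-restart restarts
# xapi and related services without rebooting the host or its VMs.
xe-toolstack-restart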

etlweather commented 6 years ago

Humf, I just checked journalctl -u xo-server.service: a few seconds/minutes after I observed the lag and started troubleshooting, Node.js OOMed and restarted, so that may be where the "resolution" actually came from.
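To tell a heap-exhausted or OOM-killed Node process apart from a plain hang, checks like the ones below can help; exact messages vary by kernel and Node version, so treat this as a sketch.

# Node's own heap exhaustion message in the service journal...
journalctl -u xo-server.service | grep -i 'heap out of memory'
# ...and kernel OOM-killer activity.
dmesg | grep -i 'out of memory'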

chrispetsos commented 6 years ago

For anyone interested in how to increase the RAM available to Node.js with an installation from sources and the service managed by forever, the change needs to be made in the init.d script; in my case that was /etc/init.d/xen-orchestra. The added line is the -c 'node --max_old_space_size=4096' option in the following excerpt from that file:

export FOREVER_ROOT=/root/.forever;\
  \
  cd /opt/xen-orchestra/packages/xo-server/bin;\
    /usr/local/bin/forever \
      -a \
      -l $LOGFILE \
      --minUptime $MIN_UPTIME \
      --spinSleepTime $SPIN_SLEEP_TIME \
      --killSignal $KILL_SIGNAL \
      -c 'node --max_old_space_size=4096' \
       \
      --uid xen-orchestra \
      start xo-server " 2>&1 >/dev/null

I'll let it run with this change and report back on how it goes...
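After editing the init script, the service has to be restarted for the new node arguments to take effect; a quick sketch to restart and verify, assuming the script name used above:

# Restart the forever-managed service, then confirm the flag is on the
# running process (the [m] trick keeps grep from matching itself).
/etc/init.d/xen-orchestra restart
ps aux | grep '[m]ax_old_space_size'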

chrispetsos commented 6 years ago

After increasing the memory available to Node to 4 GB and more than a week of running, it looks like the xo-server process has eaten all the memory and the machine has now started swapping. I expect it to blow up within the day...


julien-f commented 6 years ago

@chrispetsos Have you identified any relation to a specific feature?

chrispetsos commented 6 years ago

Nope, we haven't used it at all during the period I am describing. No special features, no VM console from the web UI. We basically left it running and the memory increases steadily...

chrispetsos commented 6 years ago

As I suspected... the server is unresponsive now...

julien-f commented 6 years ago

Unfortunately, until we have identified the root of the issue, you should restart xo-server regularly.

Any help on this subject is welcome :smiley:

chrispetsos commented 6 years ago

OK, cool. A nightly cron job will restart our xo-server until the root of this problem is resolved. Thanks!
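For anyone wanting to do the same, a minimal crontab sketch, assuming the init script from the comment above; the schedule and path are only examples:

# Hypothetical /etc/crontab entry: restart xo-server every night at 03:00.
0 3 * * *  root  /etc/init.d/xen-orchestra restart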

xiscoj commented 6 years ago

Hi, any news? I noticed the same behavior a long time ago and solved it with a nightly reboot of the machine.

olivierlambert commented 6 years ago

Please follow this: https://github.com/vatesfr/xen-orchestra/issues/2948