kimchi-project / kimchi

An HTML5 management interface for KVM guests
https://github.com/kimchi-project/kimchi/releases/latest

New VMs cause exponential slowness /plugins/kimchi/vms AJAX request #1130

Closed ss23 closed 5 years ago

ss23 commented 7 years ago

As we have added more VMs (currently at 13 VMs), the AJAX call to /plugins/kimchi/vms has slowed down exponentially. It currently takes approximately 30 seconds to complete. The server has 10 cores, a load average below 3, four SSDs in RAID10, and a dual gigabit ethernet connection, so I wouldn't expect an overloaded server to be the cause.

There are some other unusually slow requests: /plugins/kimchi/networks takes approximately 6 seconds when clicking "edit" on a VM, and /plugins/kimchi/host/devices?_passthrough=true&_cap=pci&_available_only=true takes approximately the same amount of time. All other requests take under 50ms.

I'm unsure if this is a bug. Is anyone else experiencing this, or does anyone know how to debug or fix it?
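To compare the endpoints outside the browser, a small timing script can help. The following is a minimal sketch, not part of Kimchi: the helper names (`fetch_bytes`, `measure`, `rank_slowest`) are hypothetical, the base URL in the usage comment is an assumption about the local setup, and authentication and TLS certificate handling are left out, so a real run would need a valid session.

```python
import time
import urllib.request


def fetch_bytes(url, timeout=120):
    """Fetch a URL and return the raw body (no auth handling; assumption)."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def measure(base_url, paths, fetch=fetch_bytes):
    """Return {path: wall-clock seconds} for one request per path."""
    timings = {}
    for path in paths:
        start = time.perf_counter()
        fetch(base_url + path)
        timings[path] = time.perf_counter() - start
    return timings


def rank_slowest(timings):
    """Sort (path, seconds) pairs with the slowest request first."""
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)


# Example usage (base URL is an assumption about the local setup):
# timings = measure("https://localhost:8001", [
#     "/plugins/kimchi/vms",
#     "/plugins/kimchi/networks",
# ])
# for path, seconds in rank_slowest(timings):
#     print(f"{seconds:8.3f}s  {path}")
```

Passing the fetch function as a parameter keeps the timing logic separate from how the request is authenticated, so the same loop can be reused with a session-aware client.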

alinefm commented 7 years ago

Hi @ss23

The problem related to /plugins/kimchi/host/devices?_passthrough=true&_cap=pci&_available_only=true is a known issue and is being tracked in #993.

As for the delay on /plugins/kimchi/networks, I am not sure what is happening. How many virtual networks have you set up on your server? Is it a large enough number to explain that delay?

ss23 commented 7 years ago

@alinefm Nothing I know of to explain it. It's perhaps a somewhat unusual setup: there are 6 networks, using openvswitch.

ss23 commented 7 years ago

Not sure how much more useful this information is, but:

When I first start wokd, the relevant python processes are doing nothing. That is, they are using no measurable amount of CPU. An strace at this point shows a somewhat expected situation, select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout) repeated forever. However, after loading the Kimchi interface, even after I close the browser window, the python process will now use 100% of the CPU forever. The strace output now looks something like:

select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
futex(0x14b4580, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x14b4580, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x14b4580, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff) = 0
futex(0x14b4580, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff) = 0
futex(0x14b4580, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x14b4580, FUTEX_WAKE_PRIVATE, 1) = 1
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)

Unfortunately I'm lacking the context to understand what's going on here.

In any case, using 100% of a core as soon as someone loads any page in Kimchi is not optimal.
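One way to get more context from a trace like this is to tally which system calls appear and how they return. The sketch below is an illustration only: `summarize_strace` and the regular expression are hypothetical helpers written for the line format shown above, and they make no claim about what causes the CPU usage.

```python
import re
from collections import Counter

# Matches lines such as "futex(...) = -1 EAGAIN (...)" or "select(...) = 0 (Timeout)".
STRACE_LINE = re.compile(r"^(\w+)\(.*\)\s+=\s+(-?\d+)(?:\s+([A-Z]+))?")


def summarize_strace(lines):
    """Count (syscall, outcome) pairs; outcome is the errno name or "ok"."""
    counts = Counter()
    for line in lines:
        match = STRACE_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        name, _ret, errno = match.groups()
        counts[(name, errno if errno else "ok")] += 1
    return counts


SAMPLE = """\
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
futex(0x14b4580, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x14b4580, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x14b4580, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff) = 0
futex(0x14b4580, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff) = 0
futex(0x14b4580, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x14b4580, FUTEX_WAKE_PRIVATE, 1) = 1
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
"""

if __name__ == "__main__":
    for (name, outcome), n in sorted(summarize_strace(SAMPLE.splitlines()).items()):
        print(f"{name:8s} {outcome:8s} {n}")
```

Run over a longer capture, the ratio of futex calls to select timeouts gives a rough picture of how much of the activity is lock traffic between threads rather than idle polling.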

ss23 commented 7 years ago

The above issue is fixed by kimchi-project/wok#217, which also seems to have measurably sped up the networks and devices calls.

The remaining very slow call is to /plugins/kimchi/vms, taking approximately 20 seconds on average.

overcookedTOFU commented 7 years ago

@alinefm Is there any testing I can do for you so that you can close this issue? This may also have resolved https://github.com/kimchi-project/kimchi/issues/993.

alinefm commented 7 years ago

@overcookedTOFU no! Thanks for updating it. I am closing this issue for now.

ss23 commented 7 years ago

@overcookedTOFU @alinefm I don't think this issue is resolved?

I've installed the latest version of wok/kimchi and if anything it seems to be slightly worse - currently at 10 seconds to load /plugins/kimchi/vms with no load.

Can the issue be reopened, or is it tracked somewhere else?

alinefm commented 7 years ago

@ss23 Sure! I will reopen it and investigate further later.