openbmc / bmcweb

A do everything Redfish, KVM, GUI, and DBus webserver for OpenBMC
Apache License 2.0

Webui pages intermittently not loading, "0x0 Failed to capture connection" logged #273

Status: Closed (closed by zevweiss 2 months ago)

zevweiss commented 3 months ago

Bug Description

It's a bit unpredictable, but I'm seeing sporadic instances of webui-vue page reloads stopping without actually loading anything and leaving me at an empty page. It's usually (I suspect always, but I'm not 100% certain) accompanied by one or more of the following journal messages from bmcweb:

[CRITICAL http_connection.hpp:585] 0x0 Failed to capture connection

The errors seem to crop up more readily if I refresh a page while it's still actively loading (e.g. just hitting ctrl-R repeatedly in a browser without waiting for it to fully load in between), but they also sometimes happen when refreshing an idle, fully loaded page.

Version

OpenBMC commit f0053a50e6a423e12b68673c89b53938346a3af6 (bmcweb commit ac25adb8d491342fc5fd4e189c58b79be6f5835a).

Additional Information

I observed the problem on spc621d8hm3 and romed8hm3.

I'm not sure if it's related, but I've also seen some other CRITICAL errors logged occasionally (much less frequently), such as:

[CRITICAL error_messages.cpp:290] Internal Error /usr/src/debug/bmcweb/1.0+git/redfish-core/lib/account_service.hpp(1716:36) `redfish::handleAccountGet(App&, const crow::Request&, const std::shared_ptr<bmcweb::AsyncResp>&, const std::string&)::<lambda(const boost::system::error_code&, const dbus::utility::ManagedObjectType&)>`:

(just ending in a colon like that, looks like there'd be a string with more information but it's empty.)

edtanous commented 3 months ago

Can you provide the contents of the network tab in Chrome/Firefox when this error happens? It looks very similar to something already being worked on, which involves a race condition in webui-vue. https://discord.com/channels/775381525260664832/1219121974173896826

A "fix" is to remove the websocket handler entirely: https://gerrit.openbmc.org/c/openbmc/webui-vue/+/70641

zevweiss commented 3 months ago

Not sure if there's a better form for this than a screenshot, but here's one capture: [screenshot: webui-hang]

And yes, after applying 70641 it does seem like the problem goes away -- though FWIW I do have -Drest=enabled in bmcweb.

gtmills commented 2 months ago

> -Drest=enabled

https://gerrit.openbmc.org/c/openbmc/webui-vue/+/70641 removed the /subscribe websocket

edtanous commented 2 months ago

@zevweiss can you verify that bmcweb master + webui-vue master solves your issues?

zevweiss commented 2 months ago

Alas no, but for different reasons -- with bmcweb 5ffd11f248f1 and webui-vue 01492c3dcb I can't log in at all. Attempting to do so (entering a valid username & password) just repeatedly reloads the login page.

[screenshot: reqs_000]

edtanous commented 2 months ago

I haven't seen that before... Anything unique about your setup?

zevweiss commented 2 months ago

I've got a bbappend with the following contents, but I think that's it:

python() {
    d.setVar("BMCWEB_HTTP_BODY_LIMIT", str((int(d.getVar("FLASH_SIZE")) // 1024) + 2))
}

EXTRA_OEMESON:append = " \
    -Dhttp-body-limit=${BMCWEB_HTTP_BODY_LIMIT} \
    -Drest=enabled \
    "

FWIW, I just tested again with webui-vue 2b33526c41c and still see the same problem.
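
As a worked example of the bbappend above: assuming FLASH_SIZE is expressed in KiB and -Dhttp-body-limit in MiB (neither unit is stated in this thread), the Python expression yields the flash size in MiB plus 2 MiB of headroom. The helper name `http_body_limit` and the value 32768 are hypothetical, chosen only for illustration:

```python
def http_body_limit(flash_size: str) -> str:
    # Mirrors the bbappend expression: integer-divide FLASH_SIZE by 1024,
    # add 2, and convert back to a string for d.setVar().
    return str((int(flash_size) // 1024) + 2)


# Hypothetical FLASH_SIZE of 32768 (32 MiB if the unit is KiB, an assumption).
print(http_body_limit("32768"))  # prints 34
```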

zevweiss commented 2 months ago

After bisecting the login-failure problem, it seems to have been introduced by bmcweb commit 25b54dba775b31021a3a4677eb79e9771bcb97f7.

zevweiss commented 2 months ago

...and I can reproduce it on current openbmc master (commit c9e483ca4eb67ac212b764f1f7dec8588af72f19) building evb-ast2500 and booting it in qemu:

$ qemu-system-arm -M ast2500-evb \
    -drive file=obmc-phosphor-image-evb-ast2500.static.mtd,format=raw,if=mtd \
    -nographic -serial mon:stdio -nic user,hostfwd=tcp::6443-:443,model=ftgmac100

After rolling bmcweb back to commit aca174983be5a0d2af08044dd93487908ae6cfe5 (the commit before 25b54dba775b31021a3a4677eb79e9771bcb97f7) I can log in to the web UI.

zevweiss commented 2 months ago

Oh, and after seeing it go by in #gh-issues on discord, looks like https://github.com/openbmc/webui-vue/issues/116 is reporting the same problem.

edtanous commented 2 months ago

Please try: https://gerrit.openbmc.org/c/openbmc/bmcweb/+/71309

zevweiss commented 2 months ago

Alright, patch 71309 does appear to resolve the webui login failure problem, thanks -- and with that bmcweb and the latest webui-vue (commit dfba4e542e8167) I haven't been able to reproduce the original problem with pages sometimes not loading. Though I now seem to have neither client-side polling nor events sent from the server notifying the web UI of host power state changes AFAICT (power state shown in the page header gets stale and doesn't update until I do a full page reload). Is that currently expected?

edtanous commented 2 months ago

> Is that currently expected?

Yes. The websocket had to be removed because it was crashing (also, because it relies on the deprecated /subscribe dbus API, nobody really wanted to look into why it was crashing). Hopefully someone has the willpower to go implement SSE, or polling, but right now we're just left with the power state not updating live.

zevweiss commented 2 months ago

Ah, alright -- for some reason I had thought that polling had already been implemented in webui-vue, but I guess not yet.
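
For illustration only, the client-side polling option mentioned above could look roughly like the sketch below. It is not what webui-vue or bmcweb implements. It assumes the host is exposed as the Redfish ComputerSystem at /redfish/v1/Systems/system with a PowerState property. The function name `watch_power_state` and the injected `fetch_json` callable (standing in for an authenticated HTTPS GET that returns parsed JSON) are hypothetical.

```python
import time
from typing import Callable, Iterator, Optional

# Assumption: the host system resource lives at this Redfish URI.
SYSTEM_URI = "/redfish/v1/Systems/system"


def watch_power_state(
    fetch_json: Callable[[str], dict],
    interval_s: float = 5.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield the PowerState value each time it differs from the last one seen."""
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        # fetch_json is a hypothetical helper supplied by the caller.
        state = fetch_json(SYSTEM_URI).get("PowerState")
        polls += 1
        if state is not None and state != last:
            last = state
            yield state
        # Wait between polls, but not after the final one.
        if max_polls is None or polls < max_polls:
            sleep(interval_s)
```

A caller would iterate over the generator and update the page header whenever a new state is yielded. The poll interval trades staleness against extra load on the BMC, so the 5-second default here is a placeholder, not a recommendation.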