keepassxreboot / keepassxc

KeePassXC is a cross-platform community-driven port of the Windows application “Keepass Password Safe”.
https://keepassxc.org/

2.7.7 - Database unlock hangs in virtual machine with 1 CPU #10391

Closed pican79 closed 1 month ago

pican79 commented 6 months ago

Overview

On Xubuntu 20.04, I updated from version 2.7.6 to 2.7.7 using the "phoerious PPA". Now, my database doesn't open anymore.

Steps to Reproduce

  1. Launch KeepassXC
  2. Enter Database password
  3. Click Unlock button

Expected Behavior

Database opens and I can use KeepassXC like before.

Actual Behavior

KeepassXC hangs indefinitely on "Unlock Database" screen: endless spinning wheel, password+key file fields greyed out, inactive close & unlock buttons

Context

Tried uninstalling & reinstalling. Cannot downgrade to 2.7.6 as it's no longer available in the PPA for focal.

KeePassXC - 2.7.7 Revision: 68e2dd8

Operating System: Linux Desktop Env: XFCE Windowing System: X11

droidmonkey commented 6 months ago

Grab the 2.7.6 appimage from here: https://github.com/keepassxreboot/keepassxc/releases/download/2.7.6/KeePassXC-2.7.6-x86_64.AppImage

Unfortunately, we can't possibly fix your specific problem without more information or a debug run.

pican79 commented 6 months ago

How can I provide you additional info? Does KeepassXC generate logs somewhere? How can I run KeepassXC in debug mode?

droidmonkey commented 6 months ago

Without providing us your database (don't do that) it is nearly impossible to diagnose the issue. Did the 2.7.6 appimage solve your issue? Can you try the 2.7.7 appimage or flatpak?

pican79 commented 6 months ago

Something really weird is going on:

I have to do that last step every time I launch KeepassXC 2.7.7.

I haven't tried the flatpak or appimage yet as I don't like alternative package formats.

droidmonkey commented 6 months ago

Focal is a very old distro. This could very well be a library incompatibility problem, which is why I suggest a packaged deployment instead of ppa or native install.

pican79 commented 6 months ago

I just tried the 2.7.7 AppImage and got the same behavior as with the "2.7.7 ppa".

2.7.6 AppImage worked OK. I saw these messages in the console though:

OpenType support missing for "Saab", script 13
qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 999, resource id: 18967490, major code: 40 (TranslateCoords), minor code: 0

michaelk83 commented 6 months ago

Focal is LTS, still under official support. If some specific libraries are too old in the Focal repositories, the PPA should provide updated versions of those libraries.

droidmonkey commented 6 months ago

I'm just theorizing reasons for the behavior.

michaelk83 commented 6 months ago

> How can I provide you additional info?

You can try running a snapshot build with a debugger attached, and post the stack trace that you get when the hang occurs. And just in case, which version of Argon2 and Qt do you have installed?

pican79 commented 6 months ago

Qt version is 5.12.8 (according to apt list --installed | grep libqt) and Argon2 version is 0~20171227-0.2 (according to apt list --installed | grep argon2). Let me know if I should have used other commands to determine that info.

I tried the latest snapshot build from 2024/03/10 with gdb, but the log only contained lines such as [New LWP random_number] or [LWP random_number exited].

During my tests, I found a new way to exit the "hang state": minimize the window, then "unminimize" it.

droidmonkey commented 6 months ago

This seems to be a user interface refresh issue and not a true hang. I tried to replicate on xfce and couldn't at all.

pican79 commented 6 months ago

In case it's relevant, my Xubuntu system is a Virtualbox (7.0.14) VM running on a Windows 10 host.

droidmonkey commented 6 months ago

So is mine 😅

norbertj123 commented 6 months ago

I encountered the same issue (hangs forever after password entry) in a VM with Fedora 39 Xfce Spin installed. The issue disappeared when I increased the number of CPUs from 1 to 2. That's very strange as I have successfully used VMs with only 1 CPU for years.

pican79 commented 6 months ago

> I encountered the same issue (hangs forever after password entry) in a VM with Fedora 39 Xfce Spin installed. The issue disappeared when I increased the number of CPUs from 1 to 2. That's very strange as I have successfully used VMs with only 1 CPU for years.

Thanks for the tip. After increasing the number of CPUs to 2, the "hang issue" no longer occurred. I never had problems before with only 1 CPU either.

When using 1 CPU, I also noticed a "side effect" with 2.7.7. When clicking the close button, it didn't close KeepassXC like before. It simply closed the database. I had to click the close button again to actually close the app.

the-wolfman commented 6 months ago

@norbertj123 Very good catch.

In my setup (issue #10425) I did not have any systems with 1 VCPU, but increasing the number of VCPUs from 2 to 3 or higher solved it for me. I thought at first it could be related to database encryption settings, specifically "threads", but I strongly believe that the issue is happening earlier. Reason is I encounter the issue specifically when securing the database with a yubikey and it is not even getting to the challenge-response.

Further testing shows

My take at the moment: This is a UI related race condition relying on threads to be available depending on features used. In my case, checking the yubikey and displaying the "please touch ribbon" starts a thread, or tries to, which won't come back as expected.

I believe issues #10391 and #10425 are closely related or even duplicates.

pican79 commented 6 months ago

I did a test in a 22.04 Xubuntu VM with only 1 CPU.

With 2.7.7 (ppa or appimage), KeepassXC completely freezes after selecting a database and clicking "I have a key file". I have no choice but to kill the app. It doesn't happen with the 2.7.6 appimage.

With 2 CPUs, 2.7.7 works OK.

droidmonkey commented 6 months ago

Excellent, will debug this one next.

fugtui commented 6 months ago

> * I can open databases, but not one secured with a yubikey running with 2 VCPUs (worked earlier)
> * Everything working fine - as far as I tested - running with 3 VCPUs or more

Can confirm the behavior for Hyper-V with 2 vCPUs + yubikey on Ubuntu 20.04. With 3 vCPUs, the challenge-response and everything else works as expected. Thanks @norbertj123 for the workaround.

droidmonkey commented 6 months ago

I find that minimizing and restoring while locked up ends up showing the unlocked database. I can get it to lock up without a yubikey, so that doesn't seem to be relevant.

droidmonkey commented 5 months ago

I cannot figure this one out, it's a real head scratcher. The unlock dialog basically gets stuck at building the transformed key (Argon2); it just never finishes. If you minimize the window, the process finishes immediately and the database gets unlocked.

I checked for any obvious processing issues and could find none on Windows using Visual Studio CPU inspector.

I am going to try to dive deeper than I can in gdb (need better skillz) using Qt Creator.

michaelk83 commented 5 months ago

Just a WAG, but this sounds like it might be a scheduler issue? With one VCPU, the Argon thread gets put to sleep to free the VCPU to do other things, like handle Qt events. Once you interact with the UI, Qt releases its own thread (tells it to wait), so the Argon thread can resume. With 2 or more VCPUs, the Argon thread can finish undisturbed. Have you tried adjusting the thread priority etc?

the-wolfman commented 5 months ago

I agree, the yubikey challenge-response is not the reason for the hang but one possible trigger. This may be important. @droidmonkey says it's stuck at building the transformed key. The response from the yubikey should be an integral part of that build process. In my case, the "touch the yubikey" ribbon does not even appear, the challenge is never sent, there is no valid response other than maybe an empty one. Should the Argon build process even run at this point in time? Shouldn't it be waiting for the "second factor" first, whatever that may be, a key file, a security key response, both of them ... and then start building? Does it try to start a "thread per factor" but doesn't have enough resources in form of VCPUs to get them all aligned?

droidmonkey commented 5 months ago

Can you test changing your database encryption to AES KDF (database -> database security -> encryption tab) and see if it still hangs?

the-wolfman commented 5 months ago

Yes, same behavior. I tried with my yubikey and 2 VCPU setup mentioned earlier. I was able to change encryption, store the db with yubikey chall-resp. When trying to open it again, it hangs. As before I can overcome it by clicking on another database tab which will then trigger the blue ribbon.

droidmonkey commented 5 months ago

Ok, well, this is basically a ui problem, but not a ui hang since you can still interact with the app. Everything seems to function properly in the backend. Key transforms happen on encryption settings change and saving.

My only thought left is that something is preventing the event loop from receiving the finished signal and kicking over the ui in some way "releases it". Very strange.

the-wolfman commented 5 months ago

Don't want to be finicky, but this may be a hint: If I have only one database showing at the unlock dialog in my setup, the UI hangs. It doesn't do anything, with the only exception of minimize/maximize window, which results in redraw issues of all sorts. I can only kill keepassxc in that case. If I have 2 database tabs, e.g. one of them open without yubikey interaction, then I can select another tab and get things going again.

the-wolfman commented 5 months ago

I just came across this comment in #9251 and similar notes on refactoring and GUI/core separation. Sounds to me this issue may influence the refactoring or even be solved by it: " ... need to abstract the actual opening of a database and key material handling away from the open widget itself. This would include the dance with yubikey code since it is async. We should be able to just bypass the entire widget operation if given key material, basically render it disabled while processing. Then if unlock fails just reset the widget." (https://github.com/keepassxreboot/keepassxc/pull/9251#issuecomment-1474848797)

z1atk0 commented 5 months ago

Same problem & symptoms here as the OP. The strange workaround (Settings => Cancel during the unlock hang) also works here, but then a "hard hang" occurs on quitting the application with Ctrl+Q. In that state keepassxc then needs to be kill -KILLed from the command line.

I'm on Slackware64-15.0 on an Intel Celeron 743, which only has one core, and no hyperthreading (being a Celeron and all 🙂). Both AlienBob's SlackPkg and the official AppImage for 2.7.7 show exactly the same behaviour, and both versions/releases of 2.7.6 work just fine.

xianwenchen commented 5 months ago

I have the same problem, symptoms, and workarounds as z1atk0 on a Void Linux i686 system with latest packages from Void Linux official repositories.

If I type the password and unlock the database, the UI shows KeePassXC is busy. The CPU usage is almost none. If I then click Settings, the UI is no longer busy. If I then click Cancel, I can use KeePassXC normally.

If I open KeePassXC, do not type anything, and only move the mouse cursor around, the UI becomes busy as well.

If I type the password and unlock the database, and then click anything that is not KeePassXC, KeePassXC freezes. I then have to kill the process.

z1atk0 commented 4 months ago

Just for the record, the newly released 2.7.8 still has the same problem. That's probably to be expected with the milestone of this issue set to 2.8.0, but I thought I'd mention it nevertheless, for good measure. 😉

droidmonkey commented 4 months ago

We couldn't find a cause, which is rather disheartening.

ClaraCrazy commented 3 months ago

+1 Also having this issue. My system runs Qubes OS R4.2.1, and this vault VM is on the stock fedora 39 template.

Giving the VM a second vCPU did indeed fix the issue, so that's good, but I'd love to help diagnose this further. I have to admit I'm short on time lately, but if there's any specific info that might help you guys or if you have patches that need testing (since some seem to be unable to replicate this), please let me know.

droidmonkey commented 3 months ago

I finally fixed my fedora vm so will give this another go in a proper debugger to see what is happening here.

c4rlo commented 2 months ago

FWIW, I wasn't able to reproduce this using

$ taskset --all-tasks --cpu-list 0 keepassxc

on either of my two multi-core machines I just tested this on. The above is meant to pin the process to a single CPU. I've tried pinning it to a few different CPUs; database unlocking always works as normal.
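One possible reason pinning does not reproduce the problem (an assumption, not something confirmed in this thread): CPU affinity restricts where a process may run, but it does not necessarily change the CPU count the system reports, so an application that sizes a thread pool from the system-wide count could still get more than one thread. The Linux-only Python sketch below shows only that distinction; `pin_to_single_cpu` is a hypothetical helper, and how Qt counts CPUs under affinity is not established here.

```python
import os


def pin_to_single_cpu() -> tuple:
    """Pin this process to one CPU and return (system_cpus, usable_cpus).

    Linux-only: os.sched_getaffinity / os.sched_setaffinity are not
    available on every platform.
    """
    original = os.sched_getaffinity(0)  # CPUs this process may currently use
    try:
        # Roughly what `taskset --cpu-list N` does for a new process.
        os.sched_setaffinity(0, {min(original)})
        system_cpus = os.cpu_count() or 1            # system-wide count
        usable_cpus = len(os.sched_getaffinity(0))   # affinity-limited count
        return system_cpus, usable_cpus
    finally:
        os.sched_setaffinity(0, original)  # restore the original affinity


if __name__ == "__main__":
    system_cpus, usable_cpus = pin_to_single_cpu()
    print(f"system reports {system_cpus} CPU(s), process may use {usable_cpus}")
```

On a multi-core machine the two numbers differ after pinning, which is why a single-CPU VM and a pinned process are not necessarily equivalent test setups.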

This is with KeePassXC 2.7.8 on Wayland (sway) on Arch Linux.

d-brasher commented 1 month ago

The first commit where I can reproduce the issue seems to be https://github.com/keepassxreboot/keepassxc/commit/f20b53143072a1c778aeb2292f7e2e3793844288

Moreover, when compiling without -DWITH_XC_YUBIKEY, the issue does not appear, both at https://github.com/keepassxreboot/keepassxc/commit/f20b53143072a1c778aeb2292f7e2e3793844288 and at the tagged 2.7.9 release.

droidmonkey commented 1 month ago

OK, so the hangup occurs with the introduction of the concurrent function call that handles libusb_handle_events_completed interaction. The sequence is roughly:

  1. DB is loaded and initiates the hardware key poll which kicks off the loop that handles the above
  2. Unlocking the database initiates another loop to derive the master key
  3. The master key derivation never starts, basically it appears that the app is stuck processing the first loop
  4. When you minimize keepassxc that causes some sort of "flush" to the loops and the key derivation starts and finishes without issue

Basically this happens because the thread handling the usb events never exits/completes until it is forced to when the window is minimized or you switch database tabs. Since there is only 1 processor, Qt only allocates 1 thread to the global pool and the second thread never starts. Bumping up the thread pool to 2 fixes this problem, but we should still investigate why the libusb function never does what we expect. @phoerious
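The sequence described above is a form of thread-pool starvation, and it can be sketched outside of Qt. The Python example below is only an analogy under stated assumptions, not KeePassXC's C++ code: `poll_usb_events`, `derive_key`, and `unlock` are hypothetical stand-ins, and the `stop` event plays the role of whatever ends the USB loop when the window is minimized.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def poll_usb_events(stop: threading.Event) -> None:
    """Hypothetical stand-in for the USB event loop: runs until told to stop."""
    while not stop.wait(0.01):
        pass


def derive_key() -> str:
    """Hypothetical stand-in for the master key derivation."""
    return "key derived"


def unlock(pool_size: int, timeout: float = 0.3) -> str:
    """Run both tasks on a shared pool and report whether the KDF ever ran."""
    stop = threading.Event()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        pool.submit(poll_usb_events, stop)  # step 1: hardware key polling
        kdf = pool.submit(derive_key)       # step 2: key derivation is queued
        try:
            return kdf.result(timeout=timeout)
        except TimeoutError:
            # With one worker, the polling task holds the only thread,
            # so the queued key derivation cannot start.
            return "stuck: key derivation never started"
        finally:
            stop.set()  # analogous to the "flush" that ends the polling loop


if __name__ == "__main__":
    print(unlock(1))
    print(unlock(2))
```

With a pool of one, `unlock(1)` returns the "stuck" message because the polling task never yields its worker; with a pool of two, `unlock(2)` returns "key derived". That mirrors the observation that raising the pool size works around the hang without explaining why the polling loop does not exit on its own.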