ostreedev / ostree

Operating system and container binary deployment and upgrades
https://ostreedev.github.io/ostree/

ostree causing issues with multiple KDE users #1913

Closed tlrdrb closed 5 years ago

tlrdrb commented 5 years ago

https://bugs.kde.org/show_bug.cgi?id=411421 (my issue)
https://bugs.kde.org/show_bug.cgi?id=411415 (another user)
https://bugs.archlinux.org/task/63614 (Arch Linux bug report)

All three reports show strange behavior when using the latest ostree version. Downgrading to 2019.2 solves each of our issues.

What I just realized is that as soon as latte-dock is started in debug mode, a message about gpg-agent is displayed, and after that the system starts showing this behavior. Coincidentally, there was a commit about gpg-agent 20 days ago, so it may be related.

tlrdrb commented 5 years ago

Steps to reproduce:

Found more mentions of this:

https://forum.manjaro.org/t/testing-update-2019-08-28-kernels-nvidia-kde-dev-mesa-python-haskell/100660/25

cgwalters commented 5 years ago

Um, wow. It sounds somehow this is a duplicate of https://github.com/ostreedev/ostree/issues/1907 but I don't have any idea offhand how...

Oh, hmm, I guess if latte-dock is querying libflatpak it might be spamming stderr?

tlrdrb commented 5 years ago

No idea about that, but these messages are displayed:

gpg-connect-agent: no running gpg-agent - starting '/usr/bin/gpg-agent'
gpg-connect-agent: waiting for the agent to come up ... (5s)
gpg-connect-agent: connection to agent established
[debug 15:47:20.553553] - --------   ||| Here we switch virtual desktop
[debug 15:47:20.553553] - 0 . "DP-2" - Plasma::Types::TopEdge
[debug 15:47:20.553553] - 1 . "DP-2" - Plasma::Types::BottomEdge
[debug 15:47:20.553553] - -------- sorted -----
[debug 15:47:20.553553] - 0 . false - "DP-2" - Plasma::Types::BottomEdge
[debug 15:47:20.553553] - 1 . false - "DP-2" - Plasma::Types::TopEdge
[warning 15:47:20.553553] - Expected JSON property "X-Plasma-Provides" to be a single string. but it is a stringlist   ||| IT FREEZES HERE
[warning 15:47:50.594594] - QProcess: Destroyed while process ("kreadconfig5") is still running.   ||| Every few minutes
[warning 15:48:50.728728] - QProcess: Destroyed while process ("kreadconfig5") is still running.   ||| Every few minutes
[warning 15:49:50.938938] - QProcess: Destroyed while process ("kreadconfig5") is still running.   ||| Every few minutes
[warning 15:50:20.996996] - QXcbClipboard::setMimeData: Cannot set X11 selection owner   ||| Included this to show recovery time. This is not related and can be any other message.

cgwalters commented 5 years ago

Ah...

gpg-connect-agent: waiting for the agent to come up ... (5s)

is probably an important aspect here. I think we probably need to do something like only try to kill the agent if we detect one has started somehow? Would it work to just check if the directory exists and is non-empty maybe?

/cc @dbnicholson
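The "check if the directory exists and is non-empty" heuristic can be sketched in plain POSIX C. This is an illustration, not libostree code; the helper name `dir_exists_and_nonempty` is hypothetical.

```c
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `path` is a directory holding at least one
 * entry other than "." and "..". Returns false if it cannot be opened
 * (missing, not a directory, or no permission). */
static bool dir_exists_and_nonempty(const char *path)
{
    assert(path != NULL);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return false;

    bool found = false;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0) {
            found = true;
            break;
        }
    }
    closedir(dir);
    return found;
}
```

As the reply below points out, this check cannot tell whether an agent was started, because libostree writes pubring.gpg into that directory regardless, so it is always non-empty.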

tlrdrb commented 5 years ago

I just did a latte-dock -d debug run. It seems that with the downgraded ostree, the gpg messages never appear at all.

afettouhi commented 5 years ago

I have the same issue here on Arch Linux with KDE Plasma. For me this issue is triggered by the Arch updater widget I have installed plus the standard widget that shows my home folder and desktop folder. If I interact with these I get a 100% CPU spike. Downgrading ostree is the only viable solution so far.

dbnicholson commented 5 years ago


Wow, that's weird. Well, if my previous fix is applied, then stderr won't be spammed unless there was actually an error running gpg-connect-agent. The directory won't be empty because we write pubring.gpg there. We could leak some gnupg internal details and check if S.gpg-agent exists. If the required gpgme version is bumped to 1.5.0, then maybe we could use gpgme_get_dirinfo to get the actual socket path via agent-socket and test for its existence.
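The "check if S.gpg-agent exists" option might look roughly like this. It is a sketch that hard-codes the GnuPG-internal socket name mentioned above; the helper name `agent_socket_exists` is hypothetical, and with gpgme 1.5.0+ the path would instead come from gpgme_get_dirinfo's agent-socket entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: true if <homedir>/S.gpg-agent exists and is a Unix
 * domain socket. "S.gpg-agent" is a GnuPG internal detail, assumed here. */
static bool agent_socket_exists(const char *homedir)
{
    assert(homedir != NULL);
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/S.gpg-agent", homedir);
    if (n < 0 || (size_t)n >= sizeof path)
        return false;              /* path too long: treat as absent */

    struct stat st;
    if (stat(path, &st) != 0)
        return false;              /* missing or inaccessible */
    return S_ISSOCK(st.st_mode);   /* a regular file does not count */
}
```

Only if this returns true would the kill step run, so no agent would be autostarted just to be killed.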

Oh, I just noticed gpg-connect-agent has a --no-autostart option. I think that would be useful regardless since starting and then immediately killing gpg-agent is stupid. But it still prints a message on stderr, and I'd need to check if that option has been available for a while.

Alternatively, we could just rip that whole thing out. Starting in gnupg 2.1.17, gpg-agent kills itself when its entire directory is removed. The reason I originally wrote this code was that we (Endless) were on an older version of gnupg. Now that we're on something more current, this isn't actually an issue anymore. It feels a bit wrong to drop this, though, since we don't enforce the version of gnupg.

It might be possible to check the version of gnupg at runtime via gpgme and only run this code on older gnupg.

cgwalters commented 5 years ago

OK. From the Fedora-derivatives perspective, RHEL8 has 2.2.9, so removing it is probably OK. Alternatively, we could leave it in as #if 0 for people who need it to patch back in easily.

dbnicholson commented 5 years ago

Unfortunately, the older stuff is farther behind:

Ubuntu trusty - 2.0.22
Ubuntu xenial - 2.1.11
Debian jessie - 2.0.26
CentOS 7 - 2.0.22

That said, if you want to keep piling on this thing, #1915 skips on newer gnupg. It did the right thing on my local system.
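A runtime gate of the kind discussed here (only kill the agent on gnupg older than 2.1.17) reduces to a version comparison. The sketch below is not the #1915 code; it assumes the gnupg version is already available as a string (for instance from gpgme's engine info) and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parse "MAJOR.MINOR[.MICRO]" (trailing text is ignored) and report whether
 * it is at least maj.min.mic. Unparseable strings return false, so callers
 * fall back to the conservative path. */
static bool version_at_least(const char *version, int maj, int min, int mic)
{
    assert(version != NULL);
    int a = 0, b = 0, c = 0;
    if (sscanf(version, "%d.%d.%d", &a, &b, &c) < 2)
        return false;
    if (a != maj)
        return a > maj;
    if (b != min)
        return b > min;
    return c >= mic;
}

/* From gnupg 2.1.17 on, gpg-agent exits by itself when its directory is
 * removed, so the explicit kill is only needed on older releases. */
static bool need_explicit_agent_kill(const char *gnupg_version)
{
    return !version_at_least(gnupg_version, 2, 1, 17);
}
```

With the versions quoted in this thread, trusty, xenial, jessie and CentOS 7 would still take the kill path, while RHEL8's 2.2.9 would skip it.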

dbnicholson commented 5 years ago

For the record, the gpg-connect-agent --no-autostart option is available starting in 2.1.1. So, not old enough to use reliably.
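If --no-autostart were used, it would have to be added conditionally, since it only exists from gnupg 2.1.1. The sketch below builds a kill-agent command line with the flag gated on a caller-supplied boolean. The exact command line (`--homedir <dir> KILLAGENT /bye`) is an assumption for illustration, not necessarily what libostree ran, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: fill `out` (at least 7 slots) with a NULL-terminated argv
 * asking the agent in `homedir` to exit. Returns the argument count. */
static size_t build_kill_agent_argv(const char **out, const char *homedir,
                                    bool have_no_autostart)
{
    assert(out != NULL && homedir != NULL);
    size_t i = 0;
    out[i++] = "gpg-connect-agent";
    out[i++] = "--homedir";
    out[i++] = homedir;
    if (have_no_autostart)        /* only available from gnupg 2.1.1 */
        out[i++] = "--no-autostart";
    out[i++] = "KILLAGENT";
    out[i++] = "/bye";
    out[i] = NULL;
    return i;
}
```

As noted above, even with --no-autostart gpg-connect-agent still prints a message on stderr, so the flag alone would not silence the output.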

dbnicholson commented 5 years ago

@tlrdrb Are you using ostree 2019.3-2 on arch? I feel like that should fix the issue you're seeing since it will stop the stderr spam. However, if you are still seeing an issue then maybe there's some other problem with spawning programs from libostree in that context. In that case, more debugging would be needed.

afettouhi commented 5 years ago

No, the issue is still present in ostree 2019.3-2 on Arch. There is some debug output in the bug report on the Arch site.

dbnicholson commented 5 years ago

@tlrdrb what is the message that you see about gpg-agent when latte-dock is run in debug mode?

tlrdrb commented 5 years ago

gpg-connect-agent: no running gpg-agent - starting '/usr/bin/gpg-agent'
gpg-connect-agent: waiting for the agent to come up ... (5s)
gpg-connect-agent: connection to agent established

dbnicholson commented 5 years ago

That's with 2019.3-2? That has the patch that should silence that.

afettouhi commented 5 years ago

@dbnicholson yes, that is with ostree 2019.3-2.

dbnicholson commented 5 years ago

I installed Arch in a VM with latte-dock and don't see the issue with either ostree-2019.3-1 or ostree-2019.3-2. As expected, I also do not see the gpg-connect-agent output with the latest ostree.

[dan@arch ~]$ pacman -Q ostree
ostree 2019.3-1
[dan@arch ~]$ flatpak remote-ls flathub >/dev/null
gpg-connect-agent: no running gpg-agent - starting '/usr/bin/gpg-agent'
gpg-connect-agent: waiting for the agent to come up ... (5s)
gpg-connect-agent: connection to agent established
[dan@arch ~]$ sudo pacman -S ostree
...
[dan@arch ~]$ pacman -Q ostree
ostree 2019.3-2
[dan@arch ~]$ flatpak remote-ls flathub >/dev/null
[dan@arch ~]$ 

Where I would look is in plasma discover, since that's the software center that's going to be checking for flatpaks and therefore using ostree.

afettouhi commented 5 years ago

I don't use latte-dock at all. My issues are triggered when I use the plasma5-applets-kde-arch-update-notifier-git from AUR and when I interact with my two folder widgets showing my home folder and desktop folder.

dbnicholson commented 5 years ago


That's bizarre. Have you restarted since you upgraded ostree? There should be no way you'd see those messages with 2019.3-2. Anyways, I also tried that plasma widget yesterday and didn't have any issues with it.

Neither latte-dock nor the arch update notifier widget have any interactions with ostree. What I suspect is happening is that the app center (which calls into flatpak, which calls into ostree) is draining resources and you end up observing that the thing you want to interact with does not have any resources available.

Here's an interesting thing you could try.

sudo mv /usr/lib/qt/plugins/discover/flatpak-backend.so{,.off}
sudo reboot

At that point, the KDE app center shouldn't be able to use flatpak. If you no longer observe the issue, then we can be sure that the issue is in plasma discover and go from there.

afettouhi commented 5 years ago

Yes, I have restarted my machine several times since I updated to ostree 2019.3-2. I will try your suggestion, though, and report back.

afettouhi commented 5 years ago

Well, I just tried your suggestion. I updated ostree to 2019.3-2, moved the flatpak file you suggested and restarted. When I log in, I try to interact with a folder in my home folder widget on my desktop. I right-click on the folder and my desktop immediately locks up. I switch to a different tty, start top and can see kwin_x11 using 100% of my CPU.

dbnicholson commented 5 years ago

OK, it took me a bit to find the real connection. But first I did sudo lsof /usr/lib/libostree-1.so.1. That showed plasmashell had it open. After poking around for a while, I found that /usr/lib/qt/plugins/discover-notifier/FlatpakNotifier.so is the plugin that loads flatpak. Just renaming it with .off didn't work since it seems to load any file found in that directory. Instead:

sudo mv /usr/lib/qt/plugins/discover-notifier/FlatpakNotifier.so /var/tmp

Or you can just remove discover temporarily with pacman -Rdd discover. Log out and then back in again to restart plasmashell. Or press Alt+F2 and run plasmashell --replace. Now see if you can reproduce the behavior. After doing that, sudo lsof /usr/lib/libostree-1.so.1 shows no users for me.

afettouhi commented 5 years ago

I did the sudo mv /usr/lib/qt/plugins/discover-notifier/FlatpakNotifier.so /var/tmp and that works. My desktop doesn't lock up when interacting with the earlier widgets, so no 100% CPU spikes from plasmashell or kwin_x11.

dbnicholson commented 5 years ago

Right, so the issue is definitely in the discover notifier locking up plasmashell. I was also able to see plasmashell pegged at 100% CPU after a while but didn't figure out why yet.

Can you restore the original setup and then run G_MESSAGES_DEBUG="OSTree flatpak" plasmashell --replace from a shell? This should show what's going on in flatpak and ostree. When things lock up, paste the last several lines of the output.

afettouhi commented 5 years ago

Well, I can partly restore the Plasma shell, but after the shell returns I can't get to the Konsole window where I type in G_MESSAGES_DEBUG="OSTree flatpak" plasmashell --replace. It is hidden for some reason: I see it in the panel, but I can't bring it up.

dbnicholson commented 5 years ago

OK, I was able to reproduce the behavior. With ostree 2019.3, clicking on Check Updates in the arch update widget sends plasmashell to 100% and it never completes. With ostree 2019.2, it completes normally. I bisected and it did indeed point to b6979e7572395f3f99ba328ed9399ed4b862f9a7. Trying to figure out why though...

dbnicholson commented 5 years ago

Arch helpfully has no prebuilt debug packages, so gdb is pretty useless. Off to build glibc and glib with debugging...

afettouhi commented 5 years ago

Yeah, they stopped offering that a long time ago. It seems many distros are not shipping prebuilt debug packages these days, which makes debugging rather tedious, especially for newcomers.

dbnicholson commented 5 years ago

#1917 fixes it for me. I had to comment out my early return from #1915, but using g_spawn_sync() instead of g_subprocess_new() seems to keep plasmashell from locking up.
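The thread does not establish why the GSubprocess-based code hung plasmashell, only that a synchronous spawn did not. For readers unfamiliar with the pattern, here is a plain POSIX sketch of "spawn, then block in waitpid() on that exact child" (posix_spawnp + waitpid). It is a stand-in for illustration, not GLib's g_spawn_sync() implementation and not the #1917 patch; the helper name is hypothetical. It also discards the child's stderr outright, which is a simplification: the earlier fix described in this thread only lets output through when the command fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up in PATH) with stderr sent to
 * /dev/null and block until it exits. Returns the exit status, or -1 if it
 * could not be spawned or did not exit normally. */
static int run_sync_quiet(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);
    posix_spawn_file_actions_t fa;
    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;
    if (posix_spawn_file_actions_addopen(&fa, STDERR_FILENO, "/dev/null",
                                         O_WRONLY, 0) != 0) {
        posix_spawn_file_actions_destroy(&fa);
        return -1;
    }

    pid_t pid;
    int rc = posix_spawnp(&pid, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    if (rc != 0)
        return -1;

    /* Reap this specific child synchronously; retry if interrupted. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the caller waits on the child's own pid, nothing depends on a main loop or child-watch callback running later.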

afettouhi commented 5 years ago

Good to know that it's most likely fixed now. Will this result in a new release or just a patch?