Is the sssd binary new? @KyleGospo can add a setuid in the Dockerfile for now as I don't feel like bumping rechunk
Oh we never checked for that directory
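(A setcap-based sketch of that kind of Containerfile workaround, using the capability set that getcap reports further down this thread; this is an illustration, not the fix that actually shipped, and it assumes setcap from libcap is available in the build image:)
# rough sketch: re-apply the file capabilities as a derived layer so rechunk keeps them
RUN setcap cap_chown,cap_dac_override,cap_setgid,cap_setuid+ep /usr/libexec/sssd/krb5_child && \
    setcap cap_chown,cap_dac_override,cap_setgid,cap_setuid+ep /usr/libexec/sssd/ldap_child && \
    setcap cap_chown,cap_dac_override,cap_setgid,cap_setuid+ep /usr/libexec/sssd/selinux_child && \
    setcap cap_dac_read_search+p /usr/libexec/sssd/sssd_pam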
It was a non-issue until F41 where the service account sssd was added to run the services as non-root. It's mentioned here: https://docs.fedoraproject.org/en-US/fedora/latest/release-notes/sysadmin/
Thank you for helping support an edge case like this.
Noticed this morning that the SSSD package is installed by default in bazzite and Fedora in general. The file selinux_child that has capabilities comes from another package I had layered to get it working with MS AD.
So then, if I understand correctly, upgrading from F40 to F41 runs some script that changes file ownership/permissions somewhere? Or is this part of the new RPM post-install scripts?
This is interesting. If the latter, I would expect everything in /usr to have the proper permissions, but how would this be addressed for /etc files in atomic distros? What does Fedora Silverblue do? Perhaps the problem is present there as well?
We are using a derived image from kinoite that gets rechunked in the end
One of the quirks of doing this is that we lose the xattrs of the original image for now. If we didn't rechunk the image we would lose the derived xattrs instead
In any case, what changed in f41 is that sssd is now part of kinoite, which caused it to lose its xattrs. So we had to add them back in
Rechunk 1.0.1 is merged in testing so next bazzite build will have them fixed
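(For anyone wanting to verify once a fixed build lands, checking the file capabilities is enough; getcap ships with libcap, while the getfattr variant needs the attr package, which may not be on the image by default:)
sudo getcap /usr/libexec/sssd/*
# on a fixed F41 image this should list the sssd helpers, e.g.
# /usr/libexec/sssd/krb5_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep
sudo getfattr -d -m security.capability /usr/libexec/sssd/krb5_child   # shows the raw security.capability xattr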
I updated Bluefin gts today and can no longer login via sssd:
AutomaticUpdates: stage; rpm-ostreed-automatic.timer: last run 41min ago
Deployments:
ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:gts
Digest: sha256:c7d12e8d5e6bf444ffa3c0c740aaaa5bbc204360d0315b6389d96538fa8f4bb8
Version: 40.20241102.0 (2024-11-03T05:44:06Z)
Diff: 25 upgraded, 5 removed, 6 added
LayeredPackages: touchegg
● ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:gts
Digest: sha256:dfc42faacfffa0f990206db843e9a9cd84f58a99374b7f81123ca4aeaebead4c
Version: 40.20241029.0 (2024-10-29T18:30:00Z)
LayeredPackages: touchegg
Pinned: yes
I have rolled back to the 40.20241029.0 build which still works fine. I am not sure if something has been merged (that should not have been) in 40-based images, or maybe they are also affected and need the same fix?
Furthermore, after facing the same issue on my second system, I instead upgraded to Bluefin latest (41-based) and that seems to work fine:
AutomaticUpdates: stage; rpm-ostreed-automatic.timer: no runs since boot
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx:latest
Digest: sha256:27d65e684ef5e4159e480ec4c531b137579d31707773cba9d13bf75dbbf47495
Version: 41.20241103.0 (2024-11-03T04:44:01Z)
LayeredPackages: pwgen
ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx:gts
Digest: sha256:64fc3f87ad0249f9d6b3284c60168948e0c5a4e54cb54943b91d133abe5dfc5a
Version: 40.20241102.0 (2024-11-03T05:44:36Z)
LayeredPackages: pwgen
ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx:stable
Digest: sha256:75fe45926ca23fe63414bc42b22f40e650decc1eae8d6d4b621e7d8e6b3721d0
Version: 40.20241027.0 (2024-10-27T05:46:52Z)
LayeredPackages: pwgen
Pinned: yes
I can also confirm that ownership has changed:
❯ sudo ls -al /var/lib/sss/
total 0
drwxrwxr-x. 1 sssd sssd 100 May 2 2024 .
drwxr-xr-x. 1 root root 1042 Nov 3 17:20 ..
drwxrwx---. 1 sssd sssd 394 Nov 3 17:20 db
drwxrwx--x. 1 sssd sssd 0 Apr 30 2024 deskprofile
drwxrwx---. 1 sssd sssd 0 Apr 30 2024 gpo_cache
drwx------. 1 root root 0 Apr 30 2024 keytabs
drwxrwxr-x. 1 sssd sssd 48 Nov 3 17:20 mc
drwxrwxr-x. 1 sssd sssd 32 Nov 3 17:20 pipes
drwxrwxr-x. 1 sssd sssd 108 Nov 3 17:25 pubconf
drwxrwx---. 1 sssd sssd 22 Apr 30 2024 secrets
localuser in 🌐 myhost in ~
❯ ps -ef | grep sssd
sssd 1550 1 0 17:20 ? 00:00:00 /usr/sbin/sssd -i --logger=files
sssd 1561 1550 0 17:20 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain ad.home.lan --logger=files
sssd 1563 1550 0 17:20 ? 00:00:00 /usr/libexec/sssd/sssd_nss --logger=files
sssd 1564 1550 0 17:20 ? 00:00:00 /usr/libexec/sssd/sssd_pam --logger=files
sssd 1565 1550 0 17:20 ? 00:00:00 /usr/libexec/sssd/sssd_pac --logger=files
sssd 2966 1 0 17:20 ? 00:00:00 /usr/libexec/sssd/sssd_kcm --logger=files
localus+ 31124 28469 0 17:34 pts/1 00:00:00 grep --color=auto sssd
Not sure if any fix has been merged, but just reporting the data points here in case they are useful.
@castrojo update to rechunk 1.0.1 to fix this
We bumped to 1.0.1 yesterday but that was after the stable builds went out, I'll push out new ones shortly.
OK @karypid - images are updated, let me know!
@castrojo Do you mean a beta image? I'm not seeing anything yet.
So, I think the problem is that the capabilities are only needed for 41-based images, where the software runs as user sssd.
At least this is the only difference I can find between the 3 images below, of which only the pinned one works:
❯ rpm-ostree status
State: idle
AutomaticUpdates: stage; rpm-ostreed-automatic.timer: no runs since boot
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:stable-daily
Digest: sha256:e1554f6ad79bd38c739e1175a0a45c2b6bcf5b07b37f671bf0cc345ac65755b0
Version: 40.20241103.0 (2024-11-04T05:46:52Z)
LayeredPackages: touchegg
ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:gts
Digest: sha256:ddcdb165ae06a724ee4093485b40fd4bf548a5cb22b1abf6e65aa7b09ba68a4a
Version: 40.20241102.0 (2024-11-03T17:58:54Z)
LayeredPackages: touchegg
ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:gts
Digest: sha256:dfc42faacfffa0f990206db843e9a9cd84f58a99374b7f81123ca4aeaebead4c
Version: 40.20241029.0 (2024-10-29T18:30:00Z)
LayeredPackages: touchegg
Pinned: yes
In all 3 everything is owned and runs as root so that is common:
❯ ls -al /var/lib/sss/
total 0
drwxr-xr-x. 1 root root 100 Apr 14 2024 .
drwxr-xr-x. 1 root root 1032 May 16 19:20 ..
drwx------. 1 root root 2066 Nov 4 19:20 db
drwxr-x--x. 1 root root 0 Apr 14 2024 deskprofile
drwxr-xr-x. 1 root root 0 Apr 14 2024 gpo_cache
drwx------. 1 root root 0 Apr 14 2024 keytabs
drwxrwxr-x. 1 root root 48 Nov 4 19:20 mc
drwxr-xr-x. 1 root root 32 Nov 4 19:20 pipes
drwxr-xr-x. 1 root root 70 Nov 4 19:26 pubconf
drwx------. 1 root root 22 Apr 14 2024 secrets
~/sssd
❯ ps -ef | grep sssd
root 1853 1 0 19:20 ? 00:00:00 /usr/sbin/sssd -i --logger=files
root 1996 1853 0 19:20 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain ad.home.lan --uid 0 --gid 0 --logger=files
root 2090 1853 0 19:20 ? 00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
root 2091 1853 0 19:20 ? 00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files
root 2092 1853 0 19:20 ? 00:00:00 /usr/libexec/sssd/sssd_pac --uid 0 --gid 0 --logger=files
root 4066 1 0 19:24 ? 00:00:00 /usr/libexec/sssd/sssd_kcm --uid 0 --gid 0 --logger=files
localus+ 5327 5107 0 19:27 pts/1 00:00:00 grep --color=auto sssd
But the 40.20241102.0 (gts) and 40.20241103.0 (stable-daily) have the new capabilities which I think are interfering (these are probably only needed for 41-based images):
❯ sudo getcap /usr/libexec/sssd/*
/usr/libexec/sssd/krb5_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep
/usr/libexec/sssd/ldap_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep
/usr/libexec/sssd/selinux_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep
/usr/libexec/sssd/sssd_pam cap_dac_read_search=p
The older pinned 40.20241029.0 (gts), which is the only one that works, has no capabilities:
~/sssd
❯ sudo getcap /usr/libexec/sssd/*
[sudo] password for localuser:
~/sssd took 3s
As I reported yesterday, latest works fine, and has the capabilities, but is also 41-based and runs/owns the files as sssd....
That's odd. I'm on latest 41 but I don't see the capabilities there. Do you mean in Bluefin?
Ah yes: as shown in my rpm-ostree status output, I am testing on Bluefin. Latest Bluefin works fine for me (tested yesterday) and runs everything as sssd (but file ownership is also set to user sssd).
I am not clear on how much bazzite/bluefin share in the underlying infrastructure. Happy to file a separate bug report if this is not a ublue-os core issue.
@rayrayrayraydog Hello from Bluefin 41. Just double-confirmed that latest is working fine:
When running on Bluefin 41 (latest) everything is owned by sssd except keytabs (not sure if that is an issue, but it definitely doesn't prevent me from logging in):
❯ ls -al /var/lib/sss/
total 0
drwxrwxr-x. 1 sssd sssd 100 Apr 14 2024 .
drwxr-xr-x. 1 root root 1042 Nov 4 22:53 ..
drwxrwx---. 1 sssd sssd 2094 Nov 4 22:54 db
drwxrwx--x. 1 sssd sssd 0 Apr 14 2024 deskprofile
drwxrwx---. 1 sssd sssd 0 Apr 14 2024 gpo_cache
drwx------. 1 root root 0 Apr 14 2024 keytabs
drwxrwxr-x. 1 sssd sssd 48 Nov 4 22:53 mc
drwxrwxr-x. 1 sssd sssd 32 Nov 4 22:53 pipes
drwxrwxr-x. 1 sssd sssd 108 Nov 4 22:54 pubconf
drwxrwx---. 1 sssd sssd 22 Apr 14 2024 secrets
Everything runs as sssd:
~
❯ ps -ef | grep sssd
sssd 2080 1 0 22:53 ? 00:00:00 /usr/sbin/sssd -i --logger=files
sssd 2150 2080 0 22:53 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain ad.home.lan --logger=files
sssd 2152 2080 0 22:53 ? 00:00:00 /usr/libexec/sssd/sssd_nss --logger=files
sssd 2153 2080 0 22:53 ? 00:00:00 /usr/libexec/sssd/sssd_pam --logger=files
sssd 2154 2080 0 22:53 ? 00:00:00 /usr/libexec/sssd/sssd_pac --logger=files
sssd 3574 1 0 22:54 ? 00:00:00 /usr/libexec/sssd/sssd_kcm --logger=files
localus+ 21424 20925 0 22:55 pts/1 00:00:00 grep --color=auto sssd
And the capabilities are present:
~ took 3s
❯ sudo getcap /usr/libexec/sssd/*
/usr/libexec/sssd/krb5_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep
/usr/libexec/sssd/ldap_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep
/usr/libexec/sssd/selinux_child cap_chown,cap_dac_override,cap_setgid,cap_setuid=ep
/usr/libexec/sssd/sssd_pam cap_dac_read_search=p
Here is rpm-ostree status with 41.20241104.0 (latest) on bluefin-dx-nvidia channel:
~
❯ rpm-ostree status
State: idle
AutomaticUpdates: stage; rpm-ostreed-automatic.timer: no runs since boot
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:latest
Digest: sha256:a11ce0ebeadd60956be56cff181d656b6f43edb15fb98bc54b709c85ff332a71
Version: 41.20241104.0 (2024-11-04T21:02:12Z)
LayeredPackages: touchegg
ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:gts
Digest: sha256:dfc42faacfffa0f990206db843e9a9cd84f58a99374b7f81123ca4aeaebead4c
Version: 40.20241029.0 (2024-10-29T18:30:00Z)
LayeredPackages: touchegg
Pinned: yes
So it seems to me that Bluefin has the opposite problem:
I'm also experiencing the issue where the sssd service fails to run on Bazzite 41 (bazzite-nvidia:stable 41.20241104), but worked fine on Bazzite 40. The error I was seeing was that it was unable to read the keytab (/etc/krb5.keytab), which does exist and has valid data (according to sudo klist). I did try unenrolling and then reenrolling the host (using ipa-client-install from the freeipa-client package) but no luck there.
I've rolled back to 40.20241020 which is working fine after reowning the /var/lib/sss/ dir back to root:root and removing the cache databases.
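In case it helps anyone else, the re-own and cache cleanup boiled down to something like this (a sketch, not an exact transcript; the *.ldb glob is just what the cache files look like under /var/lib/sss/db on my system):
sudo systemctl stop sssd
sudo chown -R root:root /var/lib/sss
sudo rm -f /var/lib/sss/db/*.ldb    # remove the cached databases so they get recreated on start
sudo systemctl start sssd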
I haven't checked in a few days, but I was also experiencing this issue on 41. Same thing, unable to read the keytab. I set my image to :40 to lock me there for the time being.
Can you please provide "rpm-ostree status" output? What works varies based on the image (Bluefin/Aurora/Bazzite) and channel (latest/stable/etc) you are on.
Here's mine. Capabilities are still missing from the SSSD files in /usr/libexec/sssd. I had also changed permissions on /etc/krb5.keytab to try and get the services working.
root@bazzite:~# rpm-ostree status
State: idle
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bazzite:stable
Digest: sha256:10ebb6e959a7574a56cd90451652eb3cc2ce6e406465661092875ea91d6318f3
Version: 41.20241104 (2024-11-04T05:00:08Z)
LayeredPackages: adcli htop krb5-workstation libguestfs-tools libvirt libvirt-daemon-config-network libvirt-daemon-kvm oddjob oddjob-mkhomedir
plasma-workspace-x11 pugixml qemu-kvm sssd terminator virt-install virt-manager virt-top virt-viewer
ostree-image-signed:docker://ghcr.io/ublue-os/bazzite:stable
Digest: sha256:10ebb6e959a7574a56cd90451652eb3cc2ce6e406465661092875ea91d6318f3
Version: 41.20241104 (2024-11-04T05:00:08Z)
LayeredPackages: adcli htop krb5-workstation libguestfs-tools libvirt libvirt-daemon-config-network libvirt-daemon-kvm oddjob oddjob-mkhomedir
plasma-workspace-x11 pugixml qemu-kvm terminator virt-install virt-manager virt-top virt-viewer
@KyleGospo snoozed on the builds so the fix is not in yet
Switch to testing for a bit if you need the functionality. Current testing build is stable.
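(Something along these lines should do it; the testing tag name is an assumption based on the channel mentioned above:)
sudo rpm-ostree rebase ostree-image-signed:docker://ghcr.io/ublue-os/bazzite:testing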
Thanks, the bug does indeed seem to be fixed in testing-41.20241106
I can also confirm that testing-41.20241106 seems to be a working image.
The only thing I am not clear on is what will happen with GTS versions: are those going to use the latest sssd packages/layout that run as user/group sssd/sssd and have the associated capabilities/permissions, or will they stay as-is?
The reason I ask is that my understanding is that GTS will remain based on Fedora 40, so I guess this change should not trickle down to that?
Same rechunk is used across all image variants, for bluefin too. It is a packaging tool external to the image
However F40 did not seem to have this issue anyway.
Unfortunately, my concern is that the fix for F41 breaks F40!
See my comment above here: https://github.com/ublue-os/bazzite/issues/1818#issuecomment-2453506750
My suspicion is that the fix for F41 unfortunately breaks F40 due to the "common rechunk" you mentioned...
I am now forced to use the latest F41 version of Bluefin...
One other thing to report is that I tried switching back to GTS to see if I can fix F40. I found that:
/var/lib/sss and /var/log/sssd must be set back to root:root instead of sssd:sssd
the db package was updated, so /var/lib/sss/db cannot be read if you roll back after trying latest
At this point I was able to start sssd in F40/bluefin/gts, but I get an error that matches this paywalled report, so I gave up and went back to latest. The good thing is that bluefin/latest simply works...
I am not sure how an atomic distro can go about addressing this. Even Silverblue should become "rollback impossible" after you install 41 (db upgraded, permissions change, capabilities are marked on binaries). Once that is done there seems to be no way to go back. This has made me lose confidence in the whole "atomic distro" approach. Looks like there is still a case for BTRFS snapshots and things like snapper... They might have protected against this.
Either way, my concern here is that I would expect GTS to have "kept working the old way": no need for the extra sssd user/group, everything should still run as and be owned by root, and the capabilities on the binaries should not be needed.
Unfortunately, I am afraid that (unless I have done something wrong) sssd may now be unusable in GTS versions...
If I find time, I will spin up a VM and try to join a domain with a vanilla fresh GTS version, and report back.
Rolling back a major fedora version is expected to cause minor issues that require manual intervention
We did our best with bazzite this cycle to make sure that it is possible. And it is possible.
SSSD was outside the things we check. And the fact that it broke seems to be an upstream issue caused by the permission changes.
So there is no issue here. You are expected to update once, have the permissions change, and then not switch back.
As I understand it both f40 and f41 work fine on their own and the issue is the rollback, which is now documented here too
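(For anyone who still wants an escape hatch before taking the jump, the usual pattern is to pin the working deployment first; these are standard ostree/rpm-ostree commands, and the sssd caveats above still apply after a rollback:)
sudo ostree admin pin 0      # keep the currently booted deployment around
sudo rpm-ostree upgrade      # take the F41 update
# rpm-ostree rollback        # boots the previous deployment again, but /var/lib/sss will need manual fixing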
@antheas I am going to try and explain this once more, as it seems I am not getting my message through: the problem I am currently still reporting has nothing to do with rolling back. In fact it has nothing to do with upgrading. Here is the sequence of events:
Today I downloaded fresh ISOs of Fedora Silverblue 40 and Bluefin-dx-gts and made fresh installations in two VMs, then tested:
I made some posts to explain that:
The fix for sssd in latest/stable breaks GTS. People using GTS can no longer use sssd.
Everyone here is reporting that the fix works after updating to latest.
I am reporting that yes, I was forced to update to latest and the reason is that the fix breaks GTS.
Everything else is just discussion on my actions afterward. I am trying to collect information on why this no longer works on GTS. I have understood that there is some common rechunking process across images and I suspect that this is what affects GTS, but I have yet to find out why. When I rolled back I checked permissions, capabilities, db versions and fixed all of those (reporting findings) but was still unable to get to the bottom of why sssd no longer works on GTS after this fix.
I hope this clarifies what I'm trying to report. I suppose at some point I will flip to stable which is a good enough balance of stability and a working sssd, but I was hoping to stay on GTS for at least 4-6 months before I try stable.
Still not seeing the fixes in stable 41.20241112.1. Is that expected?
I tried to bisect the issue on 40-based images. In a fresh VM I was going over the "join domain" process and the breakage occurs between:
@antheas can you confirm that the rechunk process was updated between these two builds for the purpose of fixing the 41-based image?
The process I use to test:
rpm-ostree rebase ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx:40-yyyymmdd
mkdir -p /var/lib/sss/pubconf/krb5.include.d
realm join my.home.domain
getent passwd myuser@my.home.domain (should resolve)
realm permit -g 'domain user@my.home.domain'
log in as myuser@my.home.domain from gdm (should be allowed)
Hi, just a reminder I am using bazzite only.
@karypid I say that because I think this bug has turned into a much wider discussion about sssd in UBlue in general. Is this the right place for that?
I am just a user of bluefin so not aware of the build process. I am not clear on why Bazzite has not picked up the rechunk fix. My understanding based on antheas' post above was that the fix was common:
Same rechunk is used across all image variants, for bluefin too. It is a packaging tool external to the image
Perhaps this packaging tool is common, but needs to be updated in each variant separately?
I had a look at changes in Bluefin between the 2 versions above where gts broke and found this: https://github.com/ublue-os/bluefin/pull/1859/files
Since this is in ublue-os/bluefin I suppose it needs to be replicated in ublue-os/bazzite, and then a new image must be released.
I wish I knew more about the toolchain so that I may contribute, but right now I'm only starting to get familiar with these tools... I guess everyone here is a volunteer and we just need to wait until someone has time to merge this for bazzite.
To be fair, not many people run AD at home, so this is a rather niche feature...
This is fixed as of bazzite:stable-41.20241118!!! I am going to mark this closed since the issue as initially described in bazzite is resolved. @karypid you may want to raise a new issue with Bluefin for your concern about SSSD there.
Describe the bug
Upon upgrading Bazzite to F41, I found that my layered install of SSSD was broken. I also found that the same SSSD setup would work when added to a new install of Kinoite 41.
In a test VM of Bazzite from the 41 ISO, I was able to recreate this issue. I think I've narrowed it down to a few binaries used by SSSD not having capabilities assigned as they do with the same package on Kinoite.
bazzite system:
kinoite system:
What did you expect to happen?
I would expect the binaries krb5_child, ldap_child, and sssd_pam under /usr/libexec/sssd to have the proper capabilities so that SSSD services can function. I cannot directly modify them as they are under /usr. This was not an issue in bazzite on F39, wherein SSSD ran under the root account.
I realize that the bazzite team doesn't touch SSSD and doesn't ship it, but given that it works out of the box on Kinoite with the same version of the package I can't rule out something in bazzite's setup.
Output of rpm-ostree status
root@bazzite:/usr/libexec/sssd# rpm-ostree status
State: idle
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bazzite:stable
Digest: sha256:dd21242732e272339c1e5d9cee6441f95e819223e12d53cf1430a3db517d6bd6
Version: 41.20241029.1 (2024-10-29T15:39:27Z)
LayeredPackages: adcli htop krb5-workstation libguestfs-tools libvirt libvirt-daemon-config-network libvirt-daemon-kvm oddjob oddjob-mkhomedir plasma-workspace-x11 pugixml qemu-kvm sssd terminator virt-install virt-manager virt-top virt-viewer
ostree-image-signed:docker://ghcr.io/ublue-os/bazzite:stable
Digest: sha256:dd21242732e272339c1e5d9cee6441f95e819223e12d53cf1430a3db517d6bd6
Version: 41.20241029.1 (2024-10-29T15:39:27Z)
LayeredPackages: adcli htop krb5-workstation libguestfs-tools libvirt libvirt-daemon-config-network libvirt-daemon-kvm oddjob
Hardware
This is a KVM virtual machine built with the latest bazzite-stable.iso for F41.
root@bazzite:/usr/libexec/sssd# cat /sys/devices/virtual/dmi/id/product_name
Standard PC (Q35 + ICH9, 2009)
Extra information or context
No response