ublue-os / bluefin

The next generation Linux workstation, designed for reliability, performance, and sustainability.
https://projectbluefin.io
Apache License 2.0

Failure to boot on latest bluefin-dx-nvidia image #1371

Closed piperswe closed 3 months ago

piperswe commented 3 months ago

Describe the bug

Upon rebasing from ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:gts@sha256:1cc9d1e1df47145c58601fd004f80aa6dd8342666dfc9949bc9631dea45d531c to ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:latest@sha256:ad7d306b70df3fbf7a772efb34858aa445e71366b76205259276a04325b07eda and rebooting, I am stuck on a gray screen with my cursor frozen near the corner of the screen.

What did you expect to happen?

I should have booted into a functioning Bluefin Fedora 40 install.

Output of rpm-ostree status

~ 
❯ rpm-ostree status -v
State: idle
AutomaticUpdates: stage; rpm-ostreed-automatic.timer: no runs since boot
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:gts (index: 0)
                   Digest: sha256:1cc9d1e1df47145c58601fd004f80aa6dd8342666dfc9949bc9631dea45d531c
                  Version: 39.20240606.0 (2024-06-06T16:54:30Z)
               BaseCommit: 2be86a9b225ff5ad4ef21e6921d4d0ddb1774998c9bd6cd7a5f5ce6e59908b30
                   Commit: ac05b93335eb9f648da933edb1eb1545c9d33448b11ec3d3c19f924cd41f1ff6
                           ├─ 1password (2024-05-21T13:11:59Z)
                           ├─ copr:copr.fedorainfracloud.org:g3tchoo:prismlauncher (2024-05-08T12:24:33Z)
                           ├─ copr:copr.fedorainfracloud.org:hikariknight:looking-glass-kvmfr (2024-04-10T22:38:22Z)
                           ├─ copr:copr.fedorainfracloud.org:phracek:PyCharm (2024-03-18T11:54:48Z)
                           ├─ copr:copr.fedorainfracloud.org:uriesk:minecraft-wayland-glfw-git (2024-04-22T04:03:54Z)
                           ├─ fedora (2023-11-01T00:12:39Z)
                           ├─ google-chrome (2024-06-05T19:42:28Z)
                           ├─ rpmfusion-nonfree-nvidia-driver (2024-05-24T08:09:32Z)
                           ├─ rpmfusion-nonfree-steam (2024-04-20T13:10:44Z)
                           ├─ updates (2024-06-06T10:28:27Z)
                           └─ updates-archive (2024-05-17T02:42:10Z)
                   Staged: no
                StateRoot: default
          LayeredPackages: 1password glfw prismlauncher waydroid

  ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:gts (index: 1)
                   Digest: sha256:1cc9d1e1df47145c58601fd004f80aa6dd8342666dfc9949bc9631dea45d531c
                  Version: 39.20240606.0 (2024-06-06T16:54:30Z)
               BaseCommit: 2be86a9b225ff5ad4ef21e6921d4d0ddb1774998c9bd6cd7a5f5ce6e59908b30
                   Commit: 232d4f07ece56ea4f78e196d4a1cd41d62a867b40548734da41a468663d26c89
                           ├─ 1password (2024-05-21T13:11:59Z)
                           ├─ copr:copr.fedorainfracloud.org:g3tchoo:prismlauncher (2024-05-08T12:24:33Z)
                           ├─ copr:copr.fedorainfracloud.org:hikariknight:looking-glass-kvmfr (2024-04-10T22:38:22Z)
                           ├─ copr:copr.fedorainfracloud.org:phracek:PyCharm (2024-03-18T11:54:48Z)
                           ├─ copr:copr.fedorainfracloud.org:uriesk:minecraft-wayland-glfw-git (2024-04-22T04:03:54Z)
                           ├─ fedora (2023-11-01T00:12:39Z)
                           ├─ google-chrome (2024-06-05T19:42:28Z)
                           ├─ rpmfusion-nonfree-nvidia-driver (2024-05-24T08:09:32Z)
                           ├─ rpmfusion-nonfree-steam (2024-04-20T13:10:44Z)
                           ├─ updates (2024-06-06T10:28:27Z)
                           └─ updates-archive (2024-05-17T02:42:10Z)
                StateRoot: default
          LayeredPackages: 1password glfw prismlauncher waydroid

Output of groups

~ 
❯ groups
pmc wheel docker

Extra information or context

Rebasing to the non-NVIDIA image works fine, but without NVIDIA proprietary hardware acceleration support.

m2Giles commented 3 months ago

Were you able to switch to another virtual terminal?

Right now this sounds like gdm failed to start properly if a gray screen appeared.

piperswe commented 3 months ago

Ctrl+Alt+F1-12 had no effect whatsoever.

m2Giles commented 3 months ago

We made some changes to our mutter package to remove an issue with XWayland. This may have been present with X directly as well.

Is this behavior still occurring?

piperswe commented 3 months ago

I actually traded in my NVIDIA GPU for an AMD one a little bit after filing this ticket so I wouldn't have to deal with NVIDIA driver shenanigans anymore... sorry I can't check to see if the issue is still there anymore

castrojo commented 3 months ago

Welcome to the promised land!

tasgon commented 2 months ago

I'm still seeing this issue on an RTX 3080.

rpm-ostree output

```
❯ rpm-ostree status -v
State: idle
AutomaticUpdates: stage; rpm-ostreed-automatic.timer: no runs since boot
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:latest (index: 0)
                   Digest: sha256:a1adf996664c0a79d07b0611b9b45e81ceb1be8c02056ea355054a572e24fa7b
                  Version: 40.20240622.0 (2024-06-23T01:13:53Z)
               BaseCommit: 29cf678691c97729d9e58da5739ec9176dbdc0cdbc448d08bb34c503e036a926
                   Commit: 3ccba737be0e287ea65c6a20834501e0f53ff0811f3393ca0defc2928824e68d
                           ├─ copr:copr.fedorainfracloud.org:hikariknight:looking-glass-kvmfr (1970-01-01T00:00:00Z)
                           ├─ fedora (1970-01-01T00:00:00Z)
                           ├─ updates (1970-01-01T00:00:00Z)
                           └─ updates-archive (1970-01-01T00:00:00Z)
                   Staged: no
                StateRoot: default
                Initramfs: '"-I /etc/crypttab /usr/lib/modprobe.d/nvidia.conf"'

  ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:latest (index: 1)
                   Digest: sha256:326e4a33d701be28ab461ee689cbda54bd02f3b9ab348a53ace08978c8fcb205
                  Version: 40.20240620.0 (2024-06-21T16:54:44Z)
               BaseCommit: f573ecf49af489cc81724c6c92237f6c596cffc406427a2f4e28a4f86d23142a
                   Commit: 95f483fc9f6c500d1e7a0d97ecdd249227d80bf208282c7a565fd8de331a0668
                           ├─ copr:copr.fedorainfracloud.org:hikariknight:looking-glass-kvmfr (1970-01-01T00:00:00Z)
                           ├─ fedora (1970-01-01T00:00:00Z)
                           ├─ updates (1970-01-01T00:00:00Z)
                           └─ updates-archive (1970-01-01T00:00:00Z)
                StateRoot: default
                Initramfs: '"-I /etc/crypttab /usr/lib/modprobe.d/nvidia.conf"'
```

Occasionally, I am able to boot without issue, though.

m2Giles commented 2 months ago

Your local initramfs is likely not needed.

The Nvidia image has the drivers already loaded into the initramfs.

You also have input of today's image. Are you able to VT switch? If not, but you are able to ssh into the machine, you may be affected by Nvidia's atomic flip issue.
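
For anyone checking whether they carry the same local initramfs override, here is a minimal sketch. It assumes the `Initramfs:` line appears in `rpm-ostree status -v` only for deployments with a locally regenerated initramfs, as in the output above. The function name `list_initramfs_overrides` is made up for illustration and is not part of rpm-ostree.

```shell
# Hypothetical helper: reads `rpm-ostree status -v` text on stdin and prints
# the image reference and version of every deployment that has an
# "Initramfs:" line, i.e. a locally regenerated initramfs.
list_initramfs_overrides() {
    awk '
        {
            # A deployment header is any line with a field that starts
            # with the image reference prefix.
            for (i = 1; i <= NF; i++)
                if (index($i, "ostree-image-signed:") == 1) { ref = $i; ver = "" }
        }
        $1 == "Version:"   { ver = $2 }
        $1 == "Initramfs:" { print ref, ver }
    '
}

# Typical use on the affected machine (not run here):
#   rpm-ostree status -v | list_initramfs_overrides
# If a deployment is listed, the override can be dropped with
#   rpm-ostree initramfs --disable
# followed by a reboot into the new deployment.
```

Disabling the override may not be the fix for the gray-screen hang. It only removes one variable, since the NVIDIA image already includes the drivers in its initramfs.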

parkan commented 1 month ago

@m2Giles I am seeing this problem as well with

 ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:gts
                   Digest: sha256:92ecae19614229f0a398519c50c8c73ed1d87e8c42c4973368a7063b026fd8c7
                  Version: 39.20240811.0 (2024-08-11T05:55:49Z)

but not with rollback on

 ● ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:gts
                   Digest: sha256:19d05e73414b3b694bf62487149390934783f3f7e443099a3faabe6edd7bdaa7
                  Version: 39.20240804.0 (2024-08-04T05:54:01Z)

any hints?

m2Giles commented 1 month ago

Are you able to VT switch?

parkan commented 1 month ago

no, I'm not able to switch to a tty, however I can make changes to the system by booting into the older image

parkan commented 4 weeks ago

note: I ended up buying an AMD GPU and rebasing to the normal bluefin-dx image, which resolved the problem

piperswe commented 4 weeks ago

> note: I ended up buying an AMD GPU and rebasing to the normal bluefin-dx image, which resolved the problem

that appears to be the best solution!