jonahbron closed this issue 4 years ago.
I just ran install.sh today and got this error: ...
@jonahbron Do you see this error when you run the script or after you run it?
During. Here's an image.
I managed to get my system back up by uninstalling the NVidia driver and rolling back to kernel lts2018 (instead of the current lts2019).
I suspect this is an issue with the driver itself (440.82), because I've tried installing with several different kernel versions and none of them work. I might try installing an older driver version.
I tried a different version of the NVIDIA driver (435.21). No change in behavior: still unable to load nvidia-drm.
I thought a full system wipe might fix it. I reinstalled Clear Linux from scratch with the lts kernel. I'm still getting the same error. Tried with both the native and lts versions.
@jonahbron I can reproduce this error.
nvidia-drm is a kernel module required by the NVIDIA proprietary driver. During installation, the installer also adds an Xorg config file that loads this module.
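To see which Xorg config file pulls in the module, here is a minimal sketch. It assumes the file landed in /etc/X11/xorg.conf.d (the --x-sysconfig-path used by the install commands in this thread) and simply lists files there that mention nvidia-drm; the function name and the directory argument are hypothetical, added for illustration. On many installs the match is a file named nvidia-drm-outputclass.conf, but treat that name as an assumption.

```shell
#!/usr/bin/env bash
# List Xorg config files in a directory that mention the nvidia-drm module.
# Assumption: the installer was run with --x-sysconfig-path=/etc/X11/xorg.conf.d.
find_drm_xorg_configs() {
  local dir=${1:-/etc/X11/xorg.conf.d}
  # -l prints only the names of matching files; -s hides errors for unreadable files.
  grep -ls 'nvidia-drm' "$dir"/*.conf
}
```

Usage: `find_drm_xorg_configs` with no argument checks the default directory.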
It's confusing to me that the DKMS module's build log says the build succeeded, yet the very next line is an error about the missing nvidia-drm module.
I get the same error. I've tried different kernel versions and none of them work. Any ideas how to fix this?
Relieved to hear it's not just me 😄. I posted about this on the NVIDIA forums, so watch that thread.
If a solution is found, I'll update this issue so we can patch the installer script.
My only idea is that Clear Linux recently updated its GCC version. I wonder if that could be related.
Also created an issue for Clear Linux directly.
Rolling back the OS version fixed the issue for me:
swupd verify --fix --picky -m 32990
Not sure if it's safe to do it that way, but the drivers are working :)
@FanFani4 Can you post this here: https://github.com/clearlinux/distribution/issues/1994?
I can afford to wait, so I won't roll back yet. Hopefully we can use my machine to find the cause.
This could be caused by GCC 10. Someone reported that the problem disappears with the GCC 10 test version: https://aur.archlinux.org/packages/nvidia-390xx-dkms/
After two days of trying to solve this, I've finally managed to install 440.82 on the latest kernel-native (custom built with SECTION_MISMATCH_WARN_ONLY). Perhaps NVIDIA or Intel staff can clarify what's at fault here, but there is some problematic interaction between the 5.6.15 kernel, GCC 10.1 and the NVIDIA 440.82 installer in 1) creating the proper DKMS build and source tree in "/var/lib/dkms/nvidia/440.82/" and 2) issuing the proper install command to the /usr/bin/dkms tool.
I used the following installation command (per https://docs.01.org/clearlinux/latest/tutorials/nvidia.html, plus "--no-cc-version-check" as a safeguard and "--expert" instead of "--silent" for more verbose output):
sudo ./NVIDIA-Linux-x86_64-440.82.run \
    --utility-prefix=/opt/nvidia \
    --opengl-prefix=/opt/nvidia \
    --compat32-prefix=/opt/nvidia \
    --compat32-libdir=lib32 \
    --x-prefix=/opt/nvidia \
    --x-module-path=/opt/nvidia/lib64/xorg/modules \
    --x-library-path=/opt/nvidia/lib64 \
    --x-sysconfig-path=/etc/X11/xorg.conf.d \
    --documentation-prefix=/opt/nvidia \
    --application-profile-path=/etc/nvidia/nvidia-application-profiles-rc.d \
    --no-precompiled-interface \
    --no-distro-scripts \
    --force-libglx-indirect \
    --glvnd-egl-config-path=/etc/glvnd/egl_vendor.d \
    --egl-external-platform-config-path=/etc/egl/egl_external_platform.d \
    --dkms \
    --no-cc-version-check \
    --expert
Using the "--expert" option reveals why the installer reports "ERROR: Unable to load the 'nvidia-drm' kernel module", which it otherwise prints without any explanation at all:
-> Driver file installation is complete.
-> Installing DKMS kernel module:
-> done.
ERROR: Unable to load the 'nvidia-drm' kernel module: 'modprobe: ERROR: ctx=0x5646828152a0 path=/lib/modules/5.6.15-957.native/kernel/drivers/video/nvidia-modeset.ko error=No such file or directory
modprobe: ERROR: ctx=0x5646828152a0 path=/lib/modules/5.6.15-957.native/kernel/drivers/video/nvidia-modeset.ko error=No such file or directory
modprobe: ERROR: could not insert 'nvidia_drm': Unknown symbol in module, or unknown parameter (see dmesg)'
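The last modprobe line points to dmesg for the underlying cause. A minimal sketch that filters kernel log text for "Unknown symbol" lines from nvidia modules; it assumes the standard kernel wording "<module>: Unknown symbol <name>", and the function name is hypothetical. It reads from stdin so it can be fed either live `sudo dmesg` output or a saved log.

```shell
#!/usr/bin/env bash
# Print kernel log lines reporting unresolved symbols in nvidia modules
# (nvidia, nvidia_drm, nvidia_modeset, nvidia_uvm, ...). Reads stdin.
nvidia_unknown_symbols() {
  grep -E 'nvidia[_a-z]*: Unknown symbol'
}
```

Usage: `sudo dmesg | nvidia_unknown_symbols`. The symbol names it prints show which kernel interface the module failed to link against.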
The DKMS kernel modules are built, but they are never installed from the build directory into the corresponding kernel modules path, which is why modprobe cannot load them. Running ls /var/lib/dkms/nvidia/440.82/ shows the following:
source/ build/
source/ is the correct symlink pointing to the NVIDIA kernel module sources, and build/ is the directory that should hold the resulting binaries, make.log, etc. after a successful build. That is not happening: build/ is an empty directory. The solution is to build and install the DKMS nvidia source tree manually. Here is the fix, after rebuilding the kernel with SECTION_MISMATCH_WARN_ONLY. Start by installing the driver again, this time with the --silent flag instead of the --expert flag:
sudo ./NVIDIA-Linux-x86_64-440.82.run \
    --utility-prefix=/opt/nvidia \
    --opengl-prefix=/opt/nvidia \
    --compat32-prefix=/opt/nvidia \
    --compat32-libdir=lib32 \
    --x-prefix=/opt/nvidia \
    --x-module-path=/opt/nvidia/lib64/xorg/modules \
    --x-library-path=/opt/nvidia/lib64 \
    --x-sysconfig-path=/etc/X11/xorg.conf.d \
    --documentation-prefix=/opt/nvidia \
    --application-profile-path=/etc/nvidia/nvidia-application-profiles-rc.d \
    --no-precompiled-interface \
    --no-distro-scripts \
    --force-libglx-indirect \
    --glvnd-egl-config-path=/etc/glvnd/egl_vendor.d \
    --egl-external-platform-config-path=/etc/egl/egl_external_platform.d \
    --dkms \
    --no-cc-version-check \
    --silent
Go to the same /var/lib/dkms/nvidia/440.82/ directory and enter source/, where dkms.conf and the Makefile are. Then run:
sudo dkms autoinstall
All nvidia modules will be built, added and installed successfully, and ls /var/lib/dkms/nvidia/440.82/ will now show a proper DKMS tree:
source/ 5.6.15-957.native/
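`dkms autoinstall` roughly means "build and install every DKMS module marked for automatic installation for the current kernel". The same steps can be spelled out for just the nvidia module, which makes it easier to see which step fails. A minimal sketch, assuming module name `nvidia`, version 440.82 and the running kernel as defaults, and assuming the module is already registered with DKMS (the source/ link under /var/lib/dkms/nvidia/440.82/ suggests it is; otherwise `dkms add` would be needed first). The function name is hypothetical; run it as root.

```shell
#!/usr/bin/env bash
# Build and install the nvidia DKMS module for one kernel, then show its status.
# Defaults (assumptions): driver 440.82 and the currently running kernel.
rebuild_nvidia_dkms() {
  local version=${1:-440.82}
  local kernel=${2:-$(uname -r)}
  # 1) compile the registered sources for this kernel,
  # 2) copy the .ko files under /lib/modules/<kernel>/ (dkms also runs depmod),
  # 3) report what DKMS now knows about the module.
  dkms build -m nvidia -v "$version" -k "$kernel" &&
    dkms install -m nvidia -v "$version" -k "$kernel" &&
    dkms status -m nvidia
}
```

Usage: `rebuild_nvidia_dkms 440.82 5.6.15-957.native`. Chaining with `&&` stops at the first failing step, so the last message printed points at the stage that broke.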
The kernel should be custom rebuilt with SECTION_MISMATCH_WARN_ONLY enabled (Kernel hacking -> Compile-time checks and compiler options -> Make section mismatch errors non-fatal), following this simple guide: https://docs.01.org/clearlinux/latest/guides/kernel/kernel-development.html. Remember to install the *-dev package of your newly built kernel as well:
rpm2cpio linux-dev-5.6.15-957.x86_64.rpm | (cd /; sudo cpio -i -d -u -v)
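As an alternative to clicking through menuconfig, the kernel source tree ships a `scripts/config` helper that can set the same option in a `.config` file. A minimal sketch, assuming you are working in a kernel source tree that already has a `.config`; the Clear Linux packaging flow in the linked guide may keep its config in a different file, so treat the paths as assumptions. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Enable CONFIG_SECTION_MISMATCH_WARN_ONLY in a kernel source tree without menuconfig.
enable_warn_only() {
  local src=$1
  (
    cd "$src" &&
      ./scripts/config --file .config --enable SECTION_MISMATCH_WARN_ONLY &&
      make olddefconfig   # resolve any options that depend on the change
  )
}

# Return 0 if the given .config has the option set to y.
config_has_warn_only() {
  grep -qx 'CONFIG_SECTION_MISMATCH_WARN_ONLY=y' "$1"
}
```

Running `config_has_warn_only path/to/.config` before building is a cheap way to confirm the setting actually made it into the config.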
I found @SPAstef's solution to be fairly straightforward. It has the drawback of not working with DKMS however, so I'll have to manually update the kernel and reinstall the driver periodically.
Here's the exact diff I used on the install.bash file.
--- a/NVIDIA-Driver/install.bash
+++ b/NVIDIA-Driver/install.bash
@@ -64,7 +64,8 @@ echo -e "\e[32m The version of the driver is \e[33m""$([[ "$INSTALLER" =~ ^.*\-(
echo "${BASH_REMATCH[1]}")\e[m"
read -rp "Press any key to continue ... " -n1 -s
echo
-if ! sudo sh "$INSTALLER" \
+export CONFIG_SECTION_MISMATCH_WARN_ONLY=y
+if ! sh "$INSTALLER" \
--utility-prefix=/opt/nvidia \
--opengl-prefix=/opt/nvidia \
--compat32-prefix=/opt/nvidia \
@@ -81,8 +82,8 @@ if ! sudo sh "$INSTALLER" \
--force-libglx-indirect \
--glvnd-egl-config-path=/etc/glvnd/egl_vendor.d \
--egl-external-platform-config-path=/etc/egl/egl_external_platform.d \
- --dkms \
--silent; then
echo -e "\e[31m Installation failed! Aborting...\e[m"
exit 1
fi
Then I ran the install.bash script as root.
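A likely reason the diff above drops `sudo` and runs the whole script as root: sudo normally resets the environment (the `env_reset` default in sudoers), so an exported variable would presumably not reach the installer through `sudo sh`. The sketch below shows the general shell behavior, using `env -i` as a stand-in for that reset (an approximation, not sudo itself). The later `sudo CONFIG_SECTION_MISMATCH_WARN_ONLY=y "$INSTALLER"` variant corresponds to the last case, since on typical admin configurations sudo accepts variable assignments given on its command line.

```shell
#!/usr/bin/env bash
# Show which ways of setting a variable reach a child process.
show='printf "%s\n" "${CONFIG_SECTION_MISMATCH_WARN_ONLY:-unset}"'

unset CONFIG_SECTION_MISMATCH_WARN_ONLY
CONFIG_SECTION_MISMATCH_WARN_ONLY=y        # plain shell variable, not exported
plain=$(sh -c "$show")                     # child does not see it

export CONFIG_SECTION_MISMATCH_WARN_ONLY   # what the diff above does
exported=$(sh -c "$show")                  # child inherits it

scrubbed=$(env -i /bin/sh -c "$show")      # a scrubbed environment drops it

# An assignment placed on the launcher's command line survives the scrub.
inline=$(env -i CONFIG_SECTION_MISMATCH_WARN_ONLY=y /bin/sh -c "$show")

printf 'plain=%s exported=%s scrubbed=%s inline=%s\n' "$plain" "$exported" "$scrubbed" "$inline"
```

Running it prints `plain=unset exported=y scrubbed=unset inline=y`.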
I definitely tried several combinations to get it working with DKMS. My guess is that DKMS is compiling the modules itself and the env var isn't making it through to that point. Hopefully CL or NVidia can fix this soon.
I had the same issue with DKMS: it completely ignores any environment variable. DKMS is supposed to spare us from reinstalling drivers after updates, but it seems we need to reinstall them anyway, so it's not a big deal.
Someone on the NVidia forum posted a possible script solution to get it working with DKMS again.
I'm going to try this later.
I think an interim script-based solution may still be possible for this. According to someone on the NVidia thread, we may be able to set the correct ENV for DKMS using this method:
However, I ran into a roadblock because I don't know how to use the PRE_BUILD config key properly. My attempt made no difference.
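A different route from PRE_BUILD is to put the option on make's own command line through the `MAKE[0]` entry of dkms.conf, since make command-line variables take precedence over both the environment and ordinary makefile assignments. This is a swapped-in technique, not the method from the NVIDIA thread. A minimal sketch, assuming the NVIDIA dkms.conf lives at /usr/src/nvidia-440.82/dkms.conf and has a single-line, double-quoted `MAKE[0]` entry; whether kbuild honors `CONFIG_SECTION_MISMATCH_WARN_ONLY` passed this way for this module is untested here, and the installer may overwrite the file on reinstall. The function name is hypothetical; rebuild with dkms afterwards.

```shell
#!/usr/bin/env bash
# Append CONFIG_SECTION_MISMATCH_WARN_ONLY=y to the MAKE[0] line of a dkms.conf,
# so the value reaches make as a command-line variable (requires GNU sed for -i).
patch_dkms_make() {
  local conf=${1:-/usr/src/nvidia-440.82/dkms.conf}   # assumed default path
  # Skip if already patched, so running the function twice is harmless.
  grep -q 'CONFIG_SECTION_MISMATCH_WARN_ONLY=y' "$conf" && return 0
  # Capture everything up to the closing quote of MAKE[0]="..." and re-close it.
  sed -i 's/^\(MAKE\[0]=".*\)"$/\1 CONFIG_SECTION_MISMATCH_WARN_ONLY=y"/' "$conf"
}
```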
I'm not sure whether today's update was the reason (it wasn't working over the last few days), but I've now fixed the issue with the following simple change:
index d29559d..1cc5ce3 100755
--- a/NVIDIA-Driver/install.bash
+++ b/NVIDIA-Driver/install.bash
@@ -64,7 +64,7 @@ echo -e "\e[32m The version of the driver is \e[33m""$([[ "$INSTALLER" =~ ^.*\-(
echo "${BASH_REMATCH[1]}")\e[m"
read -rp "Press any key to continue ... " -n1 -s
echo
-if ! sudo sh "$INSTALLER" \
+if ! sudo CONFIG_SECTION_MISMATCH_WARN_ONLY=y "$INSTALLER" \
--utility-prefix=/opt/nvidia \
--opengl-prefix=/opt/nvidia \
--compat32-prefix=/opt/nvidia \
Otherwise, running the installer once with --dkms and once without also worked.
You can check whether the module was installed with the command below (for me it was loaded even when the DKMS failure message appeared 😯):
$ sudo dkms status
nvidia, 440.100, 5.7.8-968.native, x86_64: installed
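Note that `dkms status` reports whether a module is built and installed for a kernel, not whether it is currently loaded. A minimal sketch for the latter reads /proc/modules (the same data `lsmod` shows), where module names use underscores (nvidia_drm) rather than dashes. The function name and the optional file argument are hypothetical, added so the check can be pointed at a saved copy.

```shell
#!/usr/bin/env bash
# Return 0 if a kernel module is currently loaded, according to /proc/modules.
module_loaded() {
  local name=${1//-/_}                    # nvidia-drm -> nvidia_drm
  local modules_file=${2:-/proc/modules}
  grep -qs "^${name} " "$modules_file"
}

for m in nvidia nvidia-modeset nvidia-drm; do
  if module_loaded "$m"; then echo "$m: loaded"; else echo "$m: not loaded"; fi
done
```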
My laptop: Tuxedo Computers
VGA: NVIDIA Corporation GP106BM [GeForce GTX 1060 Mobile 6GB] (rev a1)
BIOS setup: DISCRETE mode (Intel graphics turned off in the BIOS)
OS: Clear Linux OS, build ID 33510
@enbock Glad it worked for you. I tried to install again with DKMS, but now I'm getting a compiler version mismatch. Kernel was compiled with 10.1.1, but swupd has installed 10.2.1. I filed a bug with CL.
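To spot this kind of mismatch before running the installer, one can compare the GCC version recorded in /proc/version with the installed gcc. A minimal sketch, assuming /proc/version contains "gcc version X.Y.Z" (the format used by 5.x kernels; newer kernels word it differently); both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Extract the GCC version that built the kernel from a /proc/version-style file.
kernel_gcc_version() {
  sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p' "${1:-/proc/version}"
}

# Compare it with the installed gcc; return 1 on mismatch.
check_gcc_match() {
  local built installed
  built=$(kernel_gcc_version "$1")
  # -dumpfullversion exists since GCC 7; fall back to -dumpversion otherwise.
  installed=$(gcc -dumpfullversion 2>/dev/null || gcc -dumpversion)
  if [ "$built" = "$installed" ]; then
    echo "match: $built"
  else
    echo "mismatch: kernel built with $built, installed gcc is $installed"
    return 1
  fi
}
```

In the situation described here, `check_gcc_match` would report the kernel built with 10.1.1 against an installed 10.2.1.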
@jonahbron This is not a bug. When the new kernel is released this will be fixed automatically.
@lebensterben I see. Any idea when that will be? Seems bad to have windows of time in which the compiler versions don't match. That means any user might come along and simply not be able to install the NVidia drivers until the kernel gets an update.
This is a well-known issue. I don't think you'll need to wait long. You can also disable the GCC version check with an installer option.
Can confirm: after an update I was able to install with DKMS. I only needed CONFIG_SECTION_MISMATCH_WARN_ONLY set.
@jonahbron Alternatively, you can append "--no-cc-version-check" to the installer options.
Driver version: 450.57
Kernel version: 5.7.13-975.native
FYI: if the problem has come back for anyone in the last few days, download the newest SLB (short-lived branch) driver from NVIDIA (in my case, this one: https://download.nvidia.com/XFree86/Linux-x86_64/455.28/NVIDIA-Linux-x86_64-455.28.run).
@enbock Thanks. I heard that NVIDIA announced that the 5.9 kernel is incompatible and recommends that users defer upgrading the kernel until a new NVIDIA driver is released. Luckily, it worked fine for me.
I just ran install.sh today and got this error:
Kernel version is 5.4.42-40.lts2019, Clear Linux system version is 33180.
What could cause this?