vetzki / nvidia-prime-switch-sddm

MIT License

Config switches, but glxinfo shows that it's still using intel GPU #1

Closed leledumbo closed 6 years ago

leledumbo commented 6 years ago

At first I thought I would have to wait until the GTX 1050 was well supported by bumblebee on Manjaro, but then "it just works" in PearlLinux using PRIME. Switching is dead easy, with no config mess necessary. Then I saw your thread in the Manjaro forum and eventually ended up here with the problem described in the title.

The main problem with this laptop is that, it seems, whenever bbswitch tries to turn the Nvidia card off, the whole system freezes. So the current "solution" (at least I can boot and use my desktop, even if only on Intel) is a systemd unit that executes:

echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove

before bbswitch does its job.
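
For illustration, the removal step could be wrapped in a small script for such a unit to call. This is only a sketch, not the unit actually used here: the function name remove_pci_device and its optional sysfs-root argument are hypothetical, and only the "remove" attribute path is taken from the command above.

```shell
#!/bin/sh
# Sketch: detach a PCI device by writing 1 to its sysfs "remove" attribute.
# remove_pci_device and its sysfs-root parameter are illustrative names.
remove_pci_device() {
    addr="$1"                  # PCI address, e.g. 0000:01:00.0
    sysfs_root="${2:-/sys}"    # overridable so the logic can be tried on a scratch directory
    node="$sysfs_root/bus/pci/devices/$addr/remove"
    if [ ! -e "$node" ]; then
        echo "no such device: $addr" >&2
        return 1
    fi
    echo 1 > "$node"
}

# Usage (as root): remove_pci_device 0000:01:00.0
```

The existence check means a wrong address fails with a message instead of a bare redirection error.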

It works, but that leaves my GTX 1050 useless. I want to use this GPU for heavier gaming (games that the HD 630 can't quite handle). Using your package, 90-mhwd.conf switches nicely between intel and nvidia, and I've checked that the PCI bus ID is correct for both cards, but I still end up using Intel no matter what, due to this:

nvidia-nvlink: Nvlink Core is being initialized, major device number 237
NVRM: The NVIDIA GPU 0000:01:00.0
NVRM: (PCI ID: 10de:1c8d) installed in this system has
NVRM: fallen off the bus and is not responding to commands.
nvidia: probe of 0000:01:00.0 failed with error -1
NVRM: The NVIDIA probe routine failed for 1 device(s).
NVRM: None of the NVIDIA graphics adapters were initialized!
nvidia-nvlink: Unregistered the Nvlink Core, major device number 237

Two solutions I've found on the net involve the rcutree.rcu_idle_gp_delay=1 and pcie_port_pm=off kernel parameters; neither helps at all.
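
When testing parameters like these, it can help to confirm that they actually reached the running kernel. A minimal sketch, assuming the usual /proc/cmdline interface; the helper name has_kernel_param and its optional file argument are hypothetical:

```shell
#!/bin/sh
# Sketch: report whether a given parameter is on the kernel command line.
# has_kernel_param is an illustrative helper, not part of any package.
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each space-separated parameter on its own line, then require an exact match.
    tr -s ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# Usage:
#   has_kernel_param pcie_port_pm=off && echo "pcie_port_pm=off is active"
#   has_kernel_param rcutree.rcu_idle_gp_delay=1 && echo "rcu_idle_gp_delay=1 is active"
```

The match is on the whole parameter, so a different value for the same key does not count as present.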

One weird thing, though: after blacklisting bbswitch, and even after uninstalling the respective kernel package so that modprobe cannot find it, bbswitch is still loaded out of nowhere at startup. I can rmmod it, but an immediate modprobe won't find it. So I wonder where this module is loaded from.

The usual system information:

$ inxi -Fxxz
System:    Host: Lelesus Kernel: 4.17.5-1-MANJARO x86_64 bits: 64 compiler: gcc v: 8.1.1 
           Desktop: KDE Plasma 5.13.2 tk: Qt 5.11.1 dm: lightdm,sddm Distro: Manjaro Linux 17.1.11 Hakoila 
Machine:   Type: Laptop System: ASUSTeK product: GL503VD v: 1.0 serial: <filter> 
           Mobo: ASUSTeK model: GL503VD v: 1.0 serial: <filter> UEFI: American Megatrends v: GL503VD.305 
           date: 10/16/2017 
Battery:   ID-1: BAT1 charge: 62.1 Wh condition: 62.1/64.4 Wh (96%) volts: 5.4/15.2 model: ASUS A32-K55 
           serial: <filter> status: Full 
CPU:       Topology: Quad Core model: Intel Core i7-7700HQ bits: 64 type: MT MCP arch: Skylake rev: 9 
           L2 cache: 6144 KiB 
           flags: lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 44944 
           Speed: 3400 MHz min/max: 800/3800 MHz Core speeds (MHz): 1: 3400 2: 3400 3: 3400 4: 3400 5: 3400 
           6: 3400 7: 3400 8: 3400 
Graphics:  Card-1: Intel driver: i915 v: kernel bus ID: 00:02.0 chip ID: 8086:591b 
           Display: x11 server: X.Org 1.19.6 driver: intel unloaded: modesetting,nvidia 
           resolution: 1920x1080~60Hz 
           OpenGL: renderer: Mesa DRI Intel HD Graphics 630 (Kaby Lake GT2) v: 4.5 Mesa 18.1.3 compat-v: 3.0 
           direct render: Yes 
Audio:     Card-1: Intel CM238 HD Audio driver: snd_hda_intel v: kernel bus ID: 00:1f.3 chip ID: 8086:a171 
           Sound Server: ALSA v: k4.17.5-1-MANJARO 
Network:   Card-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet driver: r8168 v: 8.045.08-NAPI 
           port: d000 bus ID: 02:00.0 chip ID: 10ec:8168 
           IF: eth0 state: down mac: <filter> 
           Card-2: Intel Wireless 8265 / 8275 driver: iwlwifi v: kernel bus ID: 04:00.0 chip ID: 8086:24fd 
           IF: wlan0 state: up mac: <filter> 
           Card-3: IMC Networks type: USB driver: uvcvideo bus ID: 1:4 chip ID: 13d3:5666 
Drives:    HDD Total Size: 1.14 TiB used: 945.20 GiB (80.8%) 
           ID-1: /dev/nvme0n1 vendor: Samsung model: MZVLW256HEHP-000L2 size: 238.47 GiB speed: 31.6 Gb/s 
           lanes: 4 serial: <filter> temp: 50 C 
           ID-2: /dev/sda vendor: Seagate model: ST1000LX015-1U7172 size: 931.51 GiB speed: 6.0 Gb/s 
           serial: <filter> temp: 44 C 
Partition: ID-1: / size: 160.00 GiB used: 130.23 GiB (81.4%) fs: btrfs dev: /dev/nvme0n1p3 
           ID-2: swap-1 size: 8.00 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/nvme0n1p2 
Sensors:   System Temperatures: cpu: 64.0 C mobo: N/A 
           Fan Speeds (RPM): cpu: 0 
Info:      Processes: 366 Uptime: 4m Memory: 15.55 GiB used: 2.23 GiB (14.3%) Init: systemd v: 239 Compilers: 
           gcc: 8.1.1 clang: 6.0.0 Shell: bash v: 4.4.23 running in: terminator inxi: 3.0.12
$ cat /etc/modprobe.d/bumblebee.conf 
remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia
$ cat /etc/modprobe.d/mhwd-bbswitch.conf
##
## Generated by mhwd - Manjaro Hardware Detection
##

options bbswitch load_state=0 unload_state=0
$ cat /etc/modprobe.d/mhwd-gpu.conf 
blacklist nouveau
blacklist ttm
options nvidia-drm modeset=1
$ cat /etc/modprobe.d/mhwd-nvidia.conf 
##
## Generated by mhwd - Manjaro Hardware Detection
##

blacklist nouveau
blacklist nvidia
blacklist nvidia-modeset
blacklist nvidia-uvm
blacklist ttm
blacklist drm_kms_helper
blacklist drm

leledumbo commented 6 years ago

As a bonus, this is what happens when I try to rescan hoping the GPU will work:

[image: img_20180720_120141]

i.e. it freezes right there, so I had to take the screenshot with my phone.

leledumbo commented 6 years ago

Probably a kernel version issue with 4.17; I will retry with 4.15, as another approach works.

leledumbo commented 6 years ago

Confirmed working in 4.15. I have to use the more complete nvidia.conf, though, as the one you provided leaves me unable to use the GPU for, e.g., glxspheres64. Here's the content:

# nvidia-xconfig: X configuration file generated by nvidia-xconfig
# nvidia-xconfig:  version 396.24  (buildmeister@swio-display-x64-rhel04-13)  Thu Apr 26 01:13:52 PDT 2018

Section "Module"
    Load "modesetting"
EndSection

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
EndSection

Section "Files"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Mouse0"
    Driver         "mouse"
    Option         "Protocol" "auto"
    Option         "Device" "/dev/psaux"
    Option         "Emulate3Buttons" "no"
    Option         "ZAxisMapping" "4 5"
EndSection

Section "InputDevice"
    # generated from default
    Identifier     "Keyboard0"
    Driver         "kbd"
EndSection

Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       30.0 - 83.0
    VertRefresh     56.0 - 75.0
    Option         "DPMS"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:1:0:0"
    Option         "AllowEmptyInitialConfiguration"
    Option         "NoLogo" "1"
EndSection

Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Extensions"
    Option         "Composite" "Enable"
EndSection

Section "InputClass"
    Identifier          "Keyboard Defaults"
    MatchIsKeyboard        "yes"
    Option              "XkbOptions" "terminate:ctrl_alt_bksp"
EndSection

vetzki commented 6 years ago

Do you still have bumblebee installed? (The bbswitch files suggest so; bbswitch itself is a separate package.) Also, the /etc/modprobe.d/mhwd-nvidia.conf (*) file should not exist. I will update the readme to state that bumblebee should be uninstalled to avoid problems.

(*) /etc/modprobe.d/ should contain only one GPU-related file, and the same goes for /etc/modules-load.d/.
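
One way to check this is to list every config file in those directories that mentions a GPU-related module. The following is a rough sketch; the helper name list_gpu_conf and the module pattern (nvidia, nouveau, bbswitch) are assumptions chosen for illustration, not something shipped with this package:

```shell
#!/bin/sh
# Sketch: list config files in a directory that mention a GPU-related module.
# list_gpu_conf and the module pattern are illustrative choices.
list_gpu_conf() {
    dir="$1"
    # -l prints only file names, -E enables alternation, -s silences missing-file errors.
    grep -lsE 'nvidia|nouveau|bbswitch' "$dir"/*.conf | sort
}

# Usage:
#   list_gpu_conf /etc/modprobe.d
#   list_gpu_conf /etc/modules-load.d
```

If more than one file is printed for a directory, leftovers from another setup (such as bumblebee) are probably still in place.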

As for the xorg config: http://us.download.nvidia.com/XFree86/Linux-x86/358.16/README/randr14.html shows, as an example, the format I use (for newer X server versions); however, in http://us.download.nvidia.com/XFree86/Linux-x86/390.67/README/randr14.html it's gone. I will take a further look into this issue.

best regards,

leledumbo commented 6 years ago

My actual problem was bbswitch being listed in mkinitcpio.conf. That embeds the module in the initramfs image, and since the image had been built while the module was still installed, bbswitch kept loading even though the module no longer exists on disk (that's why blacklisting had no effect).
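
A quick check for this situation might look like the sketch below. It assumes the standard /etc/mkinitcpio.conf location and a MODULES= line; the helper name module_in_mkinitcpio is hypothetical.

```shell
#!/bin/sh
# Sketch: check whether a module is named on the MODULES line of mkinitcpio.conf.
# module_in_mkinitcpio is an illustrative helper.
module_in_mkinitcpio() {
    module="$1"
    conf="${2:-/etc/mkinitcpio.conf}"
    # Consider only uncommented MODULES= lines, then look for the module as a whole word.
    grep -E '^[[:space:]]*MODULES=' "$conf" | grep -qw -- "$module"
}

# Usage:
#   module_in_mkinitcpio bbswitch && echo "bbswitch is still listed in MODULES"
# After removing it from the file, regenerate the images (as root): mkinitcpio -P
```

Note that the word match treats "-" as a separator, so searching for nvidia would also match nvidia-drm.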

I have deleted /etc/modprobe.d/mhwd-nvidia.conf, now I only have these:

$ cat /etc/modprobe.d/mhwd-gpu.conf 
blacklist nouveau
blacklist ttm
options nvidia-drm modeset=1
$ cat /etc/modules-load.d/mhwd-gpu.conf
nvidia
nvidia-modeset
nvidia-drm
nvidia-uvm

There's a problem with my monitor: the Nvidia driver fails to get the correct HorizSync and VertRefresh (I ended up with only 960x540 resolution), so I need the above modification to get 1920x1080. I have no idea which section makes the GPU usable for me. All I know is that with the barebones config, glxinfo doesn't return Nvidia as the vendor string but an error message instead (I forget what it was and am too lazy to reproduce it).
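
For anyone reproducing the check, the vendor string can be extracted from glxinfo output as in this sketch. The helper name gl_vendor is hypothetical; it relies only on the "OpenGL vendor string:" line that glxinfo prints.

```shell
#!/bin/sh
# Sketch: pull the OpenGL vendor out of glxinfo output read from stdin.
# gl_vendor is an illustrative helper.
gl_vendor() {
    # Print only the text after the "OpenGL vendor string: " prefix.
    sed -n 's/^OpenGL vendor string: //p'
}

# Usage: glxinfo | gl_vendor
```

If the helper prints nothing, glxinfo most likely failed before reporting a vendor, and its error output is worth reading directly.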