ublue-os / bazzite

Bazzite is a cloud native image built upon Fedora Atomic Desktops that brings the best of Linux gaming to all of your devices - including your favorite handheld.
https://bazzite.gg
Apache License 2.0

HTPC (bazzite-deck) amdgpu stuck at 15W (with workaround) #320

Closed HikariKnight closed 1 year ago

HikariKnight commented 1 year ago

so i recently moved my old htpc/guestPC build from chimeraOS to bazzite deck, as i wanted something that mimics the steamdeck more closely in terms of desktop ui. my friends understand kde better than gnome (and i was getting fed up with fixing extensions after each gnome update to make it "friend friendly")

However when i started up some games i noticed the gpu usage was at 100% and i found that odd as these games ran perfectly fine before changing over. I then enabled the performance overlay to display in the gamescope session over steam too and it was fluctuating between 30% and 100%

I then went into desktop mode, installed corectrl, and noticed the gpu clock was stuck at 300 MHz (the lowest state for the rx560 8gb in the powerplay table), even in games. I then tried setting the performance mode to advanced and setting the clock speed there, and it worked! The games ran at full speed again, but only in desktop mode and only after opening corectrl each time and switching modes. I then set the performance mode back to automatic and the clock speed was no longer locked to 300 MHz. In fact it now dropped to 300 MHz on idle and scaled with whatever i was doing on the desktop, which is odd.
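The stuck clock described above can also be read straight from sysfs, without corectrl. This is a minimal sketch, assuming the card shows up as card1 (as it does on the poster's system) and that amdgpu marks the active level in pp_dpm_sclk with a trailing asterisk. The function name active_sclk is made up for illustration.

```shell
#!/bin/bash
# Print the currently active SCLK level from an amdgpu pp_dpm_sclk file.
# amdgpu lists one level per line (for example "0: 300Mhz *") and marks
# the active one with a trailing "*".
# active_sclk is a hypothetical helper name, not part of any tool.
active_sclk() {
  # Assumption: card1 is the amdgpu card; pass another path to override
  local file=${1:-/sys/class/drm/card1/device/pp_dpm_sclk}
  # Keep only the line ending in "*" and print its clock column
  awk '/\*[[:space:]]*$/ { print $2 }' "$file"
}
```

Calling active_sclk with no argument reads card1. If it keeps reporting the lowest level while a game is running, the card is stuck in the way described here.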

~Then i decided to see if i could set the higher powerplay state at boot just for testing in gamescope, however i was unable to do so in /sys/class/drm/card1/device/ as the settings i applied there just seemed to get ignored, even when corectrl was not running. NOTE: card0 does not exist for me, but it would most likely be the intel igpu if it was enabled. Doing echo "1 2 3 4 5 6 7" | sudo tee /sys/class/drm/card1/device/pp_dpm_sclk, which should disable powerplay state 0, yielded no change and spat out the error invalid argument; same goes for trying to set a single powerplay state.~ ~This was tried with and without the amdgpu.ppfeaturemask kernel argument, even though corectrl managed to "fix" the issue temporarily without it.~

~The issue does not seem related to #176, as only the gpu frequency (SCLK) is stuck low. The memory clock (MCLK) is permanently stuck at 1750 MHz, which might be related, but that doesn't affect performance since it is stuck at a high clock. My testing with corectrl shows there is something that can be changed to fix the stuck gpu frequency, and i hope the info here will lead some devs the right way or help someone find a workaround that also works in the gamescope session.~

After a night of sleep i found a workaround (posted in the follow-up post below), as i noticed the power cap for the card got set to 15W by the gamescope session at specific times.

Hardware:
CPU: Intel i5-4460
GPU: AMD RX 560 8GB
Memory: 16GB
Monitor: An old Toshiba 1080p LED TV (60Hz)
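The power cap mentioned in this post can be compared against the card's default without any extra tools. This is a rough sketch, assuming the hwmon files report microwatts (15W reads as 15000000, the value the workaround script further down checks for) and that a power1_cap_default file sits next to power1_cap. The helper names uw_to_w and show_power_caps are made up for illustration.

```shell
#!/bin/bash
# Convert an hwmon power value (microwatts) to whole watts.
# uw_to_w and show_power_caps are hypothetical helper names.
uw_to_w() {
  echo $(( $1 / 1000000 ))
}

# Print the current and default power cap for every hwmon device that
# exposes one. The base directory is a parameter so the function can be
# pointed elsewhere; it defaults to the real sysfs location.
show_power_caps() {
  local base=${1:-/sys/class/hwmon} cap
  for cap in "$base"/*/power1_cap; do
    # Skip unmatched globs and devices without a default value
    [[ -r $cap && -r ${cap}_default ]] || continue
    echo "$cap: $(uw_to_w "$(<"$cap")") W (default $(uw_to_w "$(<"${cap}_default")") W)"
  done
}
```

Running show_power_caps with no argument walks /sys/class/hwmon. A line showing 15 W next to a much higher default points at the problem described in this thread.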

HikariKnight commented 1 year ago

After some more digging today i found out that the power1_cap had been set to 15W, and it seems like the gamescope session sets this when it:

i made a workaround for the performance issues by setting the power cap back to default using a systemd service timer and a script. This is not a fix, but a dirty workaround for other people who are affected until the proper cause can be found and fixed.

/usr/local/bin/amdgpu-fix-tdp

#!/bin/bash
# CONFIG
# default or max
POWER_CAP_MODE=default

# Check every power1_cap exposed under hwmon, so more than one card
# does not break the script
for POWER_CAP_PATH in /sys/class/hwmon/*/power1_cap
do
  # Skip if nothing matched or the reference value is missing
  [[ -r "${POWER_CAP_PATH}" && -r "${POWER_CAP_PATH}_${POWER_CAP_MODE}" ]] || continue

  current=$(<"${POWER_CAP_PATH}")
  target=$(<"${POWER_CAP_PATH}_${POWER_CAP_MODE}")

  # power1_cap is in microwatts, so 15W reads as 15000000.
  # Only act if the cap is 15W and that is not the default or max for the card
  if [[ "${current}" -eq 15000000 && "${current}" -ne "${target}" ]]
  then
    echo "Card stuck in 15W mode, fixing"
    echo "${target}" > "${POWER_CAP_PATH}"
  fi
done

/etc/systemd/system/amdgpu-fix-tdp.service

[Unit]
Description=Set amdgpu TDP to default instead of 15W
After=multi-user.target rc-local.service systemd-user-sessions.service
Wants=modprobe@amdgpu.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/amdgpu-fix-tdp
ExecReload=/usr/local/bin/amdgpu-fix-tdp

[Install]
WantedBy=multi-user.target

/etc/systemd/system/amdgpu-fix-tdp.timer

[Unit]
Description=Timer to check and fix amdgpu tdp

[Timer]
OnBootSec=30
OnUnitActiveSec=3s
AccuracySec=1us
Unit=amdgpu-fix-tdp.service

[Install]
WantedBy=multi-user.target

Once you have made these files just run

sudo chmod +x /usr/local/bin/amdgpu-fix-tdp
sudo systemctl daemon-reload
sudo systemctl enable amdgpu-fix-tdp.timer
sudo systemctl enable amdgpu-fix-tdp.service
sudo systemctl start amdgpu-fix-tdp.timer

Starting 30 seconds after boot, the timer triggers the script every 3 seconds. The script checks whether the power cap has been reset to 15W and, if it has, puts it back to default or max. This should not cause stuttering in games, as the power cap only seems to reset to 15W whenever a game is started or stopped in the gamescope session.

KyleGospo commented 1 year ago

> i made a workaround for the performance issues by setting the power cap back to default using a systemd service timer and a script. This is not a fix, but a dirty workaround for other people who are affected until the proper cause can be found and fixed.

Thank you for this! You can also remove the ability for Steam to set TDP by removing desired lines from this file: /usr/lib/udev/rules.d/30-steamdeck.rules

Or you can use one of a couple different plugins in Decky Loader to set a custom TDP for hardware other than the deck.

It may be possible to automate this step, just need to find the best way to go around it, as on some devices the built in TDP limiter is still desirable. The ASUS Ally for instance can make use of it with an increased cap.
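Before removing lines from the rules file, it helps to see which ones touch the power cap. This is a small sketch, assuming the relevant rules mention power1_cap by name (the thread does not quote the file, so that is a guess). The helper find_cap_rules is made up for illustration. udev gives a file in /etc/udev/rules.d precedence over one with the same name in /usr/lib/udev/rules.d, so both directories are worth checking.

```shell
#!/bin/bash
# List the lines in 30-steamdeck.rules that mention power1_cap.
# Takes one or more rules directories and skips those that do not
# contain the file.
# find_cap_rules is a hypothetical helper name.
find_cap_rules() {
  local dir
  for dir in "$@"; do
    [[ -r $dir/30-steamdeck.rules ]] || continue
    # -H prints the file name, -n the line number; no match is not an error here
    grep -Hn 'power1_cap' "$dir/30-steamdeck.rules" || true
  done
}
```

A typical call would be find_cap_rules /usr/lib/udev/rules.d /etc/udev/rules.d.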

massatt212 commented 1 year ago

https://github.com/HoloISO/holoiso_install_main/commit/cf139a129e58a2c57c27d6c0fad68daba8a71984

I cannot edit or delete that file, I don't have permission, but that file causes the low TDP

steamos-priv-write

HikariKnight commented 1 year ago

> > i made a workaround for the performance issues by setting the power cap back to default using a systemd service timer and a script. This is not a fix, but a dirty workaround for other people who are affected until the proper cause can be found and fixed.

> Thank you for this! You can also remove the ability for Steam to set TDP by removing desired lines from this file: /usr/lib/udev/rules.d/30-steamdeck.rules

> Or you can use one of a couple different plugins in Decky Loader to set a custom TDP for hardware other than the deck.

> It may be possible to automate this step, just need to find the best way to go around it, as on some devices the built in TDP limiter is still desirable. The ASUS Ally for instance can make use of it with an increased cap.

it was one of the first things i tried and it didn't help, but i will try again just to double check.

PS: i also updated the script to find the power1_cap path regardless of which card it is. i have not tested it with multiple gpus, as i don't run 2 gpus on the host system of my main pc.

HikariKnight commented 1 year ago

hi @KyleGospo, the file in question is in /etc/udev/rules.d. Removing /etc/udev/rules.d/30-steamdeck.rules did not fix it; i rebooted and the TDP is still set to 15W.

I ran my script manually to reset it whenever it got set to 15W to see when it got set back to 15W next

So there must be something else that meddles with the TDP at those times
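One way to narrow down when the cap gets rewritten is to log every change with a timestamp and line that up with what the session was doing at the time. This is only a sketch. The helper name watch_cap and its arguments are made up for illustration, and it polls a plain file path that you pass in.

```shell
#!/bin/bash
# Poll a power cap file and print a timestamped line each time its
# value changes.
# watch_cap is a hypothetical helper. Arguments: file, interval in
# seconds (default 1), number of polls (default 0 = until interrupted).
watch_cap() {
  local file=$1 interval=${2:-1} polls=${3:-0}
  local last="" now i=0
  while (( polls == 0 || i < polls )); do
    now=$(<"$file")
    if [[ $now != "$last" ]]; then
      echo "$(date '+%H:%M:%S') power1_cap=$now"
      last=$now
    fi
    (( ++i ))
    sleep "$interval"
  done
}
```

For example, watch_cap /sys/class/hwmon/hwmon0/power1_cap 1 keeps running until interrupted with Ctrl+C. The hwmon0 index is an assumption and differs between systems.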

massatt212 commented 1 year ago

It doesn't exist in my install, I checked already

HikariKnight commented 1 year ago

> Or you can use one of a couple different plugins in Decky Loader to set a custom TDP for hardware other than the deck.

> It may be possible to automate this step, just need to find the best way to go around it, as on some devices the built in TDP limiter is still desirable. The ASUS Ally for instance can make use of it with an increased cap.

Never got PowerControl working as it failed to load in decky, and i don't know of any other non-deck-specific one

> It doesn't exist in my install, I checked already

@massatt212 the file is in /etc/udev/rules.d/, however removing it does not fix the stuck 15W TDP. Currently the timer script i made is what lets you work around it.

massatt212 commented 1 year ago

I tried that also, even changing card to 0 or 1 didn't work, I'm back on nobara, I'll wait till it's fixed, I like the lmv2 grouped partition so I'll wait.

massatt212 commented 1 year ago

I think you guys should allow people to edit every file so everything will be open, at their own risk, cause holo and chimera fixed the problem a while back.

KyleGospo commented 1 year ago

> I think you guys should allow people to edit every file so everything will be open, at their own risk, cause holo and chimera fixed the problem a while back.

This is an Immutable operating system, that will never happen. It's a feature that you can't. Looking into an automatic fix for this

KyleGospo commented 1 year ago

https://github.com/ublue-os/bazzite/commit/484fa8011fc2b0067fa09af2d813312ef574a93b

This is building now, give it a try once it's done. Should resolve your issues.

HikariKnight commented 1 year ago

Awesome will try it out when its available

HikariKnight commented 1 year ago

> 484fa80

> This is building now, give it a try once it's done. Should resolve your issues.

hey @KyleGospo Can confirm this fixes it, plus now a toggle could be put in yafti if necessary so people installing bazzite on handhelds can toggle the setting back on easily :smile:

have a good night

massatt212 commented 1 year ago

I'll swap back if the tdp problem is fixed.

massatt212 commented 11 months ago

my rx 6400 is stuck at 15W again, it's been like this for months. i've been using chimeraos and decided to come back, and it's the same problem.

HikariKnight commented 11 months ago

> my rx 6400 is stuck at 15W again, it's been like this for months. i've been using chimeraos and decided to come back, and it's the same problem.

is it still stuck at 15W if you restart steam? sometimes when i boot my htpc it is very laggy, but restarting steam fixes it (steam --> power --> restart steam). it is something i will be looking into to see what causes it, as it has happened recently. i just want to see if your problem is the same.

massatt212 commented 10 months ago

> > my rx 6400 is stuck at 15W again, it's been like this for months. i've been using chimeraos and decided to come back, and it's the same problem.

> is it still stuck at 15W if you restart steam? sometimes when i boot my htpc it is very laggy, but restarting steam fixes it (steam --> power --> restart steam). it is something i will be looking into to see what causes it, as it has happened recently. i just want to see if your problem is the same.

i got it working, but can you make a script of this fix? it would be so much easier to press 1 button.

massatt212 commented 9 months ago

is the 15W problem fixed, or do i still have to apply the above fix to use bazzite-steamdeck on my desktop?

HikariKnight commented 8 months ago

> is the 15W problem fixed, or do i still have to apply the above fix to use bazzite-steamdeck on my desktop?

We found something that caused this to appear on a handful of gpus and managed to get realtime testing done yesterday. The newest update as of today has a patch active that should automatically fix the issue at each boot if you are affected by this specific issue. In short, some gpus (not limited to a model or manufacturer) randomly booted with their power1_cap writable, which caused steam to directly set it to 15W. It was fixed by #892.
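To check whether a given boot is one of the affected ones, the permissions on power1_cap can be inspected. This is a sketch under an assumption: the thread does not say what "writable" means precisely, so this only reports whether anyone besides the file owner has write permission. The helper name cap_permissions is made up for illustration, and it relies on GNU stat.

```shell
#!/bin/bash
# Report the mode of each power1_cap file and flag the ones that users
# other than the owner can write to.
# cap_permissions is a hypothetical helper; the base directory is a
# parameter and defaults to the real sysfs location.
cap_permissions() {
  local base=${1:-/sys/class/hwmon} cap mode
  for cap in "$base"/*/power1_cap; do
    [[ -e $cap ]] || continue
    # Octal permission bits, for example 644 (GNU stat syntax)
    mode=$(stat -c '%a' "$cap")
    # Group or other write bit set (octal 020 or 002)
    if (( (8#$mode & 8#022) != 0 )); then
      echo "$cap mode=$mode writable-by-non-owner"
    else
      echo "$cap mode=$mode owner-only"
    fi
  done
}
```

Comparing the output across a few reboots would show whether the mode really changes from boot to boot on a given card.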

Decker-01 commented 8 months ago

I still have these issues with my htpc. Got a sapphire rx 7900 gre. The tdp seems to be locked at ~30 watts. Your script fixes the issue, but i can't create the services, because i don't have permissions for the folders.

i'm on the stable channel. Do i need to switch to the latest channel of bazzite? Or does this only refer to the steamOS version?

HikariKnight commented 7 months ago

stable and latest are the same channel. do you have any other decky plugins that are interfering with the TDP (ex: simpledeckytdp)? the current fix only triggers if the TDP is set to 15W when it shouldn't be. if another plugin tampers with that and changes it to 30W, the script will think the device is a handheld or laptop and not do anything.