zeule / asus-ec-sensors

Linux HWMON sensors driver for ASUS motherboards to get sensor readings from the embedded controller
GNU General Public License v2.0

Now getting race condition warning #43

Closed KeithMyers closed 6 months ago

KeithMyers commented 1 year ago

I never had this warning from dmesg before.

[14364.568530] hwmon1: Concurrent access to the ACPI EC detected. Race condition possible.

Started upon installing kernel 6.4.2 from the Ubuntu Mainline PPA

asus-ec-sensors loaded along with k10temp in the kernel. I don't remember k10temp being loaded on the 6.4.0 kernel.

zeule commented 1 year ago

The warning means there is another actor using the EC registers, which might be the board firmware or another kernel module. Perhaps you updated the BIOS recently? Installed new hardware? In any case, to help you I need to know the board model and the list of loaded kernel modules. I may ask for the ACPI dump as well.

KeithMyers commented 1 year ago

No, same BIOS as before. No new hardware. Upgraded from the 6.4.0 to the 6.4.2 kernel. Now k10temp is loaded by the kernel. Competing CPU temps now. The board is an Asus X670E Crosshair Hero. I've sent you the DSDT output already.

zeule commented 1 year ago

Thank you! To simplify the task of checking kernel logs and sources, please provide the list of loaded modules so I can narrow the search. Also, how frequently does the warning message appear?

KeithMyers commented 1 year ago

Here is lsmod output:

keith@Pipsqueek:~$ lsmod
Module Size Used by
tls 143360 8
nvidia_uvm 1765376 8
nvidia_drm 90112 11
nvidia_modeset 1314816 10 nvidia_drm
intel_rapl_msr 16384 0
snd_hda_codec_hdmi 94208 2
intel_rapl_common 36864 1 intel_rapl_msr
edac_mce_amd 36864 0
snd_hda_intel 61440 2
snd_intel_dspcfg 32768 1 snd_hda_intel
kvm_amd 208896 0
snd_intel_sdw_acpi 16384 1 snd_intel_dspcfg
snd_hda_codec 200704 2 snd_hda_codec_hdmi,snd_hda_intel
kvm 1347584 1 kvm_amd
snd_hda_core 139264 3 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec
snd_hwdep 20480 1 snd_hda_codec
irqbypass 12288 1 kvm
crct10dif_pclmul 12288 1
polyval_clmulni 12288 0
binfmt_misc 24576 1
nvidia 56532992 983 nvidia_uvm,nvidia_modeset
polyval_generic 12288 1 polyval_clmulni
joydev 32768 0
ghash_clmulni_intel 16384 0
snd_pcm 188416 4 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hda_core
sha512_ssse3 49152 0
nls_iso8859_1 12288 1
input_leds 12288 0
aesni_intel 356352 0
snd_seq_midi 20480 0
crypto_simd 16384 1 aesni_intel
snd_seq_midi_event 16384 1 snd_seq_midi
cryptd 24576 2 crypto_simd,ghash_clmulni_intel
snd_rawmidi 53248 1 snd_seq_midi
asus_nb_wmi 24576 0
rapl 20480 0
eeepc_wmi 12288 0
asus_wmi 73728 2 asus_nb_wmi,eeepc_wmi
ledtrig_audio 12288 1 asus_wmi
sparse_keymap 12288 1 asus_wmi
platform_profile 12288 1 asus_wmi
snd_seq 90112 2 snd_seq_midi,snd_seq_midi_event
wmi_bmof 12288 0
k10temp 16384 0
snd_seq_device 16384 3 snd_seq,snd_seq_midi,snd_rawmidi
ccp 135168 1 kvm_amd
snd_timer 49152 2 snd_seq,snd_pcm
drm_kms_helper 258048 1 nvidia_drm
snd 131072 13 snd_seq,snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_hda_codec,snd_timer,snd_pcm,snd_rawmidi
syscopyarea 12288 1 drm_kms_helper
sysfillrect 16384 1 drm_kms_helper
soundcore 16384 1 snd
sysimgblt 12288 1 drm_kms_helper
sch_fq_codel 24576 5
ucsi_acpi 12288 0
typec_ucsi 53248 1 ucsi_acpi
typec 106496 1 typec_ucsi
mac_hid 12288 0
nct6775 40960 0
nct6775_core 102400 1 nct6775
hwmon_vid 12288 1 nct6775
asus_ec_sensors 24576 0
msr 12288 0
parport_pc 53248 0
ppdev 24576 0
lp 28672 0
parport 73728 3 parport_pc,lp,ppdev
ramoops 36864 0
reed_solomon 24576 1 ramoops
drm 708608 15 drm_kms_helper,nvidia,nvidia_drm
pstore_blk 16384 0
pstore_zone 32768 1 pstore_blk
efi_pstore 12288 0
ip_tables 32768 0
x_tables 61440 1 ip_tables
autofs4 57344 2
hid_logitech_hidpp 65536 0
hid_logitech_dj 36864 0
hid_generic 12288 0
usbhid 73728 4 hid_logitech_dj,hid_logitech_hidpp
hid 176128 4 usbhid,hid_generic,hid_logitech_dj,hid_logitech_hidpp
nvme 57344 2
crc32_pclmul 12288 0
igc 192512 0
ahci 49152 2
i2c_piix4 28672 0
xhci_pci 24576 0
nvme_core 200704 3 nvme
libahci 53248 1 ahci
xhci_pci_renesas 20480 1 xhci_pci
nvme_common 28672 1 nvme_core
video 69632 2 asus_wmi,nvidia_modeset
wmi 36864 3 video,asus_wmi,wmi_bmof
gpio_amdpt 16384 0

The warning is spamming all my logs hundreds of times every minute. (Screenshot from 2023-07-12 09-29-51)
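To answer the frequency question with a number rather than a screenshot, the kernel log can be filtered for the warning text. This is only a minimal sketch: it assumes the message text matches the line quoted at the top of this issue, and the function name `count_ec_warnings` is made up for illustration.

```shell
# Count occurrences of the asus-ec-sensors race warning in kernel log text
# supplied on stdin. grep -c exits non-zero when the count is 0, so the
# status is normalised with "|| true" (the count itself is still printed).
count_ec_warnings() {
    grep -c 'Concurrent access to the ACPI EC detected' || true
}

# Typical use (reading the kernel ring buffer may need root):
#   sudo dmesg | count_ec_warnings
```

Running it twice a minute apart and subtracting the two counts gives a per-minute rate without any further tooling.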

KeithMyers commented 1 year ago

Removing the k10temp module or blacklisting it didn't stop the race condition warning as I expected. So it looks like the installation of the 6.4.2 kernel is what precipitated the warning. I need to revert to the 6.4.0 kernel and check that. I don't know what else would be causing it.

KeithMyers commented 1 year ago

Reverting to the 6.4.0 kernel removes the race condition.

KeithMyers commented 1 year ago

I was mistaken. Still seeing the race condition in the 6.4.0 kernel. So the change in kernel wasn't the issue. I looked through the logs and found that I didn't have the race warning before 7-8-23.

So I looked at what was installed after that date, and the change from the 6.4.0 to the 6.4.2 kernel was the first thing changed on 7-8. A red herring, I guess. Still have no clue what precipitated the change.

Next thing I will try is complete removal of the asus-ec-sensors module, driver folders and scrub dkms inventory clean of the driver and load the new 6.5.0-rc1 kernel which is supposed to have the very latest asus-ec-sensors code in its bundled driver.

zeule commented 1 year ago

For the test, I would try to unload all the other asus and wmi modules (eeepc_wmi, asus_wmi, asus_nb_wmi).
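One way to run that test is sketched below. The order is an assumption drawn from the lsmod output earlier in this thread, where asus_wmi is listed as used by asus_nb_wmi and eeepc_wmi, so the dependents are removed first. The function name `unload_asus_wmi` is hypothetical, and by default it only prints the commands.

```shell
# Print (or run) the commands that unload the ASUS WMI modules.
# Dependents go first: asus_wmi can only be removed after the modules
# that use it (asus_nb_wmi, eeepc_wmi) are gone.
unload_asus_wmi() {
    # $1 = "run" to execute; anything else only prints the commands.
    for mod in eeepc_wmi asus_nb_wmi asus_wmi; do
        if [ "${1:-}" = "run" ]; then
            modprobe -r "$mod" || echo "could not unload $mod" >&2
        else
            echo "modprobe -r $mod"
        fi
    done
}

# Dry run:
#   unload_asus_wmi
# Real run (as root):
#   unload_asus_wmi run
```

If a module refuses to unload because something else still holds a reference to it, the error from `modprobe -r` is reported and the loop carries on with the next module.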

zeule commented 1 year ago

Here is the list of code locations that write to EC registers: https://elixir.bootlin.com/linux/v6.4.2/C/ident/ec_write

KeithMyers commented 1 year ago

Removing any module having wmi or asus in its name does not stop the warning. Couldn't install the 6.5-rc1 kernels because it won't build the nvidia drivers yet. Tried to install the bundled asus-ec-sensors module in the 6.4.3 kernel but it isn't updated far enough yet to include my board.

keith@Pipsqueek:~$ sudo modinfo asus-ec-sensors
[sudo] password for keith:
filename: /lib/modules/6.4.3-060403-generic/kernel/drivers/hwmon/asus-ec-sensors.ko
license: GPL
description: HWMON driver for sensors accessible via ACPI EC in ASUS motherboards
author: Eugene Shalygin eugene.shalygin@gmail.com
srcversion: 25259E458690E388EAC1736
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGZENITHIIEXTREMEALPHA:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGZENITHIIEXTREME:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXZ690-AGAMINGWIFID4:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXZ390-FGAMING:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXX570-IGAMING:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXX570-FGAMING:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXX570-EGAMINGWIFIII:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXX570-EGAMING:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXB550-IGAMING:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXB550-EGAMING:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGCROSSHAIRVIIIIMPACT:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGMAXIMUSXIHERO(WI-FI):
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGMAXIMUSXIHERO:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGCROSSHAIRVIIIHERO(WI-FI):
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGCROSSHAIRVIIIHERO:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGCROSSHAIRVIIIFORMULA:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGCROSSHAIRVIIIDARKHERO:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnProWSX570-ACE:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnProArtB550-CREATOR:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnProArtX570-CREATORWIFI:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnPRIMEX570-PRO:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnPRIMEX470-PRO:
depends:
retpoline: Y
intree: Y
name: asus_ec_sensors
vermagic: 6.4.3-060403-generic SMP preempt mod_unload modversions
sig_id: PKCS#7
signer: Build time autogenerated kernel key
sig_key: 65:EE:3B:A5:25:AF:F5:DB:16:39:15:64:20:34:12:A2:E0:6C:80:32
sig_hashalgo: sha512
signature: 16:31:56:77:67:CC:E6:DB:9C:6A:DC:EF:FC:42:D0:3D:9E:69:E0:9E:
    75:D4:45:F9:C5:F7:98:B8:D3:7E:B9:AC:53:40:74:36:50:8B:2B:FB:
    E0:C1:59:6C:13:28:17:98:38:CF:37:B9:93:1F:90:66:58:B0:CE:87:
    95:FD:68:36:4C:F6:DC:2A:13:58:75:67:01:2B:B6:4A:A6:1D:A7:F4:
    E6:70:06:E9:E2:29:23:2F:19:06:0E:AB:D7:77:7D:D6:D7:C3:90:28:
    BA:D6:83:96:2A:EF:93:F9:86:BE:93:60:2A:9C:DA:F2:81:08:A6:62:
    EE:74:0D:C6:9D:41:3D:4C:A1:07:23:A2:22:98:FC:56:65:D0:22:24:
    E6:BB:E9:53:57:C3:D2:9B:A1:C2:67:F3:BD:3C:6F:5E:EB:F2:B5:E0:
    47:F9:07:43:82:08:06:F8:8E:B6:47:05:2D:FC:04:89:0F:8C:D0:4A:
    30:DF:08:89:1D:FE:3F:3F:7E:15:FF:1D:F7:A7:26:21:E9:58:6F:AA:
    7C:9A:50:BF:EB:D8:68:1E:3C:4C:C5:9D:84:8A:DC:9C:AF:DD:4F:4C:
    E7:CE:A9:98:3A:14:7D:38:63:19:C4:A8:ED:10:92:99:3C:44:24:56:
    7E:62:DB:97:39:F9:80:DB:F1:53:35:95:8E:DF:DF:63:BA:52:3E:59:
    65:FF:60:ED:C5:A3:4C:19:3E:E2:8E:4C:85:9B:E8:5E:70:05:BD:40:
    41:81:2C:D4:9E:AC:42:25:B3:07:66:BF:0E:8C:40:BC:E4:6C:9F:2B:
    B7:66:CA:50:11:7A:E5:95:9D:5E:82:BE:34:F8:92:F6:5D:4E:83:6D:
    A7:F5:B5:E9:36:97:56:56:08:97:C5:DB:A7:83:4D:01:8B:2E:D1:99:
    74:CE:D1:AC:DB:0B:E5:01:0B:86:09:4C:BA:CD:E8:83:00:69:DE:43:
    A8:4B:C0:DA:9E:94:57:AC:0E:ED:8E:0B:0C:73:3E:F7:BD:01:B0:EB:
    05:31:E9:7A:2D:1D:60:9E:AE:AD:AF:09:3F:BA:C9:A8:DF:C6:9C:CD:
    D1:CB:F7:AC:9D:1B:FC:8E:76:AF:57:D3:10:51:F6:EF:0D:78:34:5C:
    4B:5F:30:09:28:12:9D:6F:AF:DD:D9:E7:9D:86:A8:3F:6E:F4:02:C9:
    E9:6C:9B:C3:36:1E:7D:A6:23:60:09:4D:F9:39:98:40:ED:0C:C9:3E:
    93:8E:A6:7B:60:CA:4D:7F:13:94:1C:34:69:5B:23:0B:28:B6:AC:F5:
    03:34:21:AB:A5:97:3C:B6:EF:6C:26:85:ED:BA:B6:CD:DD:51:8D:6C:
    8E:4B:96:01:57:2E:A6:7A:DB:A0:E7:11
parm: mutex_path:Override ACPI mutex path used to guard access to hardware (charp)

keith@Pipsqueek:~$ sudo modprobe asus-ec-sensors
modprobe: ERROR: could not insert 'asus_ec_sensors': No such device

KeithMyers commented 1 year ago

So until the 6.5 kernels are far enough along to build properly and support the nvidia drivers, I'll just skip loading your out-of-tree module and forget about the water temp that your module shows. That gets rid of the concurrent EC access warning too. I have CPU temps and some fan speeds from the nct6775 module for the important bits.

zeule commented 1 year ago

In this case I want to double check there was no BIOS update and that the current BIOS in your hardware matches the ACPI code I have. Could you create ACPI dump again and share it, please? If it is not kernel module that accesses the EC, it has to be the firmware. There are simply no other actors.
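For reference, a minimal sketch of producing such a dump on Linux. It assumes the kernel exposes the raw table at `/sys/firmware/acpi/tables/DSDT` (readable by root) and that the ACPICA `iasl` tool is installed for the optional disassembly step; the function name `dump_dsdt` is made up here.

```shell
# Copy the raw (binary) DSDT table to a file.
# $1 = output file (default: dsdt.dat)
# $2 = table source (default: the sysfs path; reading it needs root)
dump_dsdt() {
    out=${1:-dsdt.dat}
    src=${2:-/sys/firmware/acpi/tables/DSDT}
    cat "$src" > "$out"
}

# Usage (as root):
#   dump_dsdt dsdt.dat
# Optional disassembly into ASL source, written next to the binary:
#   iasl -d dsdt.dat
```

Sharing the binary file rather than only the disassembly lets the recipient decompile it with their own iasl version.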

KeithMyers commented 1 year ago

There have been no BIOS updates since the emergency update for the too-high SoC voltages that were killing CPUs and motherboards. That was the official 1415 BIOS, installed over a month ago. I didn't have any EC warnings after installing it. The EC warning is recent, only within the last few days. There have been OS firmware updates via apt fairly regularly. I'll provide a new ACPI dump.

KeithMyers commented 1 year ago

Here is the dsdt .dsl output. dsdt.zip

KeithMyers commented 1 year ago

Just a FYI. The asus-ec-sensors module supplied in the latest kernel 6.5-rc2 installs fine and doesn't create the race condition warnings. I removed the nvidia drivers so I could install the rc2 kernel since they won't build in the rc kernels because of a compiler mismatch between the compiler that built the kernel and the compiler in the OS distro. But there are no additional sensors exposed compared to your build. The Phoronix article about the asus-ec-sensors module that is built into the 6.5 kernels made it sound like the module from hwmon branch produced more sensors from my motherboard. Sadly it does not.
So still stuck with the issue of the race warning with VRM and Water temps or no temps from those sensors and/or not able to install the nvidia drivers and avoid the race warning with the builtin module. So reverted back to the 6.4 kernels and working nvidia drivers and no asus-ec-sensors module installed and no VRM and Water temps.

zeule commented 1 year ago

Thank you for all the information! I hope to find out what went wrong in the coming days.

KeithMyers commented 1 year ago

I was wrong. The stock asus_ec_sensors module is automatically loaded by the kernel in the 6.5-rc2 kernel. It is not necessary to modprobe the module.

I am still getting the race condition. I thought it was because I was double loading the module. But I removed the module from the module.conf file and rebooted, and saw that the module was loaded by default.

You don't get the race condition right away. In the last log it was 12 hours since boot before the race condition shows up in the logs. I blacklisted asus_nb_wmi to see if that was the conflict but it did not help. The asus_wmi module is loaded automatically by the kernel and you can't remove it because it has other processes tied into it that prevents removal.

I have an unstable system that reboots when the race condition count is in the 5 figures. So for now I have blacklisted the asus_ec_sensors module to get back to a stable system.

This really should be brought to the attention of the kernel devs. Not sure how to go about that though. Maybe you have a suggestion.

zeule commented 1 year ago

Now this makes much more sense! We might be using the wrong ACPI mutex. The first change I'd like to ask you to test is to replace the mutex path for your board with the global lock, i.e. ASUS_HW_ACCESS_MUTEX_RMTW_ASMX -> ACPI_GLOBAL_LOCK_PSEUDO_PATH inside the board_info_crosshair_x670e_hero definition. Alternatively you can supply the value of that macro (:GLOBAL_LOCK) via the mutex_path module option.

If that removes the race condition, then we can be sure we race with the board firmware.

KeithMyers commented 1 year ago

OK, I have modded the .c file and substituted the ACPI_GLOBAL_LOCK_PSEUDO_PATH statement for the original in the board_info section. I was unsure about what you meant for the other option.

I have insmodded the module now and will continue to monitor the host and wait until I see the race condition starting again. Still don't know what triggers it hours after the module gets installed.

zeule commented 1 year ago

> I was unsure about what you meant for the other option.

The mutex path can be overwritten via the module option, without recompiling. The end effect should be the same. Whether you prefer to modify the source file or a file in /etc/modprobe.d/ is up to you, of course.

KeithMyers commented 1 year ago

Can you explain or show an example of this method via the modprobe.d entry? Are you saying to load the module with an options statement in an asus_ec_sensors.conf file? Like this?

options asus_ec_sensors ?????

I don't see and can't find an explanation of the mutex_path module option. OK, I think modinfo gives me a clue.

So this is what you are saying?

sudo modprobe asus_ec_sensors mutex_path:GLOBAL_LOCK

zeule commented 1 year ago

Yes, modinfo tells the option name. So, either

# echo "options asus_ec_sensors mutex_path=:GLOBAL_LOCK" >> /etc/modprobe.d/a_file_name.conf

or

# modprobe asus_ec_sensors mutex_path=:GLOBAL_LOCK
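The persistent variant can be wrapped in a small helper, sketched below. The helper name `write_mutex_override` and the file name `asus_ec_sensors.conf` are illustrative choices, not something the driver requires. It overwrites the file instead of appending, so rerunning it does not leave duplicate option lines.

```shell
# Write the mutex_path override to a modprobe.d-style directory.
# $1 = mutex path value (for example :GLOBAL_LOCK)
# $2 = target directory (default /etc/modprobe.d, which needs root)
write_mutex_override() {
    value=$1
    dir=${2:-/etc/modprobe.d}
    # Any file name works as long as it ends in .conf.
    printf 'options asus_ec_sensors mutex_path=%s\n' "$value" \
        > "$dir/asus_ec_sensors.conf"
}

# Usage (as root), followed by a module reload so the option takes effect:
#   write_mutex_override :GLOBAL_LOCK
#   modprobe -r asus_ec_sensors && modprobe asus_ec_sensors
#   modprobe --showconfig | grep asus_ec_sensors   # check the option is seen
```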

KeithMyers commented 1 year ago

Thanks for the clarification. I'll let the modified module stew for a while. If it looks like the global lock method prevents the race condition, I will just load the module via an options statement in a modprobe.d conf file.

zeule commented 1 year ago

Please let me know if it works, I can then look carefully into the DSDT source for the proper mutex name.
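A first pass over the disassembled DSDT can be done with grep, as sketched below. This assumes the decompiled ASL uses the usual `Mutex (NAME, ...)` and `Acquire (NAME, ...)` spelling that iasl emits; the function name `list_acpi_mutexes` is hypothetical. grep shows only the short name, so the full path of a mutex (its enclosing Scope or Device) still has to be read from the surrounding source.

```shell
# List mutex declarations and acquisitions in a disassembled DSDT.
# $1 = path to the .dsl file
list_acpi_mutexes() {
    grep -En '(Mutex|Acquire) \(' "$1"
}

# Usage:
#   list_acpi_mutexes dsdt.dsl
```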

KeithMyers commented 1 year ago

Well, it's been over a day now with the global mutex lock on the binary and I haven't had any recurrence of the race condition.

. . . . . . or I just haven't triggered whatever I was doing before that caused the race condition. Same applications running and haven't touched the system since I put the modified binary into play.

So I guess the global lock works. You said something about now having to search my dsdt file for the correct ACPI mutex lock to properly implement the fix. Is that the current status as I understand it?

zeule commented 1 year ago

Yes, that's correct.

zeule commented 1 year ago

Just want to let you know I did not forget about this bug. Reading the DSDT code showed that they not only use the global ACPI lock (so you are safe now providing that lock via the module parameter), but also seem to do synchronization on the EC data port in the DSDT code, which should be done by the ACPI implementation inside the OS layer. I'd like to understand what's going on before fixing the code.

KeithMyers commented 1 year ago

OK, thanks. I lost my ability to run the latest 6.5 kernels since the Mainline PPA kernels got recompiled with GCC-13 and my Ubuntu 22.04 only has GCC-11 and GCC-12. I get a compiler mismatch trying to install any DKMS module now. So have fallen back to the new 6.2 kernels shipped with the latest Ubuntu 22.04.3 LTS distro via the HWE kernels. I can't install now because I have to use the proprietary Nvidia drivers and they won't compile into the kernel now with any kernel past 6.4.3. But only the 6.5 kernels have the updated nct6775 and asus-ec-sensors modules with support for my motherboard.

I'm just going to have to wait until October for the new 23.10 Ubuntu minor release which is going to use the 6.5 kernels to get a fully supported motherboard again using the standard modules like k10temp, nct6775 and asus-ec-sensors.

Thanks for keeping up on the bug. For now I will just have to use your out-of-tree version and the global lock option.

zeule commented 1 year ago

The fix should be in 6.5.

KeithMyers commented 1 year ago

Do you know if this commit made it into the 6.5-rc7 kernel? Or is it only going to make it into the stable 6.5 kernel next week?

zeule commented 1 year ago

Please track it yourself: https://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git/log/?h=hwmon

zeule commented 1 year ago

Hi,

no, I can't. I've made a similar change to the board definition for the model I have and it loads OK. Could you add a debug print and share the output, please:

diff --git a/asus-ec-sensors.c b/asus-ec-sensors.c
index 51f9c2db403e..c98e10eb9cc7 100644
--- a/asus-ec-sensors.c
+++ b/asus-ec-sensors.c
@@ -688,6 +688,8 @@ static int setup_lock_data(struct device *dev)
 mutex_path = mutex_path_override ? mutex_path_override : state->board_info->mutex_path;

Cheers, Eugene

On Tue, 22 Aug 2023 at 01:47, KeithMyers @.***> wrote:

Using your latest code commit that you mainlined. Can you explain why it does not load?

@.***:~$ sudo dmesg -T | grep asus-ec-sensors
[Mon Aug 21 16:41:43 2023] asus-ec-sensors asus-ec-sensors: Failed to get hardware access guard AML mutex 'ACPI_GLOBAL_LOCK_PSEUDO_PATH': error 4097
[Mon Aug 21 16:41:43 2023] asus-ec-sensors asus-ec-sensors: Failed to setup state/EC locking: -2
[Mon Aug 21 16:41:43 2023] asus-ec-sensors: probe of asus-ec-sensors failed with error -2


KeithMyers commented 1 year ago

You seem to be quoting the post I deleted shortly after I posted it after I realized why I was getting that error. I had forgotten to remove the driver call in the kernel command line. Simply clearing that out and letting the driver load itself normally cleared the error and let the driver module function normally.
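A leftover like that can be spotted by inspecting the kernel command line. The sketch below assumes the parameter was passed in the usual `module.param=value` form; the thread does not say exactly what was on the command line, and the function name `cmdline_module_params` is made up.

```shell
# Print parameters for one module found on a kernel command line.
# $1 = module name as written on the command line
# $2 = command-line file (default /proc/cmdline)
cmdline_module_params() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep "^$1\." || true
}

# Usage:
#   cmdline_module_params asus_ec_sensors
```

An empty result means no parameter for that spelling of the module name is present. Since module names may be written with either dashes or underscores, it is worth checking both spellings.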

KeithMyers commented 11 months ago

Unfortunately, I am back to spamming the logs every 5 seconds with the race condition warnings. Assuming this is because of moving to a 6.6.7 kernel and a new BIOS for the X670E Crosshair Hero motherboards.

What can I do again to provide information about what is the cause?
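One piece of information that is cheap to collect is the spacing between warnings. The sketch below assumes the default `dmesg` timestamp format (seconds since boot in square brackets), not the `-T` wall-clock format; the function name `ec_warning_intervals` is hypothetical.

```shell
# Print the gap in seconds between consecutive race warnings.
# Reads kernel log text with "[seconds.micros]" timestamps on stdin.
ec_warning_intervals() {
    grep 'Concurrent access to the ACPI EC detected' |
    sed -E 's/^\[ *([0-9]+\.[0-9]+)\].*/\1/' |
    awk 'NR > 1 { printf "%.1f\n", $1 - prev } { prev = $1 }'
}

# Usage:
#   sudo dmesg | ec_warning_intervals
```

A steady interval would suggest a periodic poller on one side or the other, while irregular gaps would point towards event-driven access.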

zeule commented 11 months ago

Is this with the global ACPI lock? If so, I'd like to see the updated DSDT, please.

KeithMyers commented 11 months ago

I am just using the provided module from the 6.6.7 release as is. No modifications on my part.
The module has been fine in the previous kernel releases that have included your updated code. Modinfo shows the normal information about the mutex path override.

root@Pipsqueek:/home/keith# modinfo asus-ec-sensors
filename: /lib/modules/6.6.7-060607-generic/kernel/drivers/hwmon/asus-ec-sensors.ko.zst
license: GPL
description: HWMON driver for sensors accessible via ACPI EC in ASUS motherboards
author: Eugene Shalygin eugene.shalygin@gmail.com
srcversion: F85FCE273D3EE2DBC65EE82
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGZENITHIIEXTREMEALPHA:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGZENITHIIEXTREME:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXZ690-AGAMINGWIFID4:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXZ390-FGAMING:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXX570-IGAMING:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXX570-FGAMING:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXX570-EGAMINGWIFIII:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXX570-EGAMING:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXB550-IGAMING:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGSTRIXB550-EGAMING:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGCROSSHAIRVIIIIMPACT:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGMAXIMUSXIHERO(WI-FI):
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGMAXIMUSXIHERO:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGCROSSHAIRX670EHERO:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGCROSSHAIRVIIIHERO(WI-FI):
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGCROSSHAIRVIIIHERO:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGCROSSHAIRVIIIFORMULA:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnROGCROSSHAIRVIIIDARKHERO:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnProWSX570-ACE:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnProArtB550-CREATOR:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnProArtX570-CREATORWIFI:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnPRIMEX570-PRO:
alias: dmi:rvnASUSTeKCOMPUTERINC.:rnPRIMEX470-PRO*:
depends:
retpoline: Y
intree: Y
name: asus_ec_sensors
vermagic: 6.6.7-060607-generic SMP preempt mod_unload modversions
sig_id: PKCS#7
signer: Build time autogenerated kernel key
sig_key: 4F:15:87:0F:65:42:6A:85:BF:5D:0E:A1:90:81:7A:59:AF:09:69:EE
sig_hashalgo: sha512
signature: 37:5E:62:F7:A1:AE:B2:D9:1A:90:DC:D7:7E:07:DF:AF:94:74:03:69:
    B3:6F:7E:CD:63:46:3A:DC:3E:B6:5B:79:ED:D6:CB:F3:62:45:C1:0F:
    B8:41:86:71:61:B4:07:14:BD:CB:44:9D:03:71:21:5C:4B:FD:81:DF:
    AE:71:DE:CB:73:B7:52:5A:9C:B4:79:85:0C:23:5A:EE:BB:D4:88:A6:
    47:48:3C:F9:57:FB:6F:91:C5:4E:EC:59:2C:40:79:BC:B7:AE:D0:34:
    50:EF:13:84:CE:FB:C1:39:A6:A3:FF:AB:EE:8D:9C:C4:1A:5E:B8:CD:
    C6:9F:8B:80:20:EA:72:B7:B2:59:5C:94:55:F5:D5:73:02:6A:91:F1:
    43:F4:78:E6:5A:C8:38:32:E8:AB:C7:52:0E:64:C2:25:85:77:78:05:
    B3:19:9B:5C:D1:20:47:03:53:23:1B:02:62:77:53:D1:4E:DE:88:E7:
    F6:44:24:B6:9E:1B:6A:D5:FC:E6:F3:6A:EE:2C:88:E8:F9:D2:7F:8D:
    A3:FC:CD:9D:C2:DE:FE:34:82:09:D7:FF:DF:5D:59:8D:4D:B4:B2:5B:
    00:30:FA:05:1F:19:5F:D7:6A:91:84:34:B2:AC:AB:8D:66:94:3A:D2:
    F8:9B:A1:95:79:7A:A9:F0:39:C5:B9:FC:F1:03:3C:C8:26:88:69:2C:
    08:E7:32:83:23:84:12:70:73:C7:1D:4F:CA:92:8B:7E:41:75:3E:3E:
    16:22:68:DC:C5:74:DD:93:4B:98:71:BA:B4:39:CD:09:80:C4:6B:61:
    38:F3:24:94:07:CA:2F:36:0B:1A:69:70:F4:93:E7:B0:81:0E:A3:47:
    C3:5B:B9:85:1E:39:A9:26:78:22:B2:B0:02:4A:CA:8A:E9:AE:F9:89:
    C3:EE:97:68:F7:BB:11:D7:6C:22:6E:82:14:41:E9:AA:63:E2:93:F8:
    D7:D7:0A:8E:1A:74:C1:E8:A0:4E:58:8D:78:C9:47:A1:1C:12:4E:4E:
    B8:6C:02:55:EA:8E:71:E7:3B:AB:81:C2:D6:10:CC:B8:C7:66:48:4E:
    DD:8D:F8:0F:F2:32:99:85:17:3F:30:B4:74:4C:5F:5A:97:11:5E:CD:
    FF:28:20:5E:B6:F9:92:4D:59:70:EC:6B:67:5B:C3:C4:C6:24:BF:23:
    23:AC:BE:75:60:40:E0:C0:A5:42:B7:8E:65:54:E1:7B:AD:39:D1:F2:
    2C:61:C8:B9:0E:04:54:33:37:3A:26:0C:AC:5C:2B:C4:D2:73:32:2F:
    DC:2F:EF:72:9C:7B:04:66:CF:57:EA:E2:1A:C6:A3:9A:17:44:8A:45:
    E3:AE:02:65:1E:15:DE:B5:F1:F5:5B:91
parm: mutex_path:Override ACPI mutex path used to guard access to hardware (charp)

KeithMyers commented 11 months ago

Here is the dsdt.dsl file again. dsdt.zip

KeithMyers commented 11 months ago

Putting in the manually overridden global mutex lock as an option on the module shows this at startup in dmesg:

[Sun Dec 17 09:14:55 2023] hwmon hwmon2: Failed to acquire mutex
[Sun Dec 17 09:14:55 2023] hwmon hwmon2: update_ec_sensors() failure
[Sun Dec 17 09:14:56 2023] hwmon hwmon2: Failed to acquire mutex
[Sun Dec 17 09:14:56 2023] hwmon hwmon2: update_ec_sensors() failure
[Sun Dec 17 09:15:00 2023] hwmon hwmon2: Concurrent access to the ACPI EC detected. Race condition possible.
[Sun Dec 17 09:15:05 2023] hwmon hwmon2: Concurrent access to the ACPI EC detected. Race condition possible.
[Sun Dec 17 09:15:10 2023] hwmon hwmon2: Concurrent access to the ACPI EC detected. Race condition possible.

zeule commented 11 months ago

> Here is the dsdt.dsl file again. dsdt.zip

Thanks, but could you, please, share the binary? I want to be able to decompile those files with the same iasl version to compare sources.

KeithMyers commented 11 months ago

OK. here is the binary from /usr/lib/modules/6.6.7-060607-generic/kernel/drivers/hwmon

asus-ec-sensors.zip

zeule commented 11 months ago

No, not the module. The DSDT binary file.

KeithMyers commented 11 months ago

Oh . . sorry. Misunderstood the request. dsdt.zip

zeule commented 10 months ago

Thank you! Sorry, I'm slow due to the holidays.

zeule commented 10 months ago

@KeithMyers could you please try to pass "\RMTW.ASMX" as the mutex override, in the module parameter where you used to pass ":GLOBAL_LOCK" before we mainlined that?

KeithMyers commented 10 months ago

Won't accept the command.

modprobe asus_ec_sensors mutex_path=:\RMTW.ASMX

modprobe: ERROR: could not insert 'asus_ec_sensors': No such device

modprobe asus_ec_sensors mutex_path=:RMTW.ASMX

modprobe: ERROR: could not insert 'asus_ec_sensors': No such device

From dmesg log:

asus-ec-sensors asus-ec-sensors: Failed to get hardware access guard AML mutex 'RMTW.ASMX': error 4097
[Tue Dec 26 17:17:50 2023] asus-ec-sensors asus-ec-sensors: Failed to setup state/EC locking: -2
[Tue Dec 26 17:17:50 2023] asus-ec-sensors: probe of asus-ec-sensors failed with error -2

zeule commented 10 months ago

Please try modprobe asus_ec_sensors mutex_path=\\RMTW.ASMX
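The doubled backslash is needed because the shell consumes an unquoted backslash before modprobe ever sees it, which is consistent with the dmesg line above naming the mutex as 'RMTW.ASMX' without the leading backslash. A small demonstration (the helper `show_arg` exists only for this illustration):

```shell
# Print an argument exactly as the shell hands it to a command.
show_arg() { printf '%s\n' "$1"; }

show_arg mutex_path=\RMTW.ASMX     # single unquoted backslash is consumed by the shell
show_arg mutex_path=\\RMTW.ASMX    # doubled backslash: one literal backslash survives
show_arg 'mutex_path=\RMTW.ASMX'   # single quotes keep the backslash as typed
```

The first call prints the path without a backslash; the second and third both print `mutex_path=\RMTW.ASMX`, so single quotes are an equivalent way to pass the value.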

KeithMyers commented 10 months ago

Module installed with no warnings or errors, but it still did not fix the race condition.

[Wed Dec 27 09:57:02 2023] asus-ec-sensors asus-ec-sensors: board has 6 EC sensors that span 6 registers
[Wed Dec 27 09:57:14 2023] hwmon hwmon2: Concurrent access to the ACPI EC detected. Race condition possible.

KeithMyers commented 10 months ago

@zeule Did you find anything different in my binary ACPI output compared to the original file that would account for the reappearance of the concurrent race conditions that I am now experiencing with the latest BIOS and kernels?

ZVNexus commented 6 months ago

ROG CROSSHAIR X670E HERO | BIOS 2007 6.9.0-64.fc41.x86_64

Just commenting that I am not seeing the race condition warning reported here.

KeithMyers commented 6 months ago

Seems to be an interaction between Boinc and the use of the ocl-icd-opencl drivers necessary to run the Nvidia drivers with OpenCL support for those projects that only provide OpenCL applications. If I remove that driver I don't get the race condition.