zeule / asus-ec-sensors

Linux HWMON sensors driver for ASUS motherboards to get sensor readings from the embedded controller
GNU General Public License v2.0

asus-ec-sensors does not expose any fan speeds on Asus Crosshair X670E Hero motherboard #42

Closed KeithMyers closed 3 months ago

KeithMyers commented 1 year ago

Installed the latest release and have VRM temps and Water Out temps now. That is very welcome.

I was hoping that the driver would expose the fan header speeds on the board. Sadly it does not. None found.

Out of the seven headers on the board, the nct6775 driver only finds the output of two headers, fan2 and fan7 are the only ones found as identified by sensors on the nct6775 driver output. But fan2 is mislabeled by the driver as the physical Fan2 header is unoccupied. It should be labeled as physical header Fan6.

For example, the Fan4 header on the motherboard, where my pump is plugged in, is missing.

For reference, the BIOS sensors page finds all the fan header outputs.

zeule commented 1 year ago

Hi,

this driver is not universal in the sense that it can't detect available sensors. They have to be predefined for each board model, which is easy to do when you have access to the hardware. The readme outlines what to do, and feel free to ask any questions.

For this board you need to use the sensors_family_amd_600 sensor set and possibly extend it (borrowing from sensors_family_amd_500). Usually, all the measurements that can't be controlled by the user (i.e. not PWM fans) are exposed via the EC.

So, you need to add a definition for Crosshair X670 Hero, possibly populate it with all the possible sensors, and then test which ones actually work. To my knowledge, reading non-existent sensors should not damage the hardware, but might lock PWM fans at zero or max mode until power cycled. Adding motherboard definitions includes finding out which ACPI mutex to use for guarding access to the hardware. I can help you with that if you share a dump of the DSDT ACPI table. With this I encourage you to contribute to the project, please.

Good luck!

KeithMyers commented 1 year ago

I can get you the dsdt output. I did that for the asus-wmi-sensors driver dev. But I don't have any access to HWMonitor or any Windows apps. No Windows here.

dsdt.dsl

zeule commented 1 year ago

Thanks, I will find out the mutex name, while you can start with copying definitions for Crosshair X670E Hero. You need to copy over the board_info_crosshair_x670e_hero struct (please name the copy board_info_crosshair_x670_hero ) and the DMI_EXACT_MATCH_ASUS_BOARD_NAME() call (change arguments accordingly). Then $ make, sudo rmmod asus-ec-sensors, and then sudo insmod asus-ec-sensors.ko (use the local file generated by the build). Then check dmesg and sensors output if dmesg does not show the "Unknown board family" error.

If in the end we discover that sensor sets for 670E and 670 are identical, we will merge declarations for those boards.

KeithMyers commented 1 year ago

I'm not following you. What does copying the existing structure in the C file do? How do I know what new sensors to add. Are you saying to copy the sensors structure from the family_amd_500_series? I guess I will need to get the naming structure out of the UEFI interface for the fans since no Windows. Basic ASUS UEFI sensors page structure that hasn't changed in 3 generations. Same for x370 and x470. Never owned a x570 board though.

zeule commented 1 year ago

Sorry. I'm suggesting that you start by copying over the sensor definition for Crosshair X670E Hero, a very similar board. Just to test, you can change the board name here: https://github.com/zeule/asus-ec-sensors/blob/master/asus-ec-sensors.c#L491. If that works fine, just duplicate lines 491 and 492 and supply your board name in the copy. If you need to change the sensor set, copy the structure those lines refer to (board_info_crosshair_x670e_hero) and make changes in the copy.

KeithMyers commented 1 year ago

You are confused. I AM using a Crosshair X670E Hero board. That board is covered by your driver. All I was commenting on is the lack of what your driver exposes.

KeithMyers commented 1 year ago

Oh, sorry I did not realize the typo in the name.

zeule commented 1 year ago

This driver does not know how to get the sensor/fan to EC register map out of the UEFI code, so, yes, you have to find it out by trial and error. I'd indeed suggest to take the amd 500 family as an example, and look around the Motherboard and VRM sensors, already discovered for the AMD 600 family.

zeule commented 1 year ago

Alternatively, you can use the ec_sys kernel module, get a dump of the first bank of the EC registers, and look for numbers, similar to what you are looking for.

KeithMyers commented 1 year ago

I have a teammate using your sensor driver on his Asus Crosshair VIII Hero board, which uses the X570 chipset. Would his sensors output be useful in copying his exposed fan naming structure into the family_600 structure?

KeithMyers commented 1 year ago

> Alternatively, you can use the ec_sys kernel module, get a dump of the first bank of the EC registers, and look for numbers, similar to what you are looking for.

How would I do this part? How do I get a dump from the EC registers? I see the module is available. After loading it, how do I get a dump?

zeule commented 1 year ago

Sorry, I again can't follow. What's the intent of copying sensors output? If you want to know which sensors might be present, you have them in the UEFI UI for this board. Those are all non-controllable by user (temperatures, voltages, fans without PWM).

ec_sys (https://github.com/torvalds/linux/blob/master/drivers/acpi/ec_sys.c) is just another kernel module. If you build your kernel yourself, you can enable it. Otherwise its presence depends on the distribution, I guess. To make a dump, modprobe the module and then issue # hexdump /sys/kernel/debug/ec/ec0/io (you might want to use hexdump -C).
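
For anyone who would rather script the read than use hexdump, here is a minimal Python sketch of the same step. It assumes ec_sys is loaded, debugfs is mounted at /sys/kernel/debug, and the script runs as root. The function names read_ec_bank and format_dump are hypothetical, and the 256-byte size is inferred from the dumps in this thread ending at offset 0x100.

```python
from pathlib import Path

# Path quoted in the comment above; requires root and the ec_sys module.
EC_IO_PATH = "/sys/kernel/debug/ec/ec0/io"


def read_ec_bank(path=EC_IO_PATH):
    """Return the bytes of the currently selected EC bank (expected: 256)."""
    data = Path(path).read_bytes()
    if len(data) != 256:
        raise ValueError(f"expected 256 bytes, got {len(data)}")
    return data


def format_dump(data):
    """Render the bytes as rows of 16, each labelled with its base address."""
    lines = []
    for base in range(0, len(data), 16):
        row = " ".join(f"{b:02x}" for b in data[base:base + 16])
        lines.append(f"{base:02x}  {row}")
    return "\n".join(lines)
```

Calling print(format_dump(read_ec_bank())) as root should give a listing comparable to hexdump -C, without the ASCII column.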

KeithMyers commented 1 year ago

I have the ec_sys module in my kernel. I can load it. I just did not know how to get anything out of it.

KeithMyers commented 1 year ago

# hexdump -C /sys/kernel/debug/ec/ec0/io
00000000  d8 26 1f 1c 28 07 27 00  00 00 00 00 00 00 00 00  |.&..(.'.........|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000030  00 00 00 00 00 00 00 00  02 10 00 00 03 18 14 00  |................|
00000040  02 48 02 58 01 0f 0f ff  00 00 04 7d ff 00 00 00  |.H.X.......}....|
00000050  00 00 00 00 00 00 00 00  02 a0 25 38 04 e8 00 00  |..........%8....|
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000070  00 0a 03 00 00 dc dc 00  00 30 00 00 00 00 00 00  |.........0......|
00000080  58 00 00 53 00 00 00 00  00 00 00 00 00 00 00 00  |X..S............|
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000c0  00 00 00 00 00 00 fd e8  fd e8 00 00 00 00 00 00  |................|
000000d0  fd e8 fd e8 fd e8 00 00  00 00 00 02 00 00 00 00  |................|
000000e0  00 00 00 00 03 00 00 00  00 00 00 00 00 00 00 00  |................|
000000f0  0f 00 00 00 00 00 00 00  ec 00 00 00 00 00 00 01  |................|
00000100

zeule commented 1 year ago

I suspect you somehow got a dump of the second EC bank. Do you have the "Water Out" sensor connected and "Water In" not?

KeithMyers commented 1 year ago

Yes I have a Water Out sensor connected and displaying the correct temp. No Water In sensor connected.

zeule commented 1 year ago

Those are here: "00000000 d8 26". "d8" is the blank value for temperature sensors; here it sits in the "Water In" register (second bank, register 0x0). "26" is Water Out (38 in decimal, that is, 38 degrees Celsius). Some software (possibly this very driver) switched the EC to the second bank. Usually that happens only temporarily, and the current bank is then reset to 0 (the first bank). So could you please take another snapshot using ec_sys?
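
As a small illustration of reading those two bytes (a sketch, not code from the driver), a single EC temperature register can be interpreted as a signed 8-bit value. The signed interpretation is an assumption; it is consistent with 0xd8 being reported as -40.0°C for Water_In in the sensors output posted later in this thread. The function name ec_temperature is hypothetical.

```python
def ec_temperature(raw_byte):
    """Interpret one EC register as a signed 8-bit temperature in degrees Celsius.

    Assumption: the value is two's complement, so 0xd8 reads as -40.
    """
    return int.from_bytes(bytes([raw_byte]), byteorder="big", signed=True)


water_in = ec_temperature(0xD8)   # blank value: no sensor connected
water_out = ec_temperature(0x26)  # 0x26 is 38 in decimal
print(water_in, water_out)
```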

KeithMyers commented 1 year ago

I re-ran the command and got a different dump output.

# hexdump -C /sys/kernel/debug/ec/ec0/io
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000030  54 5f 2c 37 00 00 c2 00  00 00 00 00 00 00 00 00  |T_,7............|
00000040  00 4f 54 28 00 0f 0f ff  00 00 04 7d ff 00 00 00  |.OT(.......}....|
00000050  00 00 00 00 00 ff ff 00  00 00 05 00 00 00 00 00  |................|
00000060  00 00 00 00 00 00 00 00  00 00 d8 96 00 00 00 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000090  5c 67 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |\g..............|
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000000c0  00 01 01 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000d0  40 00 10 40 00 10 04 00  00 00 00 00 00 00 00 02  |@..@............|
000000e0  00 00 00 00 00 00 20 9c  4d 0e b1 f9 0d 0f fb 76  |...... .M......v|
000000f0  a5 0a 3a f3 a8 00 00 00  00 00 00 00 00 00 00 00  |..:.............|
00000100

zeule commented 1 year ago

Now this looks like the first bank (look, there is 2c at register 0x32, which should be the motherboard temperature, and 0x2c is 44 °C).

So, there might be an unconnected temperature sensor at 0x6a (showing the blank value now), a fan at 0x4a (at 1149 RPM), there might be something in the 0x41-0x43 range, and so on.
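
The lookups above can be reproduced mechanically. The following is a rough Python sketch that parses hexdump -C text (including the "*" lines that stand for repeated rows) into a byte string indexed by register address. The function name parse_hexdump is hypothetical, and SAMPLE is only an excerpt of the dump above, truncated at 0x50.

```python
def parse_hexdump(text):
    """Parse `hexdump -C` output into bytes, expanding '*' repeat markers."""
    out = bytearray()
    prev_row = b""
    pending_star = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "*":
            pending_star = True
            continue
        # Drop the ASCII column, then split offset and hex bytes.
        fields = line.split("|")[0].split()
        offset = int(fields[0], 16)
        if pending_star:
            # '*' means the previous row repeats up to this offset.
            while prev_row and len(out) < offset:
                out += prev_row
            pending_star = False
        row = bytes(int(x, 16) for x in fields[1:])
        out += row
        if row:
            prev_row = row
    return bytes(out)


SAMPLE = """\
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000030  54 5f 2c 37 00 00 c2 00  00 00 00 00 00 00 00 00  |T_,7............|
00000040  00 4f 54 28 00 0f 0f ff  00 00 04 7d ff 00 00 00  |.OT(.......}....|
00000050
"""

dump = parse_hexdump(SAMPLE)
print(dump[0x32])                              # candidate motherboard temperature
print(int.from_bytes(dump[0x4A:0x4C], "big"))  # candidate two-byte fan reading
```

With this excerpt the two printed values are 44 and 1149, matching the numbers read by hand above.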

zeule commented 1 year ago

LibreHardwareMonitor users discovered a CPU Optional Fan sensor (two bytes) for this generation at 0xb0, but the whole 0xb0 line in your dump is empty.

KeithMyers commented 1 year ago

Haha LOL. Well, it might all be obvious to you, but to me it is all gobbledygook. I don't know how to read the registers or what they represent. I gather I need to figure all this out myself in order to enter the new sensor names at their respective addresses in the sensors structure.

KeithMyers commented 1 year ago

I'm only using the CPU_FAN, AIO_PUMP and FAN4 headers on the board currently. Along with the WATER OUT sensor. FAN4 is connected to the pump speed output of the D5 pump. That is not showing in the driver but does show in UEFI. There is no fan running at 1149 rpms. All the fans are running at ~2000 rpms. They are on the radiator and one running at full speed in the rear. All the fan outputs are set to full speed in the UEFI. The D5 pump runs at ~4800 rpm.

zeule commented 1 year ago

Here are the rules:

  1. Everything is in the hexadecimal system (almost any calculator, offline or online, can convert between hexadecimal and decimal).
  2. Each EC register contains one byte.
  3. Temperature and current sensors span 1 register, fans and voltages span two.
  4. Temperature is in degree Celsius, fans are in RPMs, current in Amperes, voltage is in millivolts.
  5. For all sensors we've seen so far, the byte order for multi-byte sensors is "most significant byte first" (MSB), i.e. if you see "04 7d" in the dump, it is the number 0x47d (1149 decimal).

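
The rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the driver's code: the names SENSOR_SIZE, UNITS and decode are hypothetical, and treating temperatures as signed is an assumption based on the -40.0°C reading that appears for an unconnected sensor elsewhere in this thread.

```python
# Register widths and units per the rules above.
SENSOR_SIZE = {"temperature": 1, "current": 1, "fan": 2, "voltage": 2}
UNITS = {"temperature": "°C", "current": "A", "fan": "RPM", "voltage": "mV"}


def decode(registers, address, kind):
    """Decode one sensor value from a 256-byte register bank.

    Multi-byte values are read most significant byte first.
    Assumption: temperatures are signed 8-bit, other kinds are unsigned.
    """
    size = SENSOR_SIZE[kind]
    raw = bytes(registers[address:address + size])
    if len(raw) != size:
        raise ValueError("sensor extends past the end of the register bank")
    value = int.from_bytes(raw, "big", signed=(kind == "temperature"))
    return value, UNITS[kind]


registers = bytearray(256)
registers[0x4A:0x4C] = b"\x04\x7d"  # the "04 7d" pair from rule 5
registers[0x32] = 0x2C

print(decode(registers, 0x4A, "fan"))
print(decode(registers, 0x32, "temperature"))
```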
zeule commented 1 year ago

> I'm only using the CPU_FAN, AIO_PUMP and FAN4 headers on the board currently. Along with the WATER OUT sensor.

But all of them are PWM controlled, aren't they? If so, their readings should be in the Super I/O chip, not the EC controller.

KeithMyers commented 1 year ago

I can convert hex to decimal. But I have to know what the rows and columns in the output represent. This is the mysterious part.

No, none of the fans are PWM. 3 pin voltage control only.

KeithMyers commented 1 year ago

I don't use any PWM components. Only full speed for every component as all my hosts run flat out 24/7 crunching science. Temps have be limited to the environment.

zeule commented 1 year ago

> But I have to know what the rows and columns in the output represent.

The row is address divided by 16 with '0' appended, the column is the address mod 16, the gap is between 7 and 8. So, to get the address just combine together those two:

00000040  00 4f 54 28 00 0f 0f ff  00 00 04 7d ff 00 00 00 

"04" is located in the 0x040 row and column 10, which is 'a' in hex. Thus it is inside the register with address 0x4a.
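
The same row and column arithmetic as a short Python sketch. The helper names locate, address_of and byte_at are hypothetical; ROW is the line quoted above.

```python
def locate(address):
    """Return (row label, column) of a register in a `hexdump -C` listing."""
    row_label = f"{address // 16 * 16:08x}"  # address with the last hex digit zeroed
    column = address % 16
    return row_label, column


def address_of(row_label, column):
    """Inverse of locate(): combine a row label and a column into an address."""
    return int(row_label, 16) + column


def byte_at(row_text, column):
    """Pick the hex byte in the given column of one dump row."""
    fields = row_text.split()  # fields[0] is the row label
    return fields[1 + column]


ROW = "00000040  00 4f 54 28 00 0f 0f ff  00 00 04 7d ff 00 00 00"

print(locate(0x4A))
print(hex(address_of("00000040", 10)))
print(byte_at(ROW, 10))
```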

> I don't use any PWM components. Only full speed…

But in principle those RPMs are controllable, right? That's what I meant by "PWM controlled". If there is a way to control a fan, it is not exposed via EC.

KeithMyers commented 1 year ago

No, a 3 pin fan is not PWM controllable. Ground,+12V power and sense wire. The only way to control speed is to reduce the voltage supplied to the fan. +12V is full rated speed of the fan. +7V is 7/12 of full rated speed.

zeule commented 1 year ago

If the header is (in principle) controllable via the UEFI UI, it will not be exposed via EC, no matter which fan is connected.

KeithMyers commented 1 year ago

In the UEFI, to run full-speed you either have to disable fan control or in this Zen 4 UEFI, the setting is just 'Full-Speed'

KeithMyers commented 1 year ago

So you are saying that no fans are ever exposed by EC? So how does asus-ec-sensors expose all the fans on my teammate's Crosshair VIII Hero motherboard?

zeule commented 1 year ago

OK, so there is a fan control for this header. Hence I'm pretty sure fan RPM reading is not exposed via EC.

> So you are saying that no fans are ever exposed by EC? So how does asus-ec-sensors expose all the fans on my teammate's Crosshair VIII Hero motherboard?

No, only those which are controllable. For example (I have the CH VIII myself), the chipset fan is not controllable and is exposed via EC.

KeithMyers commented 1 year ago

I'm still very confused. What does the EC expose and what does it not expose? Only the controllable ones?

zeule commented 1 year ago

Uncontrollable. Mainly temperatures.

KeithMyers commented 1 year ago

I assume your driver bearing ec in its name means it is reading the EC registers. Does it also read the WMI registers to get the readings that are controllable?

zeule commented 1 year ago

Here is the list of all sensors discovered so far: https://github.com/zeule/asus-ec-sensors/blob/master/asus-ec-sensors.c#L99

KeithMyers commented 1 year ago

OK, that matches the sensors output of my friend's board he has published for me. I lamented that I wished the driver would provide the same for me with my Crosshair X670E Hero board that I have my 7950X in. So all the known sensors are in the asus-ec-sensors.c file as listed at line #99. So the issue is that we don't know the matching register addresses for this board for all the known sensors.

zeule commented 1 year ago

> I assume your driver bearing ec in its name means it is reading the EC registers.

That's correct.

> Does it also read the WMI registers to get the readings that are controllable?

There is no such thing as a "WMI register", at least not on ASUS boards. At some point ASUS exposed the whole hardware monitoring system via WMI, but that is not the case anymore. There are WMI calls to read Super I/O chip registers and EC registers, but they are extremely slow. The previous version of this driver used that method, and the speed was around 10 bytes per second (reading the full set on the already mentioned C8H took 3 seconds). I don't do that anymore.

> So the issue is that we don't know the matching register addresses for this board for all the known sensors.

That's not entirely correct: not all board models have the whole sensor set.

KeithMyers commented 1 year ago

So why hasn't anybody just copied lines 204-237 into the static const struct at line 240 and just see what pops up?

KeithMyers commented 1 year ago

I still use the asus-wmi-sensors driver for all my Crosshair VII Hero boards. Works great, never any issue.

zeule commented 1 year ago

> So why hasn't anybody just copied lines 204-237 into the static const struct at line 240 and just see what pops up?

How should I know?

> I still use the asus-wmi-sensors driver for all my Crosshair VII Hero boards. Works great, never any issue.

Yes, that's the generation with the whole hardware monitoring system exposed via WMI, and all the values read in a single WMI call.

KeithMyers commented 1 year ago

So that leaves it to me to be the guinea pig and try this I guess. The only ones I really would want are the fan speeds for CPU_FAN, AIO_PUMP and FAN4.

zeule commented 1 year ago

All three are user-controllable, likely not exposed via EC. Is there a driver for the Super I/O chip for this board?

KeithMyers commented 1 year ago

Yes, the stock nct6775 driver in kernels newer than 6.3 picks up the SIO.

nct6796-isa-0290
Adapter: ISA adapter
Vcore:                      1.24 V  (min =  +0.00 V, max =  +2.06 V)
in1:                      960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
AVCC:                       3.41 V  (min =  +2.98 V, max =  +3.63 V)
+3.3V:                      3.31 V  (min =  +2.98 V, max =  +3.63 V)
in4:                      1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                        1.12 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
3VSB:                       3.41 V  (min =  +2.98 V, max =  +3.63 V)
Vbat:                       3.33 V  (min =  +2.70 V, max =  +3.63 V)
in9:                        1.66 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                     528.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                     528.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                       1.20 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                       1.23 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                        0 RPM  (min =    0 RPM)
fan2:                     2017 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan6:                        0 RPM  (min =    0 RPM)
fan7:                     1939 RPM  (min =    0 RPM)
SYSTIN:                    +45.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +53.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +19.0°C    sensor = thermistor
AUXTIN1:                   +22.0°C    sensor = thermistor
AUXTIN2:                   +22.0°C    sensor = thermistor
AUXTIN3:                   +14.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +84.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
PCH_CPU_TEMP:               +0.0°C  
TSI0_TEMP:                 +95.5°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

Much of it is gobbledygook without a scaling conf file. I fudged something that sort of works for the Vcore. It is not even the correct formula, but it is close.

The two fans exposed are the correct rpms but not identified correctly. Other fan headers are missing entirely. Fan2 should be Fan6 logically based on location. Board identifier Fan2 is unpopulated. Fan4 and Fan5 missing.

The stock k10temp picks up the cpu temps from sysfs.

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +95.5°C  
Tccd1:        +95.2°C  
Tccd2:        +83.2°C 

This is what asus-ec-sensors shows:

asusec-isa-0000
Adapter: ISA adapter
CPU:          +84.0°C  
CPU Package:  +95.0°C  
Motherboard:  +45.0°C  
VRM:          +56.0°C  
Water_In:     -40.0°C  
Water_Out:    +39.0°C  

zeule commented 1 year ago

It is interesting that the CPU reading in the EC corresponds to the second CCD temperature, which is the lowest one. What is fan7 in the nct6775 output?

KeithMyers commented 1 year ago

Fan7 is CPU_Fan.

Fan2 is AIO_PUMP

Both are located at the top of the board next to each other. CPU_OPT is also located next to them and is unoccupied. Based on the board diagram in the manual, the nct6775 driver has them identified incorrectly.

Fan_1, Fan_2 and Fan_3 are next to each other on the bottom of the board. Fan_4 is on the front, middle of the board. AIO_PUMP, CPU_FAN and CPU_OPT at the top of the board.

Logically, counting counter-clockwise from the bottom left of the board, the headers should be identified:

Fan_1, Fan_2, Fan_3, Fan_4, Fan_5, Fan_6 and Fan_7 or in the board naming vernacular,

Fan_1, Fan_2, Fan_3, Fan_4, AIO_PUMP, CPU_FAN and CPU_OPT

FAN_2 should be identified as Fan5 and FAN_7 should be identified as Fan6 logically.

Fan4 has my D5 pump on it and is missing entirely.

UnAfraid commented 1 year ago

Hi,

I also have the same motherboard.
Here is my dsdt.dsl
BIOS version: 1602
Here is a screenshot from HWInfo

Let me know if there is anything else I can provide.

zeule commented 1 year ago

Hi, @UnAfraid,

and thank you for your willingness to help! Well, we are missing many readings that HWInfo is aware of. If you want to spend an hour or two locating the registers that hold them, I and other users, I guess, will appreciate that. There are a few tools you can use. If you are familiar with C#, you can grab a copy of the Libre Hardware Monitor sources, where I duplicated this driver, and start tinkering. Alternatively, any tool that prints out EC registers, together with a hex calculator, will help to locate sensors. LibreHardwareMonitor itself can dump EC registers (included in the hardware report).

My suggestion is to begin by locating registers whose values change over time, and then try them out in LHM, monitoring the readings and comparing them to ASUS software or HWInfo.
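
One way to spot registers whose values change over time is to diff two snapshots of the same bank. The Python sketch below assumes both snapshots are 256-byte sequences captured by any of the tools mentioned above. The function name changed_registers is hypothetical, and the sample values are illustrative only.

```python
def changed_registers(before, after):
    """Return (address, old, new) for every register that differs."""
    if len(before) != len(after):
        raise ValueError("snapshots must have the same length")
    return [
        (addr, old, new)
        for addr, (old, new) in enumerate(zip(before, after))
        if old != new
    ]


# Illustrative snapshots: one register drops from 0x2d to 0x2c.
before = bytearray(256)
after = bytearray(256)
before[0x33] = 0x2D
after[0x33] = 0x2C

for addr, old, new in changed_registers(before, after):
    print(f"0x{addr:02x}: 0x{old:02x} -> 0x{new:02x}")
```

Registers that change slowly and stay within a plausible range are candidates for temperatures, while adjacent pairs that change together are candidates for two-byte fan or voltage readings.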

UnAfraid commented 1 year ago

Hi @zeule,

Yes i am familiar with C#, i'll have a look.

Thanks for the info.

UnAfraid commented 1 year ago

So far I have managed to find the "VRM" temperature; it is at offset 0x33.

Edit: I notice you already have that one mapped. For me CPU, CPU Package and Motherboard values are always 0, and HWInfo doesn't have them.

Embedded Controller Registers

      00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

 00   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 10   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 20   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 30   00 00 00 2D 00 00 00 00 00 00 00 00 00 00 00 00
 40   00 4F 00 28 00 0F 0F FF 00 00 04 7D FF 00 00 00
 50   00 00 00 00 00 FF FF 00 00 00 05 00 00 00 00 00
 60   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 70   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 80   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 90   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 A0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 B0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 C0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 D0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 E0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 F0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04

Embedded Controller Registers

      00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

 00   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 10   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 20   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 30   00 00 00 2C 00 00 00 00 00 00 00 00 00 00 00 00
 40   00 4F 00 28 00 0F 0F FF 00 00 04 7D FF 00 00 00
 50   00 00 00 00 00 FF FF 00 00 00 05 00 00 00 00 00
 60   00 00 00 00 00 00 00 00 00 00 D8 96 00 00 00 00
 70   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 80   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 90   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 A0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 B0   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 C0   00 01 01 00 00 00 00 00 00 00 00 00 00 00 00 00
 D0   40 00 10 00 04 10 04 00 00 00 00 00 00 00 00 02
 E0   00 00 00 00 00 00 11 87 A4 07 BB 2B 66 02 22 AE
 F0   11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00