linux-surface / surface-aggregator-module

Linux ACPI and Platform Drivers for Surface Devices using the Surface Aggregator Module over Surface Serial Hub (Surface Book 2, Surface Pro 2017, Surface Laptop, and Newer)
GNU General Public License v2.0
93 stars 11 forks source link

Support for new event registration subsystem. #40

Closed qzed closed 3 years ago

qzed commented 4 years ago

Newer devices (SP7, SL3, SB3, ..?) seem to use a different mechanism to enable events.

Copied from https://github.com/linux-surface/linux-surface/issues/201#issuecomment-640298241:

So we'll need to add support for the REG system and enable/disable the events via that. The kbd- commands not doing anything indicate that the old way to enable/disable events isn't being used here.

I've also had a look at the Windows driver, and it seems like the old and new event enable/disable commands have, as far as I can tell (unfortunately I'm not a 100% sure here as the CID/TC values seem to be supplied by the caller and not the registry function), mostly the same code path. So that would indicate that the old and new ways don't play together, but the new is a replacement for the old.

Another thing that I've noticed is the battery. I accidentally added them via IID (instance-ID) 0 and 1, but on the SB2 those were 1 and 2 (this should pretty much have the effect that there are two battery instances created, but they will display the exact same data, i.e. the data of battery 1, which should be the one integrated into the clipboard). I've fixed that now, however, after that I had another look at the battery messages in the logs, and it looks like this is not going to work: It seems like they separated the batteries not by IID but by priority (which I think we should rename at this point, because that use kind of indicates that it's not a priority but maybe rather something like a transfer channel). That is kind of confirmed by one of the REG events: 01 00 02 02 01 (the 02 01 at the end is likely TC=2, i.e. battery/power subsystem and IID=1). So likely, the current branch state will break the battery driver and the old state will show only battery 1, but two times... I'll try to fix that tomorrow.

Just to make sure that they haven't done some controller magic to virtually "merge" the two batteries into one for display purposes: The (extended) battery status on Windows does show two batteries?

So with regards to this, I want to try and find the answers for a couple of questions:

  • Is the REG system set-up per channel? In the detach/attach IRPMon log, the reg system is only accessed via channel 2, event messages for input devices are sent via channel 2, but event messages for the battery are sent via channel 1 and 2. So to properly enable events, do we have to set up the battery events for channel 1 by accessing the REG system via channel 1?

  • What are the event messages returned when enabling events for the REG system? Could be currently enabled events or could be events that can be enabled. Part of the payload could indicate if an event has been enabled or not. Or maybe even how many times if it does internal counting. The last two bytes seem to be TC and IID. The byte before those could be the channel. Some example payloads of the event messages:

      01 00 02 03 01
    
      01 00 02 02 01
    
      01 00 02 15 01
      01 00 02 15 03
      01 00 02 15 05
      01 00 02 15 06
    
      01 00 02 21 00
  • Related to previous: Can events be enabled multiple times (i.e. does it do resource-counting internally or is it just on/off)?

  • Related to previous: Is there a command to get the currently enabled events (or trigger the event messages sent directly after enabling events for REG)?

  • When was this introduced? Looks like the SL3 and Pro7 use this too. We might also check SL1/SL2 and Pro6, but I don't think they use that yet.

  • What about the thermal subsystem (TC=3)? Looks like there is only an event with IID=1 enabled. On the SB2, temperature sensors were separated by IID. Does this still apply or are they separated by channel now? Also those probably require a new driver since they don't seem to accessed by ACPI any more...

qzed commented 4 years ago

There also seem to be other subsystems that can act as event registries, in total:

The choice of subsystem seems to depend on the device in question (e.g. there are multiple keyboard devices that seem to use different systems, see https://github.com/linux-surface/surface-aggregator-module/issues/42#issuecomment-642880467). It could be that specific systems have specific purposes. REG seems to be an extension of the SAM event registration, which now also allows enabling events by instance ID.

The payload across those subsystems is the same, at least according to the Windows driver:

struct event_registration_payload {
    uint8_t  target_category;
    uint8_t  seq;
    uint16_t request_id;
    uint8_t  instance_id;
};

All registration commands seem to have a response of length 1, indicating the status of the operation (KIP has not been observed yet, but also has response of at least size one according to the Windows driver).

Note: This is a difference to the older (SB2) driver, where TC=0x01 has a payload of length 4, which represents the payload above without instance_id. Both payload types seem to be accepted on all devices for TC=0x01, including the longer payload on the SB2.

qzed commented 4 years ago

Some insights from the Windows SurfaceSerialHub driver:

qzed commented 4 years ago

@archseer, @steveniemitz, @NickyTope I've got some more stuff to test for you guys. I've broken this down into a couple of blocks. Would be great if you've compiled the module with debug output enabled and log the interaction (best if each block has it's own logfile), so that we can see any side-effects. The script is on the feature/sb3-v2 branch.

For anyone new here, you can build and test the module (with debug output) as follows: Clone the feature/sb3-v2 branch and comment-in this flag. After that, follow https://github.com/linux-surface/surface-aggregator-module/tree/feature/sb3-v2#buildtest-the-modules. If you've changed the kernel since the last build/test, you might need to run make clean all to re-build the module for the current kernel.

Now for the stuff that I think should be tested:

Some explanations if anyone is interested:

First, the basic usage of the script:

Usage:
  ./evreg.py <command> [args...]

Commands:
  help
    display this help message

  simple <tc> <cid> <iid> <channel> <snc>
    simple command without payload

  enable <channel> <ev_tc> <ev_seq> <ev_iid>
    enable event using REG subsystem

  disable <channel> <ev_tc> <ev_seq> <ev_iid>
    disable event using REG subsystem

  legacy-enable <channel> <ev_tc> <ev_seq> <ev_iid>
    enable event via legacy method

  legacy-disable <channel> <ev_tc> <ev_seq> <ev_iid>
    disable event via legacy method

Arguments:
  <tc>:       command target category
  <cid>:      command ID
  <iid>:      command instance ID
  <channel>:  communication channel
  <snc>:      command-expects-response flag
  <ev_tc>:    event target category
  <ev_seq>:   event-is-sequenced flag
  <ev_iid>:   event instance ID

The enable and disable commands enable/disable events using the REG subsystem. The legacy-<...> comands enable/disable events using the SAM subsystem, used on previous devices for that purpose.

The simple command sends a simple command to SAM without any payload.

I should also write a (very) quick overview over SAM: SAM is the Surface Aggregator Module, basically the EC of the Surface devices. It can receive commands and send events. Commands and events are identified by a target category (TC, groups broader functionality, e.g. TC=0x15 is the HID subsystem used for the keyboard/touchpad on the SB3/SL3) and a command ID. Additionally there is a communication channel (channel/chn) and an instance ID (IID). The channel seems to separate communication similar to the IID, but it's exact purpose is still a bit unclear to me. The IID can identify multiple devices (e.g. thermal sensors), so that the same command can be executed on different instances (e.g. get temperature for sensor IID=x, where x selects the sensor).

For event registration, there is also (what we assume to be) the SEQ flag, which indicates if events are sequenced or unsequenced, which determines some communication details (basically, think of sequenced packets as something similar to TCP, i.e. they need to be acknowledged, where as unsequenced packets are like UDP).

steveniemitz commented 4 years ago

phew, what a write up! thanks for all the details on this, its really helpful for understanding!

Is the REG system set-up per channel?

The command for channel 1 failed for me (IO Error), but channel 2 worked. Attached here as channel2.log. I added some annotations into the log after disabling events, unplugging AC ("after event") and plugging it back in ("after event2").

channel2.log

More to come!

steveniemitz commented 4 years ago

Is there a command to get the currently enabled events (or trigger the event messages sent directly after enabling events for REG)?

The command for CID 3 returns something when I run it. Do you want the kernel debug logs there too?

# ./scripts/evreg.py simple 0x21 0x03 0x00 0x02 0x01
a4 06 59 00 00

A snippet from the kernel logs:

Jun 11 21:45:52 steve kernel: [ 1392.622678] tx: 00000000: aa 55 40 00 00 8f 3b 8a ff ff
Jun 11 21:45:52 steve kernel: [ 1392.622685] surface_sam_ssh serial0-0: event: unhandled event (rqid: 0x0021, tc: 0x21, cid: 0x03)
Jun 11 21:45:52 steve kernel: [ 1392.624809] surface_sam_ssh serial0-0: rx: received data (size: 23)
Jun 11 21:45:52 steve kernel: [ 1392.624810] rx: 00000000: aa 55 80 0d 00 90 10 98 80 21 00 02 00 21 00 03
Jun 11 21:45:52 steve kernel: [ 1392.624810] rx: 00000010: 01 00 02 21 00 4e 2d
Jun 11 21:45:52 steve kernel: [ 1392.624811] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x80, len: 13)
Jun 11 21:45:52 steve kernel: [ 1392.624812] surface_sam_ssh serial0-0: rx: parser: valid command found (tc: 0x21, cid: 0x03)
Jun 11 21:45:52 steve kernel: [ 1392.624814] surface_sam_ssh serial0-0: tx: sending data (length: 10)
Jun 11 21:45:52 steve kernel: [ 1392.624815] tx: 00000000: aa 55 40 00 00 90 e5 69 ff ff
qzed commented 4 years ago

Thanks!

Do you want the kernel debug logs there too?

From what I can see in the snippet, the command works as expected, although I have no clue what its direct returned data is. The payload is a bit unexpected (01 00 02 21 00) as I'd expect the last byte to be 01, 03 or 05. A full log would be great.

steveniemitz commented 4 years ago

Sure thing. I'm on an external mouse/keyboard now, which should cut down on the log noise a lot too. In fact, lemme run the first test again now too.

Jun 12 10:26:14 steve-work-sb kernel: [ 1959.076428] surface_sam_ssh serial0-0: tx: sending data (length: 18)
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.076431] tx: 00000000: aa 55 80 08 00 3c 86 07 80 21 02 00 00 5e 00 03
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.076431] tx: 00000010: c4 e4
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.077723] surface_sam_ssh serial0-0: rx: received data (size: 10)
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.077725] rx: 00000000: aa 55 40 00 00 3c 83 1d ff ff
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.077726] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x40, len: 0)
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.077727] surface_sam_ssh serial0-0: rx: valid control message received (type: 0x40)
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.079123] surface_sam_ssh serial0-0: rx: received data (size: 23)
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.079126] rx: 00000000: aa 55 80 0d 00 4d c0 82 80 21 00 02 00 5e 00 03
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.079126] rx: 00000010: c4 e4 49 00 00 ff c2
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.079127] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x80, len: 13)
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.079128] surface_sam_ssh serial0-0: rx: parser: valid command found (tc: 0x21, cid: 0x03)
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.079129] surface_sam_ssh serial0-0: rx: valid command message received
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.079185] surface_sam_ssh serial0-0: tx: sending data (length: 10)
Jun 12 10:26:14 steve-work-sb kernel: [ 1959.079186] tx: 00000000: aa 55 40 00 00 4d 35 73 ff ff
qzed commented 4 years ago

Awesome thanks!

steveniemitz commented 4 years ago

Here's the first test again, much less noisy now on external mouse/keyboard.

channel2_again.log

steveniemitz commented 4 years ago

Repeating that command should show those again.

Logs: repeat-enable.log

steveniemitz commented 4 years ago
# trigger REG events, test that keyboard and touchpad are is enabled
./scripts/evreg.py disable 0x02 0x15 0x00 0x03
# trigger REG events, test that touchpad is disabled and keyboard enabled
CONFIRMED!
./scripts/evreg.py disable 0x02 0x15 0x00 0x01
# trigger REG events, test that both touchpad and keyboard are disabled
CONFIRMED!
./scripts/evreg.py enable 0x02 0x15 0x00 0x01
# trigger REG events, test that touchpad is disabled and keyboard is enabled
CONFIRMED!
./scripts/evreg.py disable 0x02 0x15 0x00 0x03
# trigger REG events, test that both touchpad and keyboard are enabled
CONFIRMED
steveniemitz commented 4 years ago

Based on the previous two points: Can events be enabled multiple times? I.e. is there some sort of reference counting?

It seems like there's no ref counting here, eg, if I enable the touchpad 3 times, and disable it once, it becomes disabled. vice versa as well.

qzed commented 4 years ago

The command for channel 1 failed for me (IO Error), but channel 2 worked. Attached here as channel2.log. I added some annotations into the log after disabling events, unplugging AC ("after event") and plugging it back in ("after event2").

Okay, so it doesn't look like it's per-channel then. Two interesting points here:

Regardless, we should also try to figure out which calls enable/disable battery events (at least for channel 1 events, which we've seen so far), since

./scripts/evreg.py disable 0x02 0x02 0x01 0x01

doesn't seem to do that.

The following commands should ensure that there should absolutely be no event sent (that's basically all disable commands for the battery that I think could make some sense, also no need to test after each).

./scripts/evreg.py legacy-disable 0x01 0x02 0x00 0x00
./scripts/evreg.py legacy-disable 0x01 0x02 0x01 0x00
./scripts/evreg.py legacy-disable 0x01 0x02 0x02 0x00

./scripts/evreg.py disable 0x02 0x02 0x00 0x02
./scripts/evreg.py disable 0x02 0x02 0x01 0x02
./scripts/evreg.py disable 0x02 0x02 0x02 0x02

./scripts/evreg.py disable 0x02 0x02 0x00 0x01
./scripts/evreg.py disable 0x02 0x02 0x01 0x01
./scripts/evreg.py disable 0x02 0x02 0x02 0x01

./scripts/evreg.py disable 0x02 0x02 0x00 0x00
./scripts/evreg.py disable 0x02 0x02 0x01 0x00
./scripts/evreg.py disable 0x02 0x02 0x02 0x00

After that, make sure that there are no power events any more (should be grep-able for power event, CID=0x17 is charger status change). If that worked, we can try finding the right one to enable the battery events. This should follow the schema above. I think it should be one of

./scripts/evreg.py enable 0x02 0x02 <seq> <iid>

Again, I think <seq> here could also mean the channel the events are sent through. The IID seems to be 1 for everything, so start with <seq> = 2 and <iid> = 1, then <seq> = 1, i.e.

# trigger events (no events should be here)
./scripts/evreg.py enable 0x02 0x02 0x02 0x01
# trigger events (hopefully events for channel 2 should be here)
./scripts/evreg.py enable 0x02 0x02 0x01 0x01
# trigger events (hopefully events for channel 1 and 2 should be here)

If that doesn't work, we should the same with IID 0. (i.e. change the last parameter to 0x00).

If that still doesn't work, try

# trigger events (no events should be here)
./scripts/evreg.py legacy-enable 0x02 0x02 0x02 0x00
# trigger events (hopefully events for channel 2 should be here)
./scripts/evreg.py legacy-enable 0x02 0x02 0x01 0x01
# trigger events (hopefully events for channel 1 and 2 should be here)
qzed commented 4 years ago

Based on the previous two points: Can events be enabled multiple times? I.e. is there some sort of reference counting?

It seems like there's no ref counting here, eg, if I enable the touchpad 3 times, and disable it once, it becomes disabled. vice versa as well.

Neat! Although that was kind of expected as the Windows driver does reference counting internally.

# trigger REG events, test that keyboard and touchpad are is enabled
./scripts/evreg.py disable 0x02 0x15 0x00 0x03
# trigger REG events, test that touchpad is disabled and keyboard enabled
CONFIRMED!
./scripts/evreg.py disable 0x02 0x15 0x00 0x01
# trigger REG events, test that both touchpad and keyboard are disabled
CONFIRMED!
./scripts/evreg.py enable 0x02 0x15 0x00 0x01
# trigger REG events, test that touchpad is disabled and keyboard is enabled
CONFIRMED!
./scripts/evreg.py disable 0x02 0x15 0x00 0x03
# trigger REG events, test that both touchpad and keyboard are enabled
CONFIRMED

Do you have a log for that? I'd love to see the REG event outputs, so that we can figure out what they mean (i.e. if they indicate what events have been enabled).

qzed commented 4 years ago

I have to do some other stuff today, so I'll have to go now, but I'll have a look at the logs in more detail tomorrow.

steveniemitz commented 4 years ago

After that, make sure that there are no power events any more

Confirmed

./scripts/evreg.py legacy-enable 0x01 0x02 0x00 0x00

Brought back power events (plug/unplug at least). That was the first I tried that did anything (the non-legacy enables didn't).

Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789893] surface_sam_ssh serial0-0: rx: received data (size: 57)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789898] rx: 00000000: aa 55 00 08 00 68 cf c0 80 02 00 01 01 02 00 17
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789899] rx: 00000010: 0f 7c aa 55 00 1d 00 69 7d 78 80 02 00 01 01 02
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789901] rx: 00000020: 00 53 01 84 02 01 00 92 9a 00 00 ac 00 10 20 92
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789902] rx: 00000030: e5 00 00 e4 25 00 00 62 b5
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789905] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x00, len: 8)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789907] surface_sam_ssh serial0-0: rx: parser: valid command found (tc: 0x02, cid: 0x17)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789916] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x00, len: 29)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789918] surface_sam_ssh serial0-0: rx: parser: valid command found (tc: 0x02, cid: 0x53)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789928] surface_sam_sid_ac surface_sam_sid_ac: power event (cid = 0x17)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789931] surface_sam_sid_battery surface_sam_sid_battery.1: power event (cid = 0x17)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789934] surface_sam_sid_battery surface_sam_sid_battery.2: power event (cid = 0x17)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789937] surface_sam_sid_ac surface_sam_sid_ac: power event (cid = 0x53)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789939] surface_sam_sid_battery surface_sam_sid_battery.1: power event (cid = 0x53)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789941] surface_sam_sid_battery surface_sam_sid_battery.1: unhandled power event (cid = 0x53)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789942] surface_sam_sid_battery surface_sam_sid_battery.2: power event (cid = 0x53)
steveniemitz commented 4 years ago

Do you have a log for that? I'd love to see the REG event outputs, so that we can figure out what they mean (i.e. if they indicate what events have been enabled).

disable 0x03:

Jun 12 10:55:11 steve-work-sb kernel: [ 3696.320211] surface_sam_ssh serial0-0: tx: sending data (length: 23)
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.320213] tx: 00000000: aa 55 80 0d 00 c1 c4 d2 80 21 02 00 00 e3 00 02
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.320213] tx: 00000010: 15 00 15 00 03 3b dc
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.321594] surface_sam_ssh serial0-0: rx: received data (size: 10)
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.321596] rx: 00000000: aa 55 40 00 00 c1 31 23 ff ff
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.321597] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x40, len: 0)
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.321598] surface_sam_ssh serial0-0: rx: valid control message received (type: 0x40)
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.323043] surface_sam_ssh serial0-0: rx: received data (size: 19)
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.323045] rx: 00000000: aa 55 80 09 00 74 7a f9 80 21 00 02 00 e3 00 02
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.323046] rx: 00000010: 00 6f 69
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.323048] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x80, len: 9)
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.323049] surface_sam_ssh serial0-0: rx: parser: valid command found (tc: 0x21, cid: 0x02)
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.323050] surface_sam_ssh serial0-0: rx: valid command message received
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.323105] surface_sam_ssh serial0-0: tx: sending data (length: 10)
Jun 12 10:55:11 steve-work-sb kernel: [ 3696.323107] tx: 00000000: aa 55 40 00 00 74 4f d4 ff ff

disable 0x01:

Jun 12 10:55:56 steve-work-sb kernel: [ 3741.478768] surface_sam_ssh serial0-0: tx: sending data (length: 23)
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.478770] tx: 00000000: aa 55 80 0d 00 c6 23 a2 80 21 02 00 00 e8 00 02
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.478771] tx: 00000010: 15 00 15 00 01 d6 2a
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.480099] surface_sam_ssh serial0-0: rx: received data (size: 10)
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.480101] rx: 00000000: aa 55 40 00 00 c6 d6 53 ff ff
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.480103] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x40, len: 0)
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.480104] surface_sam_ssh serial0-0: rx: valid control message received (type: 0x40)
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.481667] surface_sam_ssh serial0-0: rx: received data (size: 19)
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.481669] rx: 00000000: aa 55 80 09 00 7a b4 18 80 21 00 02 00 e8 00 02
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.481670] rx: 00000010: 00 70 77
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.481671] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x80, len: 9)
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.481672] surface_sam_ssh serial0-0: rx: parser: valid command found (tc: 0x21, cid: 0x02)
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.481672] surface_sam_ssh serial0-0: rx: valid command message received
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.481704] surface_sam_ssh serial0-0: tx: sending data (length: 10)
Jun 12 10:55:56 steve-work-sb kernel: [ 3741.481705] tx: 00000000: aa 55 40 00 00 7a 81 35 ff ff

enable 0x03:

Jun 12 10:56:37 steve-work-sb kernel: [ 3782.588651] surface_sam_ssh serial0-0: tx: sending data (length: 23)
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.588653] tx: 00000000: aa 55 80 0d 00 cb 8e 73 80 21 02 00 00 ed 00 01
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.588654] tx: 00000010: 15 00 15 00 03 ca 8c
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.590040] surface_sam_ssh serial0-0: rx: received data (size: 10)
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.590042] rx: 00000000: aa 55 40 00 00 cb 7b 82 ff ff
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.590043] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x40, len: 0)
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.590044] surface_sam_ssh serial0-0: rx: valid control message received (type: 0x40)
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.591441] surface_sam_ssh serial0-0: rx: received data (size: 19)
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.591443] rx: 00000000: aa 55 80 09 00 80 e1 56 80 21 00 02 00 ed 00 01
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.591444] rx: 00000010: 00 66 9e
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.591446] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x80, len: 9)
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.591448] surface_sam_ssh serial0-0: rx: parser: valid command found (tc: 0x21, cid: 0x01)
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.591448] surface_sam_ssh serial0-0: rx: valid command message received
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.591460] surface_sam_ssh serial0-0: tx: sending data (length: 10)
Jun 12 10:56:37 steve-work-sb kernel: [ 3782.591461] tx: 00000000: aa 55 40 00 00 80 d4 7b ff ff

enable 0x01:

Jun 12 10:56:57 steve-work-sb kernel: [ 3802.051304] surface_sam_ssh serial0-0: tx: sending data (length: 23)
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.051306] tx: 00000000: aa 55 80 0d 00 cc 69 03 80 21 02 00 00 ee 00 01
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.051307] tx: 00000010: 15 00 15 00 01 fd 64
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.052607] surface_sam_ssh serial0-0: rx: received data (size: 10)
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.052608] rx: 00000000: aa 55 40 00 00 cc 9c f2 ff ff
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.052610] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x40, len: 0)
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.052610] surface_sam_ssh serial0-0: rx: valid control message received (type: 0x40)
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.054082] surface_sam_ssh serial0-0: rx: received data (size: 19)
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.054083] rx: 00000000: aa 55 80 09 00 81 c0 46 80 21 00 02 00 ee 00 01
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.054083] rx: 00000010: 00 ba 05
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.054084] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x80, len: 9)
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.054085] surface_sam_ssh serial0-0: rx: parser: valid command found (tc: 0x21, cid: 0x01)
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.054085] surface_sam_ssh serial0-0: rx: valid command message received
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.054091] surface_sam_ssh serial0-0: tx: sending data (length: 10)
Jun 12 10:56:57 steve-work-sb kernel: [ 3802.054093] tx: 00000000: aa 55 40 00 00 81 f5 6b ff ff
qzed commented 4 years ago

Here's the first test again, much less noisy now on external mouse/keyboard.

channel2_again.log

Okay, bit of an interesting thing to note here: The _BST (dynamic battery status) calls here indicate that the battery on channel 2 is discharging, whereas the battery on channel 1 is neither charging nor discharging after the first unplug event. The event indicating adapter change is only sent on channel 1. After plugging it back in both are neither charging nor discharging. Also note that the enable/disable calls don't work as already noted above.

I'd normally assume that this means that the events represent the AC being disconnected (and not the charging state changing), however, the log on the windows side (https://github.com/linux-surface/linux-surface/issues/201#issuecomment-635880484) shows that they can also occur on channel 2. Not sure what I should make of this. Weirdly enough, the channel 2 event happens 16 seconds after the channel 1 event, so they also don't seem to be connected in time.

Currently, the implementation notifies both kernel battery devices and the adapter of change, so we should probably stick with that. This implementation comes from the SB2, where the event only seems to happen for IID=1 and indicate charger connect/disconnect (n.b.: on the SB2, batteries are separated by IID instead of channel, also only ever channel 1 is used and there are no unsequenced messages).

The CID=0x53 event is, in the dmesg log, also only present for channel 1, but on the Windwos log it's present for both channels. I would have assumed that the event is being sent for the battery of which the status changes, which in this case is the one on channel 2.

I'll have a quick look at the Windows driver and see if I can spot anything there.

qzed commented 4 years ago

From what I can tell, the events are handled completely separate for each channel, which would indicate that they should be sent on both channels. Although Windows does seem to handle batteries a bit different than Linux (e.g. they don't seem to have a separate AC and battery device, but rather pack that together into one battery device).

It's interesting to note that they do filter by IID comparison, i.e. they compare the event IID to the IID of the battery device, for all events except CID=0x17 (adapter status change) and CID=0x53. This seems to reflect my testing on the SB2, where the adapter status change event is only sent for IID=1.

steveniemitz commented 4 years ago

whereas the battery on channel 1 is neither charging nor discharging after the first unplug event.

Could the channel discrepancies come from the fact that enable/disable commands for channel 1 don't work? (surface_sam_ssh serial0-0: rqst: communication timed out) or is that unrelated?

steveniemitz commented 4 years ago

also, out of curiosity, what are you using to disassemble the windows drivers? I had limited success w/ IDA.

qzed commented 4 years ago

whereas the battery on channel 1 is neither charging nor discharging after the first unplug event.

Could the channel discrepancies come from the fact that enable/disable commands for channel 1 don't work? (surface_sam_ssh serial0-0: rqst: communication timed out) or is that unrelated?

As far as I can see, the battery still uses the legacy system (https://github.com/linux-surface/surface-aggregator-module/issues/40#issuecomment-643315320). Could have to do something with the enable/disable calls. Maybe we need to set the IID parameter differently, maybe the events are just sent later or delayed for some reason. What you could do is unplug the AC and log for a while in the background (at least until there's a couple percent drop on both batteries). If the events are set up correctly, we should see events on both channels during that period.

also, out of curiosity, what are you using to disassemble the windows drivers? I had limited success w/ IDA.

I'm using ghidra, although my success and skills there are also a bit limited. I mostly start by searching strings, luckily that works fairly well as MS has built in quite a bit of debug logging. From that you can pretty much extract the function names and hop along the calls. In drivers other than the SurfaceSerialHub driver (i.e. SurfaceBattery for battery, SurfaceHidMini for HID/keyboard/touchpad), look for SamCommunicationViaSsh_SendSynchronously and SamCommunicationViaSsh_Send, those are basically the calls to the EC. Their signature is something like

int SamCommunicationViaSsh_SendSynchronously
              (sam_device *dev,byte *request,size_t request_len,byte *response,size_t response_cap,
              undefined4 param_6,uint ioctl_nr,undefined4 param_8,size_t *response_len)

the request bytes are

struct request {
    uint8_t target_category;
    uint8_t channel;
    uint8_t instance_id;
    uint8_t sync_flag;   // not sure
    uint8_t command_id;
    uint16_t payload_length;
    uint8_t payload[0];  // meaning payload is attached directly here, after the struct.
};

although they're sometimes accessed weirdly (i.e. via uint32_t assignment to some parts).

The problem where I always get stuck are virtual/indirect function calls. Unfortunately the whole event enable/disable thing is more or less behind that. They seem to have a base class (SamCommunicationViaSsh), which has a function to register for events/notifications (SamCommunicationViaSsh::RegisterForTargetNotification). This function calls a virtual function (lets' call it SamCommunicationViaSsh::GetEventRegistrationData) and I haven't been able to get behind that yet. This structure is present in all drivers.

steveniemitz commented 4 years ago

ok awesome, that's really helpful, thnaks!

I'll run the experiment you suggested above later today as well

qzed commented 4 years ago

After that, make sure that there are no power events any more

Confirmed

./scripts/evreg.py legacy-enable 0x01 0x02 0x00 0x00

Brought back power events (plug/unplug at least). That was the first I tried that did anything (the non-legacy enables didn't).

Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789893] surface_sam_ssh serial0-0: rx: received data (size: 57)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789898] rx: 00000000: aa 55 00 08 00 68 cf c0 80 02 00 01 01 02 00 17
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789899] rx: 00000010: 0f 7c aa 55 00 1d 00 69 7d 78 80 02 00 01 01 02
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789901] rx: 00000020: 00 53 01 84 02 01 00 92 9a 00 00 ac 00 10 20 92
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789902] rx: 00000030: e5 00 00 e4 25 00 00 62 b5
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789905] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x00, len: 8)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789907] surface_sam_ssh serial0-0: rx: parser: valid command found (tc: 0x02, cid: 0x17)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789916] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x00, len: 29)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789918] surface_sam_ssh serial0-0: rx: parser: valid command found (tc: 0x02, cid: 0x53)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789928] surface_sam_sid_ac surface_sam_sid_ac: power event (cid = 0x17)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789931] surface_sam_sid_battery surface_sam_sid_battery.1: power event (cid = 0x17)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789934] surface_sam_sid_battery surface_sam_sid_battery.2: power event (cid = 0x17)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789937] surface_sam_sid_ac surface_sam_sid_ac: power event (cid = 0x53)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789939] surface_sam_sid_battery surface_sam_sid_battery.1: power event (cid = 0x53)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789941] surface_sam_sid_battery surface_sam_sid_battery.1: unhandled power event (cid = 0x53)
Jun 12 10:53:36 steve-work-sb kernel: [ 3601.789942] surface_sam_sid_battery surface_sam_sid_battery.2: power event (cid = 0x53)

This is pretty neat. So first of all, the battery driver still seems to use the legacy system, and second, we have confirmation that the second byte in the event payload is indeed the SEQ flag. After this call, the events start with aa 55 00 instead of the aa 55 80 we've seen previously.

Would be awesome if you could make absolutely sure that we haven't triggered this via something other in that process. To do that, you can try disabling the events again. Either of

./scripts/evreg.py legacy-disable 0x01 0x02 0x00 0x00
./scripts/evreg.py legacy-disable 0x01 0x02 0x01 0x00

should work for that, I think (again you need to trigger them by un/plugging the power adapter, as you should do after each command below too). After that enable in unsequenced mode with

./scripts/evreg.py legacy-enable 0x01 0x02 0x00 0x00

and you should see events starting with aa 55 00. Disable them again via

./scripts/evreg.py legacy-disable 0x01 0x02 0x00 0x00

and enable them in sequenced mode via

./scripts/evreg.py legacy-enable 0x01 0x02 0x01 0x00

Now you should see events starting with aa 55 80.

qzed commented 4 years ago

Repeating that command should show those again.

Logs: repeat-enable.log

Okay, repeating the commands sends the events again, with the same payload. So there also doesn't seem to be any internal counter that keeps track of how many times the event has been enabled or something like that (which has been kind of implied by the test that there is no reference counting).

Here's the cleaned up data (payload separated by spaces, received event data only):

rx: aa 55 80 0d 00 b8 7a 3d 80 21 00 02 00 21 00 03  01 00 02 02 01  da 6e
rx: aa 55 80 0d 00 b6 b4 dc 80 21 00 02 00 21 00 03  01 00 02 03 01  eb 5d
rx: aa 55 80 0d 00 bc fe 7d 80 21 00 02 00 21 00 03  01 00 02 15 01  3e f4
rx: aa 55 80 0d 00 ba 38 1d 80 21 00 02 00 21 00 03  01 00 02 15 03  7c d4
rx: aa 55 80 0d 00 b7 95 cc 80 21 00 02 00 21 00 03  01 00 02 15 05  ba b4
rx: aa 55 80 0d 00 b9 5b 2d 80 21 00 02 00 21 00 03  01 00 02 15 06  d9 84
rx: aa 55 80 0d 00 bb 19 0d 80 21 00 02 00 21 00 03  01 00 02 21 00  4e 2d

the data is exactly the same for each call. As far as I can tell, the last two bytes are TC and IID.

qzed commented 4 years ago
# trigger REG events, test that keyboard and touchpad are is enabled
./scripts/evreg.py disable 0x02 0x15 0x00 0x03
# trigger REG events, test that touchpad is disabled and keyboard enabled
CONFIRMED!
./scripts/evreg.py disable 0x02 0x15 0x00 0x01
# trigger REG events, test that both touchpad and keyboard are disabled
CONFIRMED!
./scripts/evreg.py enable 0x02 0x15 0x00 0x01
# trigger REG events, test that touchpad is disabled and keyboard is enabled
CONFIRMED!
./scripts/evreg.py disable 0x02 0x15 0x00 0x03
# trigger REG events, test that both touchpad and keyboard are enabled
CONFIRMED

Okay, in addition to that, can you run ./scripts/evreg.py simple 0x21 0x03 0x00 0x02 0x01 before/after each call and upload the full logs? All in all:

./scripts/evreg.py enable 0x02 0x21 0x01 0x00       # enable events for TC=0x21 (REG)
./scripts/evreg.py simple 0x21 0x03 0x00 0x02 0x01  # trigger REG events
./scripts/evreg.py disable 0x02 0x15 0x00 0x03      # disable touchpad
./scripts/evreg.py simple 0x21 0x03 0x00 0x02 0x01  # trigger REG events
./scripts/evreg.py disable 0x02 0x15 0x00 0x01      # disable keyboard
./scripts/evreg.py simple 0x21 0x03 0x00 0x02 0x01  # trigger REG events
./scripts/evreg.py enable 0x02 0x15 0x00 0x01       # enable keyboard
./scripts/evreg.py simple 0x21 0x03 0x00 0x02 0x01  # trigger REG events
./scripts/evreg.py enable 0x02 0x15 0x00 0x03       # enable touchpad
./scripts/evreg.py simple 0x21 0x03 0x00 0x02 0x01  # trigger REG events

Hopefully that changes some stuff in the REG events being sent, so we can figure out what those mean/contain.

qzed commented 4 years ago

Okay, so a quick overview over what we've got so far:

Open questions:

qzed commented 4 years ago

To check on the last point above, can you try running

./scripts/evreg.py disable 0x02 0x15 0x00 0x00

if that's a disable-all call, it should disable both touchpad and keyboard.

Similarly, first disable all keyboard/touchpad events normally via

./scripts/evreg.py disable 0x02 0x15 0x00 0x01
./scripts/evreg.py disable 0x02 0x15 0x00 0x03

and then try to enable both simultaneously via

./scripts/evreg.py enable 0x02 0x15 0x00 0x00
qzed commented 4 years ago

Also thanks for your help and patience so far!

qzed commented 4 years ago
  • What's up with the battery events? They seem to be separated by channel, but we've only logged channel 1 events on Linux so far. Do we need special/other enable/disable calls or other parameters? There seems to be a TC=0x02 (battery/power subsystem) event in the TC=0x21 event payload. Maybe they are separated, i.e. enable channel 1 events via legacy call, enable channel 2 events via TC=0x21 call?

An addition to this: Windows separates event listeners by channel. Specifically, the listeners are added for the channel on which the request to enable the event has been made, so it does not look like it's a separate parameter. For example, events enabled via REG (TC=0x21) have their event handler stored for channel 2 (as tested above, REG does not work for channel 1).

Does

./scripts/evreg.py legacy-enable 0x02 0x02 0x01 0x00

work or does it time out?

qzed commented 4 years ago

Okay, neat. No need to test https://github.com/linux-surface/surface-aggregator-module/issues/40#issuecomment-643648683. I've been able to confirm that on my SB2. Not sure that I remember correctly, but I thought I've tried this before and didn't see any difference between SEQ being 0 or 1. Maybe a firmware update changed that.

NickyTope commented 4 years ago

Wow, this thread is a big read. not sure if I should try to catch up on running any/all commands above? huge thanks for your input on this @steveniemitz and @qzed you guys both obviously know lots more about this stuff than I do. Long time linux user here but I write javascript for a living so not used to such low level concepts.

qzed commented 4 years ago

@NickyTope We've figured out quite a bit so far. There are still a couple of things I'd love to see tested though, if you have the time (should be stand-alone so no need to read through all this if you don't want to). Specifically these three:

steveniemitz commented 4 years ago

Okay, in addition to that, can you run ./scripts/evreg.py simple 0x21 0x03 0x00 0x02 0x01 before/after each call and upload the full logs? All in all:

simple-events.log

I didn't generate any events on my own (plug/unplug, etc), did you want me to do that here as well? Or was running the script sufficient here?

steveniemitz commented 4 years ago

to check on the last point above, can you try running ./scripts/evreg.py disable 0x02 0x15 0x00 0x00

This actually disable neither the trackpad nor keyboard. :P Same deal w/ enable.

Jun 15 09:33:11 steve-work-sb kernel: [  938.163938] surface_sam_ssh serial0-0: tx: sending data (length: 23)
Jun 15 09:33:11 steve-work-sb kernel: [  938.163940] tx: 00000000: aa 55 80 0d 00 49 44 c2 80 21 02 00 00 6b 00 02
Jun 15 09:33:11 steve-work-sb kernel: [  938.163941] tx: 00000010: 15 00 15 00 00 03 0f
Jun 15 09:33:11 steve-work-sb kernel: [  938.165269] surface_sam_ssh serial0-0: rx: received data (size: 10)
Jun 15 09:33:11 steve-work-sb kernel: [  938.165271] rx: 00000000: aa 55 40 00 00 49 b1 33 ff ff
Jun 15 09:33:11 steve-work-sb kernel: [  938.165272] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x40, len: 0)
Jun 15 09:33:11 steve-work-sb kernel: [  938.165273] surface_sam_ssh serial0-0: rx: valid control message received (type: 0x40)
Jun 15 09:33:11 steve-work-sb kernel: [  938.166774] surface_sam_ssh serial0-0: rx: received data (size: 19)
Jun 15 09:33:11 steve-work-sb kernel: [  938.166776] rx: 00000000: aa 55 80 09 00 7e 30 58 80 21 00 02 00 6b 00 02
Jun 15 09:33:11 steve-work-sb kernel: [  938.166776] rx: 00000010: 01 b5 21
Jun 15 09:33:11 steve-work-sb kernel: [  938.166778] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x80, len: 9)
Jun 15 09:33:11 steve-work-sb kernel: [  938.166779] surface_sam_ssh serial0-0: rx: parser: valid command found (tc: 0x21, cid: 0x02)
Jun 15 09:33:11 steve-work-sb kernel: [  938.166779] surface_sam_ssh serial0-0: rx: valid command message received
Jun 15 09:33:11 steve-work-sb kernel: [  938.172388] surface_sam_ssh serial0-0: tx: sending data (length: 10)
Jun 15 09:33:11 steve-work-sb kernel: [  938.172390] tx: 00000000: aa 55 40 00 00 7e 05 75 ff ff
steveniemitz commented 4 years ago

Does ./scripts/evreg.py legacy-enable 0x02 0x02 0x01 0x00 work or does it time out?

It times out.

Jun 15 09:35:47 steve-work-sb kernel: [ 1093.426587] surface_sam_ssh serial0-0: tx: sending data (length: 23)
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.426589] tx: 00000000: aa 55 80 0d 00 5a 16 e0 80 21 02 00 00 7c 00 01
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.426589] tx: 00000010: 15 00 15 00 01 6e 2b
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.428211] surface_sam_ssh serial0-0: rx: received data (size: 10)
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.428215] rx: 00000000: aa 55 40 00 00 5a e3 11 ff ff
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.428216] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x40, len: 0)
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.428217] surface_sam_ssh serial0-0: rx: valid control message received (type: 0x40)
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.429712] surface_sam_ssh serial0-0: rx: received data (size: 19)
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.429714] rx: 00000000: aa 55 80 09 00 52 de bd 80 21 00 02 00 7c 00 01
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.429715] rx: 00000010: 00 4d 2e
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.429716] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x80, len: 9)
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.429717] surface_sam_ssh serial0-0: rx: parser: valid command found (tc: 0x21, cid: 0x01)
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.429718] surface_sam_ssh serial0-0: rx: valid command message received
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.429790] surface_sam_ssh serial0-0: tx: sending data (length: 10)
Jun 15 09:35:47 steve-work-sb kernel: [ 1093.429792] tx: 00000000: aa 55 40 00 00 52 eb 90 ff ff
Jun 15 09:35:54 steve-work-sb kernel: [ 1101.048477] surface_sam_ssh serial0-0: tx: sending data (length: 23)
Jun 15 09:35:54 steve-work-sb kernel: [ 1101.048479] tx: 00000000: aa 55 80 0d 00 5b 37 f0 80 01 02 00 00 7d 00 0b
Jun 15 09:35:54 steve-work-sb kernel: [ 1101.048479] tx: 00000010: 02 01 02 00 00 f7 47
Jun 15 09:35:54 steve-work-sb kernel: [ 1101.049848] surface_sam_ssh serial0-0: rx: received data (size: 10)
Jun 15 09:35:54 steve-work-sb kernel: [ 1101.049850] rx: 00000000: aa 55 40 00 00 5b c2 01 ff ff
Jun 15 09:35:54 steve-work-sb kernel: [ 1101.049851] surface_sam_ssh serial0-0: rx: parser: valid frame found (type: 0x40, len: 0)
Jun 15 09:35:54 steve-work-sb kernel: [ 1101.049851] surface_sam_ssh serial0-0: rx: valid control message received (type: 0x40)
Jun 15 09:35:55 steve-work-sb kernel: [ 1102.052302] surface_sam_ssh serial0-0: rqst: communication timed out
qzed commented 4 years ago

Okay, in addition to that, can you run ./scripts/evreg.py simple 0x21 0x03 0x00 0x02 0x01 before/after each call and upload the full logs? All in all:

simple-events.log

I didn't generate any events on my own (plug/unplug, etc), did you want me to do that here as well? Or was running the script sufficient here?

Running the script is sufficient here. Although, that didn't really behave like I thought it would. Can you try

./scripts/evreg.py enable 0x02 0x21 0x01 0x00       # enable events for TC=0x21 (REG)
./scripts/evreg.py disable 0x02 0x15 0x00 0x03      # disable touchpad
./scripts/evreg.py enable 0x02 0x21 0x01 0x00       # enable events for TC=0x21 (REG)
./scripts/evreg.py disable 0x02 0x15 0x00 0x01      # disable keyboard
./scripts/evreg.py enable 0x02 0x21 0x01 0x00       # enable events for TC=0x21 (REG)
./scripts/evreg.py enable 0x02 0x15 0x00 0x01       # enable keyboard
./scripts/evreg.py enable 0x02 0x21 0x01 0x00       # enable events for TC=0x21 (REG)
./scripts/evreg.py enable 0x02 0x15 0x00 0x03       # enable touchpad
./scripts/evreg.py enable 0x02 0x21 0x01 0x00       # enable events for TC=0x21 (REG)
qzed commented 4 years ago

Based on the other two, it looks like:

steveniemitz commented 4 years ago

Running the script is sufficient here. Although, that didn't really behave like I thought it would. Can you try

Attached more-events.log

qzed commented 4 years ago

Thanks! The events returned are the same every time (including sent in the same order), so that rules out that they indicate the currently enabled events. Maybe they indicate the events that can be enabled. No clue how to test that.

So I think the only thing we've left to figure out are the battery events on channel 2.

Before I forget: We should probably also make sure that the DTX events are still handled via TC=0x01 (i.e. the legacy system) so that we don't run into any latent issues there. Can you run

./scripts/evreg.py legacy-disable 0x01 0x11 0x01 0x00

and try to detach? That should cause the detaching / latch open to be significantly delayed (i.e. as if there were no DTX driver/daemon). Running

./scripts/evreg.py legacy-enable 0x01 0x11 0x01 0x00

after that should enable the events again and detaching should work with normal speed. Although I doubt they've changed anything here.

steveniemitz commented 4 years ago

Before I forget: We should probably also make sure that the DTX events are still handled via TC=0x01 (i.e. the legacy system) so that we don't run into any latent issues there. Can you run

The DTX daemon does work correctly (I get attach/detach notifications), although oddly enough it seems just as fast (which is 1-2 seconds) with events disabled and enabled. I also get notifications with events both enabled and disabled here.

qzed commented 4 years ago

Hmm, interesting. For me on the SB2, there's a clear difference, so that needs some more investigation. I'll have a look at the Windows driver. Maybe I'll figure out something there.

steveniemitz commented 4 years ago

yeah I think this is weird too, it might make sense to have someone else confirm what I saw too, maybe @NickyTope if you have time? (wrt commet https://github.com/linux-surface/surface-aggregator-module/issues/40#issuecomment-644218433)

steveniemitz commented 4 years ago

wrt the battery events, I noticed that, in windows, one battery shows up on the surface integration driver (SID), {BFD21D0B-D701-435B-AF19-4A20677898D3}\SURFACEBATTERY\3&ECB3501&0&SID_DEVICE_00

while the other shows up under the hotplug driver (SHP) {BFD21D0B-D701-435B-AF19-4A20677898D3}\TARGET_KIP&CATEGORY_BATTERY\3&2C2A152A&0&SHP_DEVICE_01

Not sure if this is how it was on the SB2 as well.

qzed commented 4 years ago

On the SB2, both batteries are handled via an ACPI shim (SurfaceAcpiNotify driver, basically a translation layer that translates ACPI calls to SAM requests and back), so to Windows (and Linux) they look like standard ACPI batteries. Only the SP7, SL3 and SB3 use the newer battery driver. Although the SAM interface itself is mostly the same (apart from event enabling/disabling and the change that they separate the battery via channel instead of instance ID).

Could be that events on channel 2 are enabled via KIP (TC=0x0e). I've added some new commands to the evreg.py script for KIP events (found in the HID driver, may not work at all). Maybe

./scripts/evreg.py kip-enable 0x02 0x02 0x01 0x00

or

./scripts/evreg.py kip-enable 0x02 0x02 0x01 0x01

(to enable) and

./scripts/evreg.py kip-disable 0x02 0x02 0x01 0x00

or

./scripts/evreg.py kip-disable 0x02 0x02 0x01 0x01

(to disable) do something? You did try the same with plain enable/disable calls already, right? Specifically that with IID, i.e. the last parameter, being 0x01.

NickyTope commented 4 years ago

yeah I think this is weird too, it might make sense to have someone else confirm what I saw too, maybe @NickyTope if you have time? (wrt commet #40 (comment))

Can confirm using the legacy-disable as above does not decrease the time to detach. It is around 1 second in both instances with the latest modules installed

NickyTope commented 4 years ago

if that's a disable-all call, it should disable both touchpad and keyboard.

calling ./scripts/evreg.py (dis)enable 0x02 0x15 0x00 0x00 had no effect for me.

The events disabled perfectly with ./scripts/evreg.py disable 0x02 0x15 0x00 0x01 which you probably expected!

qzed commented 4 years ago

Thanks! I think we now only need to figure out how the second battery and the detach events are enabled/disabled.

qzed commented 3 years ago

Based on the log from https://github.com/linux-surface/linux-surface/issues/201#issuecomment-655171238, the battery events on channel 2 are enabled/disabled via the KIP event registry. It also kind of looks like all devices on channel 2 are connected to the SB3 base, whereas devices on channel 1 are connected to the clipboard. Might be coincidence though.

qzed commented 3 years ago

I've opened a new issue for the batteries specifically: https://github.com/linux-surface/surface-aggregator-module/issues/44.