Open tmarkov opened 3 years ago
This is perhaps a separate issue that also results in a touch hang. I figured I'd put it here in case they're connected. In this case, iptsd crashes with
EOF
github.com/linux-surface/iptsd/protocol.(*IptsProtocol).ReadByte
github.com/linux-surface/iptsd/protocol/protocol.go:48
github.com/linux-surface/iptsd/protocol.(*IptsProtocol).ReadShort
github.com/linux-surface/iptsd/protocol/protocol.go:55
github.com/linux-surface/iptsd/protocol.(*IptsProtocol).ReadPayloadFrame
github.com/linux-surface/iptsd/protocol/payload.go:59
main.IptsPayloadHandleInput
github.com/linux-surface/iptsd/payload.go:14
main.IptsDataHandleInput
github.com/linux-surface/iptsd/data.go:15
main.main
github.com/linux-surface/iptsd/main.go:67
runtime.main
runtime/proc.go:203
runtime.goexit
runtime/asm_amd64.s:1373
No logs from the driver.
[ Sorry for not coming back to this earlier, was a bit busy irl ]
Could you try this? I.e. instead of removing the check entirely, let it warn, but then return without an error. This is probably more useful, since an error returned by ipts_control_send stops the whole driver, which also kills iptsd.
diff --git a/control.c b/control.c
index 857bcf4..c2754aa 100644
--- a/control.c
+++ b/control.c
@@ -24,11 +24,11 @@ int ipts_control_send(struct ipts_context *ipts,
 	if (ret >= 0)
 		return 0;
+	dev_err(ipts->dev, "MEI error while sending: 0x%X:%d\n", cmd, ret);
+
 	if (cmd == IPTS_CMD(FEEDBACK) && ret == -IPTS_ME_STATUS_NOT_READY)
 		return 0;
-	dev_err(ipts->dev, "MEI error while sending: 0x%X:%d\n", cmd, ret);
-
 	return ret;
 }
This will most likely spam your dmesg the entire time while you are using it. What I am interested in is whether the spam will continue after touch started hanging (and what error code is returned).
The iptsd error looks like it received an invalid data buffer, which is weird, but probably not related to touch hang.
OK, got a hang with the new patch. Here's the journal log:
aug 31 19:04:50 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:04:52 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:05:17 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:05:33 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:05:48 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:05:56 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:05:56 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:05:56 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:05:57 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:05:59 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:05:59 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:06:07 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:06:09 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:06:22 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
aug 31 19:06:28 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4
There was only one hang, though. As such, it's not clear whether the last of these events actually has any relation to it, since the first 14 clearly don't. I'll combine this with printing the doorbell to find out.
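To line these journal entries up against doorbell prints, the command and return code can be pulled out of each line. A rough sketch, assuming only the message format from the patch above ("MEI error while sending: 0x%X:%d"); parseMEIError is a made-up helper name:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Matches the dev_err format from the patch: hex command, decimal return code.
var meiErr = regexp.MustCompile(`MEI error while sending: 0x([0-9A-Fa-f]+):(-?\d+)`)

// parseMEIError is a hypothetical helper that extracts the command and the
// return code from one journal line. ok is false if the line does not match.
func parseMEIError(line string) (uint64, int, bool) {
	m := meiErr.FindStringSubmatch(line)
	if m == nil {
		return 0, 0, false
	}
	cmd, err := strconv.ParseUint(m[1], 16, 32)
	if err != nil {
		return 0, 0, false
	}
	ret, err := strconv.Atoi(m[2])
	if err != nil {
		return 0, 0, false
	}
	return cmd, ret, true
}

func main() {
	line := "aug 31 19:06:28 tmarkov-surface kernel: ipts 0000:00:16.4-3e8d0870-271a-4208-8eb5-9acb9402ae04: MEI error while sending: 0x6:-4"
	if cmd, ret, ok := parseMEIError(line); ok {
		fmt.Printf("cmd=0x%X ret=%d\n", cmd, ret)
	}
}
```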
I can confirm that there's NO MEI error while sending when the hang happens (doorbell stops increasing) - so it's something else.
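A small sketch of how "doorbell stops increasing" could be tracked programmatically instead of by eye. The doorbellWatch type is hypothetical and not part of iptsd. A flat doorbell may also just mean that no new data arrived, so the count is only meaningful while the screen is being touched:

```go
package main

import "fmt"

// doorbellWatch is a hypothetical helper that counts how many consecutive
// polls returned the same doorbell value.
type doorbellWatch struct {
	last      uint32
	unchanged int
	seen      bool
}

// observe records one polled doorbell value and returns the number of
// consecutive polls for which the value has not changed.
func (w *doorbellWatch) observe(doorbell uint32) int {
	if w.seen && doorbell == w.last {
		w.unchanged++
	} else {
		w.unchanged = 0
	}
	w.last = doorbell
	w.seen = true
	return w.unchanged
}

func main() {
	w := &doorbellWatch{}
	// Example values only; the last three polls repeat the same doorbell.
	for _, d := range []uint32{33160, 33161, 33162, 33162, 33162} {
		fmt.Println("doorbell", d, "unchanged polls:", w.observe(d))
	}
}
```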
I did some further investigation, but can you offer some clarifications:
I added debug output to the IptsControl.SendFeedback function in iptsd:
func (ipts *IptsControl) SendFeedback() error {
	fmt.Println("Sending feedback")
	return ipts.SendFeedbackFile(ipts.CurrentFile())
}
I also added doorbell prints in the main cycle in main.go.
On the driver side, I added some debug output in the feedback functions as well.
When IPTS hung, I stopped iptsd without unloading the driver.
After launching a new instance of iptsd (after stopping the running one), I get the following output from it:
Connected to device 1b96:005e
doorbell 33162 <nil>
doorbell 33162 <nil>
doorbell 33162 <nil>
doorbell 33162 <nil>
doorbell 33162 <nil>
doorbell 33162 <nil>
[...]
So no calls to IptsControl.SendFeedback and the doorbell stays the same.
On the other hand, in dmesg I get new entries as follows:
[16127.124212] IPTS: Sending feedback, doorbell 2891486536
[16127.124216] mei_cldev_send, 6
[16127.135814] return from mei_cldev_send
[16127.135815] IPTS: Feedback sent
[16127.135823] IPTS: Sending feedback, doorbell 2891486536
[16127.135824] mei_cldev_send, 6
[16127.136042] return from mei_cldev_send
[16127.136043] IPTS: Feedback sent
[16127.136048] IPTS: Sending feedback, doorbell 2891486536
[16127.136049] mei_cldev_send, 6
[16127.136376] return from mei_cldev_send
[16127.136377] IPTS: Feedback sent
[16127.136382] IPTS: Sending feedback, doorbell 2891486536
[16127.136382] mei_cldev_send, 6
[16127.136556] return from mei_cldev_send
[16127.136557] IPTS: Feedback sent
[16127.136563] IPTS: Sending feedback, doorbell 2891486536
[16127.136563] mei_cldev_send, 6
[16127.136741] return from mei_cldev_send
[16127.136743] IPTS: Feedback sent
[16127.136757] IPTS: Sending feedback, doorbell 2891486536
[16127.136758] mei_cldev_send, 6
[16127.137147] return from mei_cldev_send
[16127.137150] IPTS: Feedback sent
[16127.137183] IPTS: Sending feedback, doorbell 2891486536
[16127.137188] mei_cldev_send, 6
[16127.137669] return from mei_cldev_send
[16127.137738] IPTS: Feedback sent
[16127.137778] IPTS: Sending feedback, doorbell 2891486536
[16127.137780] mei_cldev_send, 6
[16127.138238] return from mei_cldev_send
[16127.138240] IPTS: Feedback sent
[16127.138265] IPTS: Sending feedback, doorbell 2891486536
[16127.138266] mei_cldev_send, 6
[16127.138656] return from mei_cldev_send
[16127.138659] IPTS: Feedback sent
[16127.138679] IPTS: Sending feedback, doorbell 2891486536
[16127.138680] mei_cldev_send, 6
[16127.138955] return from mei_cldev_send
[16127.138958] IPTS: Feedback sent
[16127.138978] IPTS: Sending feedback, doorbell 2891486536
[16127.138978] mei_cldev_send, 6
[16127.139266] return from mei_cldev_send
[16127.139268] IPTS: Feedback sent
[16127.139286] IPTS: Sending feedback, doorbell 2891486536
[16127.139287] mei_cldev_send, 6
[16127.139459] return from mei_cldev_send
[16127.139461] IPTS: Feedback sent
[16127.139489] IPTS: Sending feedback, doorbell 2891486536
[16127.139495] mei_cldev_send, 6
[16127.139738] return from mei_cldev_send
[16127.139740] IPTS: Feedback sent
[16127.139778] IPTS: Sending feedback, doorbell 2891486536
[16127.139788] mei_cldev_send, 6
[16127.139808] return from mei_cldev_send
[16127.139809] IPTS: Feedback sent
[16127.139847] IPTS: Sending feedback, doorbell 2891486536
[16127.139848] mei_cldev_send, 6
[16127.139928] return from mei_cldev_send
[16127.139930] IPTS: Feedback sent
[16127.139939] IPTS: Sending feedback, doorbell 2891486536
[16127.139940] mei_cldev_send, 6
[16127.140132] return from mei_cldev_send
[16127.140135] IPTS: Feedback sent
[16127.141196] input: IPTS Touch as /devices/virtual/input/input45
[16127.141391] input: IPTS Stylus as /devices/virtual/input/input46
This shows that the feedback function somehow gets called several times on the driver side. This doesn't cause IPTS to 'unhang'.
I double-checked by adding an extra send feedback call before the iptsd main cycle:
timeout := time.Now().Add(5 * time.Second)
fmt.Println("Trying feedback")
err = ipts.Control.SendFeedback()
fmt.Println("Error", err)
for {
	doorbell, err := ipts.Control.Doorbell()
	fmt.Println("doorbell", doorbell, " ", err)
	if err != nil {
		HandleError(ipts, err)
which returned
Trying feedback
Sending feedback
Error <nil>
from iptsd, and
[...]
[16788.399971] input: IPTS Touch as /devices/virtual/input/input49
[16788.400699] input: IPTS Stylus as /devices/virtual/input/input50
[16788.402237] IPTS: Sending feedback, doorbell 2891486536
[16788.402241] mei_cldev_send, 6
[16788.402343] return from mei_cldev_send
[16788.402345] IPTS: Feedback sent
from dmesg.
This extra call also didn't fix the hang.
Do you have any suggestions for what else could be worth looking at?
The feedback commands from your dmesg come from the flush that iptsd does on startup (it sends feedback for every buffer once to clear any old data). If you move your print from SendFeedback to SendFeedbackFile in iptsd, you should get logs for these too.
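Roughly, the startup flush described here amounts to one feedback call per buffer. A sketch of that loop under stated assumptions: the feedbackSender interface and its integer index argument are hypothetical (iptsd's real SendFeedbackFile takes whatever CurrentFile returns), and the buffer count of 16 is only inferred from the number of "Sending feedback" entries in the dmesg excerpt:

```go
package main

import "fmt"

// feedbackSender is a hypothetical interface for illustration;
// iptsd's real types and signatures differ.
type feedbackSender interface {
	SendFeedbackFile(index int) error
}

// flush sends feedback once for each of n buffers to clear any old data.
func flush(s feedbackSender, n int) error {
	for i := 0; i < n; i++ {
		if err := s.SendFeedbackFile(i); err != nil {
			return fmt.Errorf("flushing buffer %d: %w", i, err)
		}
	}
	return nil
}

// loggingSender prints on every call, mirroring the debug print
// moved into SendFeedbackFile.
type loggingSender struct{ calls int }

func (l *loggingSender) SendFeedbackFile(index int) error {
	l.calls++
	fmt.Println("Sending feedback for buffer", index)
	return nil
}

func main() {
	s := &loggingSender{}
	// Assumption: 16 buffers, matching the 16 feedback entries in dmesg.
	if err := flush(s, 16); err != nil {
		fmt.Println("Error", err)
	}
	fmt.Println("feedback calls:", s.calls)
}
```

With the print placed at this level, each flush call shows up in the iptsd output, which would account for the driver-side entries that had no matching "Sending feedback" line.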
Since feedback gets through, but the doorbell doesn't increase, I think this is probably an issue in the ME firmware. It simply dies below our feet.
Btw, does a module reload fix the issue? Or do you have to restart?
Module reload fixes the issue.
@kitakar5525 do you think whatever was causing the hang with the old driver could be related? (It's not very likely, though, since I didn't have problems with the old driver.)
Interestingly, I've never encountered any crashes (except one on suspend that was fixed recently) with this ipts driver (the one that doesn't use GuC submission) on my SB1.
It's true that there are some unknown differences between your device and mine. What's your firmware version now?
# Print your system info:
# - do not use sudo so that it doesn't contain personal data
# - ignoring errors
grep . /sys/class/dmi/id/* 2>/dev/null
/sys/class/dmi/id/bios_date:03.24.2020
/sys/class/dmi/id/bios_vendor:Microsoft Corporation
/sys/class/dmi/id/bios_version:92.3192.768
/sys/class/dmi/id/board_name:Surface Book
/sys/class/dmi/id/board_vendor:Microsoft Corporation
/sys/class/dmi/id/chassis_type:9
/sys/class/dmi/id/chassis_vendor:Microsoft Corporation
/sys/class/dmi/id/modalias:dmi:bvnMicrosoftCorporation:bvr92.3192.768:bd03.24.2020:svnMicrosoftCorporation:pnSurfaceBook:pvr124000000000000000000000D0B09F1C03P38:rvnMicrosoftCorporation:rnSurfaceBook:rvr:cvnMicrosoftCorporation:ct9:cvr:
/sys/class/dmi/id/product_family:Surface
/sys/class/dmi/id/product_name:Surface Book
/sys/class/dmi/id/product_sku:Surface_Book
/sys/class/dmi/id/product_version:124000000000000000000000D:0B:09F:1C:03P:38
/sys/class/dmi/id/sys_vendor:Microsoft Corporation
/sys/class/dmi/id/uevent:MODALIAS=dmi:bvnMicrosoftCorporation:bvr92.3192.768:bd03.24.2020:svnMicrosoftCorporation:pnSurfaceBook:pvr124000000000000000000000D0B09F1C03P38:rvnMicrosoftCorporation:rnSurfaceBook:rvr:cvnMicrosoftCorporation:ct9:cvr:
@kitakar5525 Here's what I have - it's older. I'm skeptical that this is what causes the problem but perhaps it's time to update anyway.
/sys/class/dmi/id/bios_date:04/18/2019
/sys/class/dmi/id/bios_vendor:Microsoft Corporation
/sys/class/dmi/id/bios_version:91.2706.768
/sys/class/dmi/id/board_name:Surface Book
/sys/class/dmi/id/board_vendor:Microsoft Corporation
/sys/class/dmi/id/chassis_type:9
/sys/class/dmi/id/chassis_vendor:Microsoft Corporation
/sys/class/dmi/id/modalias:dmi:bvnMicrosoftCorporation:bvr91.2706.768:bd04/18/2019:svnMicrosoftCorporation:pnSurfaceBook:pvrD0B08F1C03P38:rvnMicrosoftCorporation:rnSurfaceBook:rvr:cvnMicrosoftCorporation:ct9:cvr:
/sys/class/dmi/id/product_family:Surface
/sys/class/dmi/id/product_name:Surface Book
/sys/class/dmi/id/product_sku:Surface_Book
/sys/class/dmi/id/product_version:D:0B:08F:1C:03P:38
/sys/class/dmi/id/sys_vendor:Microsoft Corporation
/sys/class/dmi/id/uevent:MODALIAS=dmi:bvnMicrosoftCorporation:bvr91.2706.768:bd04/18/2019:svnMicrosoftCorporation:pnSurfaceBook:pvrD0B08F1C03P38:rvnMicrosoftCorporation:rnSurfaceBook:rvr:cvnMicrosoftCorporation:ct9:cvr:
Hmm, the firmware version seems new enough...
I would occasionally get hangs when using the uapi driver with iptsd. Nothing is written to the journal when the hang happens.
When printing the doorbell to dmesg, it stops increasing when the hang happens.
When I comment out https://github.com/linux-surface/intel-precise-touch/blob/master/control.c#L27, the driver crashes very quickly. Here's the journal: