Closed kitakar5525 closed 5 years ago
The problem is that we don't officially have access to the mei bus device, which means I have to include `"../mei/mei_dev.h"` in `ipts-gfx.c`.
We can actually use the parent device, that's a bit better (https://elixir.bootlin.com/linux/v5.2.2/source/drivers/misc/mei/bus.c#L912).
With that, suspend works with a3a3ed3 applied and https://github.com/qzed/linux-surface/commit/03b9074ef5affe5cef4783f0e57253f070e600d4 reverted. I still get the warning from https://github.com/jakeday/linux-surface/issues/544#issuecomment-514658312 though, so we'll need to find another way to fix that.
Edit: I have rebased/updated the commits in https://github.com/qzed/linux-surface/tree/v5.2, specifically, I have dropped https://github.com/qzed/linux-surface/commit/03b9074ef5affe5cef4783f0e57253f070e600d4 and fixed the device link to use the parent/bus device.
Okay, so it seems that I've missed a path before, and `i915_drm_prepare` also sets `guc->send = intel_guc_send_nop`. Given that the runtime-suspend function shouldn't get called during a normal suspend (but the prepare does) and the prepare shouldn't get called during runtime-suspend, we should probably be safe to move `intel_ipts_suspend` over there.
Let's hope this doesn't break the suspend order fixed by the device link.
Yeah, breaks it. I need to unload the module again...
At this point I propose we revert a3a3ed3 and deal with problems that arise from this directly. I've been running without that commit for a while now on my main kernel and I haven't experienced any issues, so I guess it's working fine on the SB2. You also mentioned you were running without it on the SB1, right? So Pros and Laptops are currently unknown.
I also suggest we keep the device link as that may fix any of those issues and I think it's a good idea to have the dependency ensured on the PM side.
> At this point I propose we revert a3a3ed3 and deal with problems that arise from this directly.
Yes, I think reverting the commit is the simplest option. If we revert it, I suggest including my debugfs patches (https://github.com/jakeday/linux-surface/issues/544#issuecomment-514665973) to manually call ipts_stop/ipts_start and intel_ipts_cleanup/intel_ipts_init, so we can add those calls to the systemd/system-sleep/sleep script.
Thank you for your work!
> You also mentioned you were running without it on the SB1, right?

Yes, I have been running without the commit and without removing the intel_ipts module, and I have had no particular issue for a long time, at least on suspend. (I still occasionally encounter an entire system freeze maybe caused by ipts_stop or something, this is another issue.)

> If we will revert it, I suggest including my debugfs patches (#544 (comment)) to manually call ipts_stop/ipts_start and intel_ipts_cleanup/intel_ipts_init so we can add it to systemd/system-sleep/sleep script.
Adding the debugfs patches seems like a good idea, this should allow prototyping workarounds as user-space scripts in case that becomes necessary. I wouldn't add it to the `systemd/system-sleep/sleep` script for everyone, only keep this as an option/recommendation for individual users in case there are any issues until it is fixed in the kernel.
> Thank you for your work!
Thank you for helping out and testing!
> (I still occasionally encounter an entire system freeze maybe caused by ipts_stop or something, this is another issue.)

If you encounter this during suspend/resume you could try to unload/load the `intel_ipts` module (to try and reproduce the issue), as that's basically what's happening on the module side. Also: Have you had this happen with the fixed device link implementation? I kind of hoped that this would be something like the `intel_ipts` module accessing the already suspended `i915` driver, which should be fixed with the device link.
I've added your patches to my 5.2 branch and I've also set ENABLE_IPTS_DEBUG
.
Eventually, we should back-port the changes to 4.19, but I think we should focus on #374 before we do that, might save us some work. Also 5.3 seems like it's going to take a bit of work again.
> Also: Have you had this happen with the fixed device link implementation? I kind of hoped that this would be something like the intel_ipts module accessing the already suspended i915 driver, which should be fixed with the device link.
tested with
I did a stress test for intel_ipts module removing and system suspend.
- intel_ipts module remove/insert 30 times

```bash
#!/bin/bash
loop=0
while true; do
    modprobe -r intel_ipts
    echo "ipts removed" > /dev/kmsg
    sleep 3 # [1]
    modprobe intel_ipts
    echo "ipts inserted" > /dev/kmsg
    echo "loop: $((loop+=1)) done" > /dev/kmsg
    if [ $loop -eq 30 ]; then
        break
    fi
    echo "interrupt here to stop this stress-test"; sleep 5
done
```
- suspend/resume 30 times

```bash
#!/bin/bash
PATH_CPULPI_US=/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
PATH_SLPS0_US=/sys/kernel/debug/pmc_core/slp_s0_residency_usec
loop=0
while true; do
    bash /usr/lib/systemd/system-sleep/* pre
    rtcwake -m freeze -s 15
    bash /usr/lib/systemd/system-sleep/* post
    echo "$(basename $PATH_CPULPI_US): $(cat $PATH_CPULPI_US) usec" > /dev/kmsg
    echo "$(basename $PATH_SLPS0_US): $(cat $PATH_SLPS0_US) usec" > /dev/kmsg
    echo "loop: $((loop+=1)) done" > /dev/kmsg
    if [ $loop -eq 30 ]; then
        break
    fi
    echo "interrupt here to stop this stress-test"; sleep 5
done
```
I did two sets for each test and passed the tests without freezing the system.
I noticed that commenting out the first `sleep 3` in `ipts-stress-test-remove-insert.sh` (marked [1]) causes a system freeze with journal output similar to the one in my previous comment https://github.com/jakeday/linux-surface/issues/544#issuecomment-513531566.
I used to use this one-liner to reload intel_ipts module:
# bad example
sudo modprobe -r intel_ipts && sudo modprobe intel_ipts
and I think re-loading the module too fast may be the cause. I should instead insert some sleep:
# good example
sudo modprobe -r intel_ipts && sleep 3 && sudo modprobe intel_ipts
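The "good example" above can be wrapped in a small helper so the pause is not forgotten. This is only a sketch: `reload_module` and the `RUN` hook are hypothetical names, and the 3 second default is simply the value used in this thread, not a verified minimum.

```bash
# Reload a kernel module with a pause between removal and insertion.
# RUN is a hypothetical hook: set RUN=echo to print the commands instead of
# executing them (useful for a dry run without root).
reload_module() {
    local module=$1 delay=${2:-3}
    ${RUN:-} modprobe -r "$module" || return 1
    ${RUN:-} sleep "$delay"
    ${RUN:-} modprobe "$module"
}
```

A dry run such as `RUN=echo reload_module intel_ipts` prints the three commands; as root, `reload_module intel_ipts 5` would perform the reload with a 5 second pause.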
For suspend, I used to set resume_delay to 0 by `sysctl kernel.resume_delay=0`. On the other hand, the default value on jakeday kernel is 3000.
https://github.com/jakeday/linux-surface/blob/3d0abed6c461fd269694b66b9bb6372be230fa20/patches/5.1/0002-suspend.patch#L101
I will see what happens when I set resume_delay to 3000 (although the tests already passed twice with it set to 0).
@kitakar5525 Awesome work!
> I noticed that when I comment out the first `sleep 3` in `ipts-stress-test-remove-insert.sh` [1] will cause a system freeze with a similar journal output as my previous comment #544 (comment).
That's interesting! I'll try to find the cause of that.
> For suspend, I used to set resume_delay to 0 by `sysctl kernel.resume_delay=0`. On the other hand, the default value on jakeday kernel is 3000.
I've actually reverted that change at the moment. I don't think that the delay should cause any issues with IPTS, I'd rather expect it to make things a bit more stable. Nevertheless, testing both with and without it is probably a good idea.
I added destroy_doorbell() into i915_guc_ipts_submission_disable(). Now ipts_stop && intel_ipts_cleanup is working without causing GPU hang also on 4.19.
```diff
From dcbb2d2c3a3d4f22f91bbbc006989338fd556ab9 Mon Sep 17 00:00:00 2001
From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com>
Date: Fri, 2 Aug 2019 08:00:10 +0900
Subject: [PATCH] i915: ipts: add destroy_doorbell() when disabling ipts guc submission

Adding destroy_doorbell() into i915_guc_ipts_submission_disable()
corresponds to __guc_client_disable() on 5.0+
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 32a9d29c5..4cbc75fe3 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -1408,6 +1408,7 @@ void i915_guc_ipts_submission_disable(struct drm_i915_private *dev_priv)
 	if (!guc->ipts_client)
 		return;
 
+	destroy_doorbell(guc->ipts_client);
 	guc_client_free(guc->ipts_client);
 	guc->ipts_client = NULL;
 }
-- 
2.22.0
```
I noticed that calling `intel_ipts_cleanup` too early tends to cause the two WARN_ONs (`WARN_ON(i915_vma_unbind(vma))` and `WARN_ON(i915_gem_object_has_pinned_pages(obj))`) I mentioned before (https://github.com/jakeday/linux-surface/issues/544#issuecomment-514985562).
I need to insert some sleep here, too:
sudo su -c "echo 1 > /sys/kernel/debug/ipts/ipts_stop"
sleep 3
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_cleanup"
I will update my old comments. https://github.com/jakeday/linux-surface/issues/544#issuecomment-514665973 https://github.com/jakeday/linux-surface/issues/544#issuecomment-514985562 https://github.com/jakeday/linux-surface/issues/544#issuecomment-515511237
> Also 5.3 seems like it's going to take a bit of work again.

Yes, I cannot make IPTS work on 5.3-rc yet. My porting here.
I cannot find the appropriate replacement of `intel_context_pin`.
@kitakar5525 Sorry for taking so long to answer this. The change for 4.19 makes sense, nice work spotting that! In 5.2 that's apparently handled by the `__guc_client_disable` function, so this explains why it's working there.
I'm not sure when I'm able to look at 5.3, but I'll try to before the official release.
@kitakar5525
> I cannot make IPTS work on 5.3-rc yet. My porting here. I cannot find the appropriate replacement of `intel_context_pin`.

I think the replacements for `create_ipts_context` should be `i915_gem_context_get_engine` and `intel_context_pin`. For `intel_context_lookup` in `destroy_ipts_context` I think it should be `i915_gem_context_lookup_engine`. Also we need to add `intel_context_put` due to reference counting.
I've got another problem now: `ida_simple_get` in `guc_client_alloc` fails with `-ENOSPC`. Due to this `i915_guc_ipts_submission_enable` fails.
Here's the patch for v5.2 directly ported to v5.3 (warning: with compilation issues), and below are my current changes:
```diff
From 815eda14e709abae9aa6a83f5e184f920efd494c Mon Sep 17 00:00:00 2001
From: qzed
```
@kitakar5525
> I noticed that calling `intel_ipts_cleanup` too early tends to cause the two WARN_ONs (`WARN_ON(i915_vma_unbind(vma))` and `WARN_ON(i915_gem_object_has_pinned_pages(obj))`) I mentioned before (#544 (comment)). I need to insert some sleep here, too:

I think the issue here could be that the ME communication has not finished, i.e. `ipts_stop` initiates ME communication but does not wait for it to finish. So we need to wait for communication to finish before calling `intel_ipts_cleanup`. Unloading the module does that, as it waits until `ipts_mei_cl_event_thread` exits.
Regarding `i915_guc_ipts_submission_enable`, I (and you also) noticed that the function does not return the real reason for the `guc_client_alloc` failure.
This change corresponds to `guc_clients_create`

```diff
From 6a4c9384e3e7040c2982d6219b2c666c98e50f9b Mon Sep 17 00:00:00 2001
From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com>
Date: Fri, 2 Aug 2019 05:02:14 +0900
Subject: [PATCH] i915: ipts: return a true reason of guc_client_alloc failure

---
 drivers/gpu/drm/i915/intel_guc_submission.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 072fea44c..0a9a23715 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -1502,7 +1502,7 @@ int i915_guc_ipts_submission_enable(struct drm_i915_private *dev_priv,
 			ctx);
 	if (IS_ERR(client)) {
 		DRM_ERROR("Failed to create normal GuC client!\n");
-		return -ENOMEM;
+		return PTR_ERR(client);
 	}
 
 	guc->ipts_client = client;
-- 
2.22.0
```
Yeah, not sure why they did that with `-ENOMEM`. We should probably change that.
For some reason `intel_guc_submission_init` is not getting called. Because of this, `guc->stage_ids` does not get initialized, so naturally `ida_simple_get` with that as argument fails.
Edit: Seems like IPTS will not work on 5.3 at all... (via https://github.com/torvalds/linux/blob/v5.3-rc3/drivers/gpu/drm/i915/intel_uc.c#L313-L314):
int intel_uc_init(struct drm_i915_private *i915)
{
// ...
/* XXX: GuC submission is unavailable for now */
GEM_BUG_ON(USES_GUC_SUBMISSION(i915));
// ...
}
So that's kind of disappointing...
I've also pushed a Linux repo where I'll be working on IPTS and Surface related stuff, if you're interested: https://github.com/qzed/linux-surface-kernel. My current plan is to have the ...-surface
branches to directly reflect the linux-surface repo structure and other branches for development/customization.
I didn't even come to think that GuC submission will not be allowed on 5.3. No wonder IPTS is not working.
Thanks for the links, good to know the corresponding commits! I also didn't think that at first, just followed the calls for `intel_guc_submission_init` wondering why it wasn't getting called. Guess that makes sense if the FW changes are that significant.
`intel_ipts_cleanup`/`intel_ipts_init` on sleep script (not recommended now)
Try this only if you have suspend/hibernate issues.
We can still call `intel_ipts_cleanup`/`intel_ipts_init` on suspend/hibernate using systemd/system-sleep/sleep if you want:
case $1/$2 in
pre/*)
# Remove IPTS from ME side
modprobe -r intel_ipts
modprobe -r mei_hdcp
modprobe -r mei_me
modprobe -r mei
# Remove IPTS from i915 side
for i in $(find /sys/kernel/debug/dri -name i915_intel_ipts_cleanup); do echo 1 > $i; done
;;
post/*)
# Load IPTS from i915 side
for i in $(find /sys/kernel/debug/dri -name i915_intel_ipts_init); do echo 1 > $i; done
# Load IPTS from ME side
modprobe mei
modprobe mei_me
modprobe mei_hdcp
modprobe intel_ipts
;;
esac
- Using the `find` command to find the path to `i915_intel_ipts_cleanup`/`i915_intel_ipts_init`: the path is not necessarily `/sys/kernel/debug/dri/0/i915_intel_ipts_cleanup` and `/sys/kernel/debug/dri/0/i915_intel_ipts_init`.
- Wildcard is not working in the sleep script (?). The journal log says `/sys/kernel/debug/dri/*/i915_intel_ipts_cleanup: No such file or directory`.
- Reloading the `mei` modules is required because these modules will load the `intel_ipts` module (thus `ipts_start` will be called) before `intel_ipts_init` gets called.
- We cannot use `ipts_stop`/`ipts_start` instead of reloading the `intel_ipts` module because we need to reload the `mei` modules. However, `intel_ipts` depends on `mei`.
- I noticed that the `mei_hdcp` module exists and depends on `mei` on 5.2. However, the current systemd/system-sleep/sleep script does not include the `mei_hdcp` module. Thus, removing `mei` will fail. We have to properly remove all modules that depend on the `mei` module if you want to use `intel_ipts_cleanup`/`intel_ipts_init`.
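One possible explanation for the wildcard failure noted above (an assumption, not something verified in this thread): if the sleep script runs under a POSIX `sh` such as dash, the glob in a redirection target like `> /sys/kernel/debug/dri/*/...` is not expanded, so the shell tries to open a path containing a literal `*`. Expanding the glob in a `for` loop avoids relying on that. The helper name `write_debugfs_nodes` and its `root` argument are hypothetical.

```bash
# Write VALUE to every node called NAME found one level below ROOT.
# The glob is expanded by the for loop, never inside a redirection target.
# Returns 0 if at least one node was written, 1 otherwise.
write_debugfs_nodes() {
    local name=$1 value=$2 root=${3:-/sys/kernel/debug/dri} node found=1
    for node in "$root"/*/"$name"; do
        [ -e "$node" ] || continue   # an unmatched glob stays literal; skip it
        echo "$value" > "$node" && found=0
    done
    return "$found"
}
```

In a sleep script this would be called as `write_debugfs_nodes i915_intel_ipts_cleanup 1` in the `pre` branch and `write_debugfs_nodes i915_intel_ipts_init 1` in the `post` branch, replacing the `find` loops.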
Right, although I'd vote that we drop IPTS/MEI from the sleep script for now. If there are any problems we can gradually add them back in until we fix the real problem.
> Right, although I'd vote that we drop IPTS/MEI from the sleep script for now. If there are any problems we can gradually add them back in until we fix the real problem.

Yes, I think so, too. That was just an investigation into how to use `intel_ipts_cleanup`/`intel_ipts_init`.
> I noticed mei_hdcp module exists and depends on mei on 5.2. However, current systemd/system-sleep/sleep script does not include the mei_hdcp module. Thus, removing mei will fail.
> We have to properly remove all the dependency of mei module if you want to use intel_ipts_cleanup/intel_ipts_init.
Right, good points! Thanks! I'll update the systemd-sleep script in my 5.2 branch with your findings!
We may want to automatically detect modules which are using `mei`. However, I could not find an easier way. What should we do?
# Using /proc/modules, grep, awk and sed to detect the modules which are using mei
MEI_USED_BY=$(cat /proc/modules | grep -w mei | awk '{print $4}' | sed "s/,/ /g")
# Also, we have to consider this order
# We have to remove mei_hdcp before mei_me
echo $MEI_USED_BY
intel_ipts mei_me mei_hdcp
Hmm, I don't know if there's any better solution than hard-coding. Other than that your solution via `/proc/modules` seems good, but as you've mentioned, we'd need to follow the dependencies and order them correctly, which would make it much more complex. Also I don't think this will work if the modules are already unloaded, so it won't work for re-loading them.
We could do something like `modprobe mei_hdcp || true` to explicitly ignore errors. There's also a `--remove-dependencies` option for `modprobe` which will automatically remove all dependencies, so `modprobe --remove-dependencies mei` should remove `mei`, `mei_me`, `mei_hdcp` and `intel_ipts`. But then we need a way to restore them on resume.
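As a rough sketch of the "follow the dependencies" idea, the users column of `/proc/modules` can be walked recursively so that every user of a module is listed before the module itself. `removal_order` is a hypothetical helper; it only sees module-level users, so an ordering constraint that is not recorded in `/proc/modules` (such as removing `mei_hdcp` before `mei_me`) is not guaranteed by it.

```bash
# Print MODULE preceded by every module that (transitively) uses it, so the
# output can be passed to "modprobe -r" from top to bottom.
# MODULES_FILE is in /proc/modules format: field 4 lists the users of a
# module as "a,b,c," or "-" when there are none.
removal_order() {
    local module=$1 modules_file=${2:-/proc/modules} users user
    users=$(awk -v m="$module" \
        '$1 == m && $4 != "-" { gsub(",", " ", $4); print $4 }' "$modules_file")
    for user in $users; do
        removal_order "$user" "$modules_file"
    done
    echo "$module"
}
```

To restore the modules on resume, the list could be saved before unloading (for example `removal_order mei | awk '!seen[$0]++' > /run/mei-modules`, where the file path is an arbitrary choice) and replayed in reverse with `tac` after resume. This also covers the case where the modules are already unloaded at resume time, because the saved file no longer depends on `/proc/modules`.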
> I think the issue here could be that the ME communication has not finished, i.e. ipts_stop initiates ME communication but does not wait for it to finish. So we need to wait for communication to finish before calling intel_ipts_cleanup. Unloading the module does that, as it waits until ipts_mei_cl_event_thread exits.

Yes, it seems that this is at least partially true. It is still not so stable on 4.19, though. Anyway, we should not use `ipts_stop` and `intel_ipts_cleanup` except for debugging purposes.
# Remove IPTS from both ME side and i915 side
sudo modprobe -r intel_ipts
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_cleanup"
# Load IPTS from both i915 side and ME side
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_init"
sudo modprobe intel_ipts
Removing the mei modules here as well makes reloading more stable. I will use this when I have to completely reload IPTS:
# Remove IPTS from both ME side and i915 side
sudo modprobe -r intel_ipts
sudo modprobe -r mei_hdcp
sudo modprobe -r mei_me
sudo modprobe -r mei
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_cleanup"
sudo su -c 'echo "ipts removed from both ME side and i915 side" > /dev/kmsg'
# Load IPTS from both i915 side and ME side
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_init"
sudo modprobe mei
sudo modprobe mei_me
sudo modprobe mei_hdcp
sudo modprobe intel_ipts
sudo su -c 'echo "ipts inserted from both i915 side and ME side" > /dev/kmsg'
> modprobe --remove-dependencies mei should remove mei, mei_me, mei_hdcp and intel_ipts.

Unfortunately, it is not working on my side anyway. The `mei` modules remain loaded.
I will share if I find a better way.
Alright, then I'm out of ideas. Interesting that this doesn't work.
I have had problems with the 5.2.5 IPTS on resume from suspend if I don't unload the modules. I have Debian Buster with Cinnamon on SB1. It is not a consistent issue and doesn't happen every time, but when it does it works as follows:
Upon resume, the system is partially unresponsive. I see the lock screen, and I can move the cursor using the touchpad. However, touch and pressing keys on the keyboard doesn't work (can't type in password). After a time, mouse may freeze, too. Eventually something (I guess X) crashes, and I am sent to the login screen, logged out. When I log in again, the system is in software rendering mode. Restart fixes it back.
Here's the syslog from the point suspend starts: journal.txt
GPU Hang occurred at 10:59:16 but I feel mwifiex_pcie module is blocking suspend (and something messed up then caused the GPU Hang?).
mwifiex_pcie 0000:03:00.0: adapter is not valid
mwifiex_pcie 0000:03:00.0: adapter structure is not valid
First, check whether `low_power_idle_cpu_residency_us` (PC10 residency) increased across a suspend cycle to see if your device actually entered suspend.
Edit the system-sleep/sleep script like this to add counters (they will be printed to dmesg or journalctl):
#!/bin/sh
PATH_CPULPI_US=/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
PATH_SLPS0_US=/sys/kernel/debug/pmc_core/slp_s0_residency_usec
case $1/$2 in
pre/*)
echo "pre-suspend state" > /dev/kmsg
echo "$(basename $PATH_CPULPI_US): $(cat $PATH_CPULPI_US) usec" > /dev/kmsg
echo "$(basename $PATH_SLPS0_US): $(cat $PATH_SLPS0_US) usec" > /dev/kmsg
;;
post/*)
echo "post-suspend state" > /dev/kmsg
echo "$(basename $PATH_CPULPI_US): $(cat $PATH_CPULPI_US) usec" > /dev/kmsg
echo "$(basename $PATH_SLPS0_US): $(cat $PATH_SLPS0_US) usec" > /dev/kmsg
;;
esac
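To turn the two printed readings into a pass/fail check, the pre and post values of a counter can be compared directly. This is a minimal sketch; `residency_increased` is a hypothetical helper, and it assumes the counter does not wrap between the two readings.

```bash
# Compare two readings of a residency counter (in microseconds).
# Prints the difference and succeeds only if the counter increased,
# i.e. the platform actually spent time in the low-power state.
residency_increased() {
    local before=$1 after=$2 delta
    delta=$((after - before))
    echo "$delta"
    [ "$delta" -gt 0 ]
}
```

Since the `pre` and `post` branches run as separate invocations of the sleep script, the `pre` reading would have to be stored somewhere (for example in a file under `/run`) and read back in `post` before calling `residency_increased "$before" "$after"`.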
I forgot to mention, but this issue doesn't happen if I unload ipts, mei_me and mei before suspend. On the other hand, it happens regardless of whether I unload mwifiex or not. So I'm pretty sure it's IPTS causing it.
Also notice that the GPU hang is 16 seconds after "10:59:00 surface kernel: PM: suspend exit".
I don't have any idea now. Can you run this script without GPU Hang?
# Remove IPTS from both ME side and i915 side
sudo modprobe -r intel_ipts
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_cleanup"
sleep 3
# Load IPTS from both i915 side and ME side
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_init"
sudo modprobe intel_ipts
That script works fine. I ran it 100 times with 1s sleep between each run. Then I ran it without any sleep between the runs, and my system froze. Not sure if it's a GPU hang, as it never recovered, and there's nothing in syslog.
I'll try without the sleep 3 in between.
Edit: Works fine without the sleep 3 (still with sleep 1 between runs). So it only freezes if there's no sleep between script runs.
OK then, anyway, try adding those counters to the system-sleep/sleep script and increase the debug output.
Enable some debug output:
sudo su -c "echo 1 > /sys/power/pm_debug_messages"
sudo su -c "echo 1 > /sys/module/printk/parameters/ignore_loglevel"
sudo su -c "echo 1 > /sys/kernel/debug/clear_warn_once"
and please post a log again.
Ah... I tend to get a GPU hang when I use rtcwake to wake up the device:
sync && sync && sync
sudo bash /usr/lib/systemd/system-sleep/* pre
sudo rtcwake -m freeze -s 10
sudo bash /usr/lib/systemd/system-sleep/* post
Regarding the occasional system freeze after suspend (https://github.com/jakeday/linux-surface/issues/544#issuecomment-513531566):
This is not a GPU hang, but it may be related. After resume, when a system freeze occurs, it always happens right after `ipts_send_sensor_clear_mem_window_cmd` gets called.
EDIT: The function `ipts_send_sensor_clear_mem_window_cmd` is called from `case TOUCH_SENSOR_GET_DEVICE_INFO_RSP` in `ipts_handle_resp`.
EDIT2: On suspend/resume, `ipts_mei_cl_remove`/`ipts_mei_cl_probe` are called directly. This may be a problem.
I haven't come up with a good idea yet. What I can do is insert some sleep before `ipts_send_sensor_clear_mem_window_cmd` in `case TOUCH_SENSOR_GET_DEVICE_INFO_RSP`.
diff --git a/drivers/misc/ipts/ipts-msg-handler.c b/drivers/misc/ipts/ipts-msg-handler.c
index 87144778a..f114607df 100644
--- a/drivers/misc/ipts/ipts-msg-handler.c
+++ b/drivers/misc/ipts/ipts-msg-handler.c
@@ -1,4 +1,5 @@
#include <linux/mei_cl_bus.h>
+#include <linux/delay.h>
#include "ipts.h"
#include "ipts-hid.h"
@@ -264,6 +265,8 @@ int ipts_handle_resp(ipts_info_t *ipts, touch_sensor_msg_m2h_t *m2h_msg,
break;
}
+ pr_alert("DEBUG: sleeping for 1000 ms\n");
+ msleep (1000);
cmd_status = ipts_send_sensor_clear_mem_window_cmd(ipts);
break;
I think the best option is rather to somehow avoid `ipts_mei_cl_remove`/`ipts_mei_cl_probe` getting called.
If I remove `intel_ipts` right after loading the module, it still causes weird behavior like @tmarkov reported.
sudo modprobe -r intel_ipts
sudo modprobe intel_ipts # no problem here
sudo modprobe -r intel_ipts # remove too early
No matter how long the sleep is (same result even with 10 sec)
kern :info : [ 44.789712] IPTS ipts_mei_cl_exit() is called # sudo modprobe -r intel_ipts
kern :info : [ 44.789728] DEBUG: ipts_mei_cl_remove called
kern :info : [ 44.789729] DEBUG: ipts_stop called
kern :info : [ 44.789730] DEBUG: ipts_send_sensor_quiesce_io_cmd called
kern :info : [ 44.801421] DEBUG: ipts_send_sensor_clear_mem_window_cmd called
kern :err : [ 44.934124] ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: error in reading m2h msg
kern :info : [ 44.934165] IPTS removed
kern :info : [ 44.976476] IPTS ipts_mei_cl_init() is called # sudo modprobe intel_ipts # no problem here
kern :info : [ 44.976492] probing Intel Precise Touch & Stylus
kern :info : [ 44.976493] IPTS using DMA_BIT_MASK(64)
kern :info : [ 44.976639] DEBUG: ipts_start called
kern :info : [ 44.982083] input: ipts 1B96:005E UNKNOWN as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:1B96:005E.0004/input/input50
kern :info : [ 44.982349] input: ipts 1B96:005E as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:1B96:005E.0004/input/input52
kern :info : [ 44.982633] input: ipts 1B96:005E Touchscreen as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:1B96:005E.0004/input/input53
kern :info : [ 44.982902] input: ipts 1B96:005E Mouse as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:1B96:005E.0004/input/input54
kern :info : [ 44.983166] input: ipts 1B96:005E UNKNOWN as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:1B96:005E.0004/input/input57
kern :info : [ 44.983337] hid-multitouch 0044:1B96:005E.0004: input,hidraw0: <UNKNOWN> HID v16900.00 Mouse [ipts 1B96:005E] on heci3
kern :alert : [ 44.983401] DEBUG: sleeping for 1000 ms
kern :info : [ 44.996003] IPTS ipts_mei_cl_exit() is called # sudo modprobe -r intel_ipts # remove too early
kern :info : [ 44.996020] DEBUG: ipts_mei_cl_remove called
kern :info : [ 44.996021] DEBUG: ipts_stop called
kern :info : [ 44.996022] DEBUG: ipts_send_sensor_quiesce_io_cmd called
kern :info : [ 44.996326] DEBUG: ipts_send_sensor_clear_mem_window_cmd called
kern :info : [ 45.993840] DEBUG: ipts_send_sensor_clear_mem_window_cmd called
kern :err : [ 45.993853] ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: mei_cldev_send() error 0x7:-19
kern :info : [ 45.993855] DEBUG: ipts_stop called
kern :info : [ 45.993856] DEBUG: ipts_send_sensor_quiesce_io_cmd called
kern :err : [ 45.993860] ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: mei_cldev_send() error 0x4:-19
kern :info : [ 45.993862] DEBUG: ipts_send_sensor_clear_mem_window_cmd called
kern :err : [ 45.993865] ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: mei_cldev_send() error 0x7:-19
kern :info : [ 45.993866] DEBUG: ipts_send_sensor_quiesce_io_cmd called
kern :err : [ 45.993869] ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: mei_cldev_send() error 0x4:-19
kern :info : [ 45.993909] IPTS removed
It seems that `ipts_mei_cl_exit` is called before `ipts_send_sensor_clear_mem_window_cmd` gets called. The order should instead be like this:
kern :alert : [ 44.983401] DEBUG: sleeping for 1000 ms
DEBUG: ipts_send_sensor_clear_mem_window_cmd called
ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: touch enabled 4
IPTS ipts_mei_cl_exit() is called
[...]
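In the log of the too-early removal above, the later sends fail with `-19`, which is `-ENODEV`. To check quickly whether a given run hit this path, the saved kernel log can be searched for those lines. `count_enodev_sends` is a hypothetical helper, and the pattern is taken from the message format shown in this thread only.

```bash
# Count "mei_cldev_send() error 0x..:-19" lines in a saved kernel log.
# -19 is -ENODEV. LOG_FILE is e.g. the saved output of "dmesg".
# Note: grep exits with status 1 when the count is 0.
count_enodev_sends() {
    local log_file=$1
    grep -c 'mei_cldev_send() error 0x[0-9a-fA-F]*:-19' "$log_file"
}
```

For example, `dmesg > /tmp/ipts.log; count_enodev_sends /tmp/ipts.log` after a reload test prints the number of failed sends; a non-zero count suggests the module was removed while commands were still being issued.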
Sadly, even after inserting the sleep (https://github.com/jakeday/linux-surface/issues/544#issuecomment-523868834), it still occasionally freezes right after the debug print (`DEBUG: sleeping for 1000 ms`) on resuming from suspend.
No further log available.
Note: `no_console_suspend` will cause a GPU hang.
Touch no longer works for me after the latest patch (qzed's 5.2.14 release). After reloading the touch modules, I can do one single touch and then touch breaks. Here are some logs starting with reloading the touch modules:
@tmarkov Surface Book 1? It seems like the workaround by @kitakar5525 that disables IPTS feedback for #374 causes some problems. I've had the same issue on the SB2, which is why we decided to DMI-match and only apply the workaround on SB1 and SP4. It's weird though that it doesn't work for you but works for @kitakar5525.
Not so weird, there've been multiple issues where different SB1 units behave differently, and the touch dropout is one of them.
@tmarkov Interesting, I didn't think that the differences were that big, thanks! I'll change the workaround defaults later, for now you can set `intel_ipts.no_feedback=0` as kernel option. This should deactivate the workaround.
@tmarkov Could this have anything to do with processor differences (e.g. different generation)? If so it would be better to match against this for the workaround.
I posted my cpuinfo here https://github.com/jakeday/linux-surface/issues/374#issuecomment-497770888, but I don't have any to compare with. It could also be GPU related, so from glxinfo: Device: Mesa DRI Intel(R) HD Graphics 520 (Skylake GT2) (0x1916)
I'm pretty sure it's all Skylake for SB1, but are there any variations within the skylake generation? Or if not, it could be i3 vs i5 vs i7.
Ah right, I remember. @kitakar5525 could you send me the output of `cat /proc/cpuinfo` and the GPU line from `glxinfo`?
From what I can see in Wikipedia, the CPUs are all `6xxx` series with the same `Intel HD Graphics 520`, specifically `i5-6300U` and `i7-6600U`. So maybe i5 vs i7? The SP4, which should also have this issue, has `m3-6Y30`, `i5-6300U`, and `i7-6650U`. Would also be interesting to know if there's different IPTS firmware for the different models. It seems that in ACPI, there is at least the capability for it (AFAIK `TSML` is the touch firmware provider):
Device (TSML)
{
Method (_HID, 0, NotSerialized) // _HID: Hardware ID
{
If ((OMBR < 0x04))
{
Return ("MSHW0075")
}
Else
{
Return ("MSHW0076")
}
}
}
@qzed I put `intel_ipts.no_feedback=0` in `/etc/sysctl.d/local.conf` but it doesn't seem to do anything. `sudo sysctl -a | grep ipts` also prints nothing.
EDIT: I see, so it only works as a boot parameter.
@tmarkov
> EDIT: I see, so it only works as a boot parameter.
Does it work for you when you add the parameter to your bootloader?
Or you can do the same thing after boot:
sudo su -c "echo 0 > /sys/module/intel_ipts/parameters/no_feedback"
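Module parameters are not sysctl settings, which is why the `sysctl.d` entry had no effect. Besides the kernel command line and the sysfs write shown above, a parameter can usually also be made persistent with an `options intel_ipts no_feedback=0` line in a file under `/etc/modprobe.d/`. To verify which value is actually in effect, the parameter can be read back from sysfs. The sketch below uses a hypothetical helper name, and its `sysfs_root` argument exists only so it can be pointed at a test directory.

```bash
# Print the current value of a module parameter, or fail if the module is
# not loaded or does not expose the parameter through sysfs.
module_param() {
    local module=$1 param=$2 sysfs_root=${3:-/sys/module}
    local node=$sysfs_root/$module/parameters/$param
    [ -r "$node" ] || return 1
    cat "$node"
}
```

For example, `module_param intel_ipts no_feedback` should print the value the driver currently uses, regardless of whether it came from the boot parameter, modprobe configuration, or a sysfs write.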
> could you send me the output of cat /proc/cpuinfo and the GPU line from glxinfo?
```
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0xcc
cpu MHz : 2042.955
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 5618.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0xcc
cpu MHz : 2087.971
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 5618.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0xcc
cpu MHz : 2019.788
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 5618.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0xcc
cpu MHz : 2032.509
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 5618.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
```
cpuinfo.txt for download
$ glxinfo | grep Device
Device: Mesa DRI Intel(R) HD Graphics 520 (Skylake GT2) (0x1916)
I can't see much of a difference.
Would also be interesting to know if there's different IPTS firmware for the different models. It seems that in ACPI, there is at least the capability for it (AFAIK TSML is the touch firmware provider):
I cannot see firmware for `MSHW0075` on my Windows installation:
```
total 8.7M
-rwxrwxrwx 2 root root   85 Jul 15  2016 iaPreciseTouchDescriptor.bin
-rwxrwxrwx 5 root root 2.0K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0076.bin
-rwxrwxrwx 4 root root 2.0K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0078.bin
-rwxrwxrwx 4 root root 2.7K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0079.bin
-rwxrwxrwx 4 root root 2.0K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0101.bin
-rwxrwxrwx 4 root root 2.0K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0102.bin
-rwxrwxrwx 4 root root 2.0K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0103.bin
-rwxrwxrwx 4 root root 2.0K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0137.bin
-rwxrwxrwx 2 root root 1.1M Sep 22  2017 SurfaceTouchServicingKernelMSHW0079.bin
-rwxrwxrwx 4 root root  516 Sep 22  2017 SurfaceTouchServicingKernelMSHW0079.bin.sig
-rwxrwxrwx 2 root root 1.3M Sep 22  2017 SurfaceTouchServicingKernelMSHW0101.bin
-rwxrwxrwx 4 root root  516 Sep 22  2017 SurfaceTouchServicingKernelMSHW0101.bin.sig
-rwxrwxrwx 2 root root 1.3M Sep 22  2017 SurfaceTouchServicingKernelMSHW0102.bin
-rwxrwxrwx 4 root root  516 Sep 22  2017 SurfaceTouchServicingKernelMSHW0102.bin.sig
-rwxrwxrwx 2 root root 1.3M Sep 22  2017 SurfaceTouchServicingKernelMSHW0137.bin
-rwxrwxrwx 4 root root  516 Sep 22  2017 SurfaceTouchServicingKernelMSHW0137.bin.sig
-rwxrwxrwx 5 root root 1.3M Sep 22  2017 SurfaceTouchServicingKernelSKLMSHW0076.bin
-rwxrwxrwx 4 root root 1.3M Sep 22  2017 SurfaceTouchServicingKernelSKLMSHW0078.bin
-rwxrwxrwx 4 root root 1.3M Sep 22  2017 SurfaceTouchServicingKernelSKLMSHW0103.bin
-rwxrwxrwx 5 root root  12K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0076.bin
-rwxrwxrwx 4 root root  12K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0078.bin
-rwxrwxrwx 4 root root  11K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0079.bin
-rwxrwxrwx 4 root root  12K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0101.bin
-rwxrwxrwx 4 root root  12K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0102.bin
-rwxrwxrwx 4 root root  12K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0103.bin
-rwxrwxrwx 4 root root  12K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0137.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0076.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0078.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0079.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0101.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0102.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0103.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0137.bin
```
I'm using firmware for `MSHW0076`.
`MSHW0075` might have been for a pre-production/testing model or something. @tmarkov, @kitakar5525 Can you both nevertheless check which of those is present in `/sys/bus/acpi/devices/`?
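For anyone who wants to check this quickly, here is a minimal sketch that lists the `MSHW*` entries. It assumes the usual sysfs naming of `HID:instance` (e.g. `MSHW0076:00`); `find_mshw_devices` is just a hypothetical helper name, not part of any existing tool.

```python
import os
import re

# Assumption: ACPI device entries are named "<HID>:<instance>", e.g. "MSHW0076:00".
MSHW_PATTERN = re.compile(r"^MSHW[0-9A-F]{4}:[0-9A-Fa-f]+$")

def find_mshw_devices(base="/sys/bus/acpi/devices"):
    """Return the sorted names of ACPI device entries whose HID starts with MSHW."""
    try:
        entries = os.listdir(base)
    except FileNotFoundError:
        # Not a Linux system, or sysfs is not mounted at the usual place.
        return []
    return sorted(name for name in entries if MSHW_PATTERN.match(name))
```

Running `print(find_mshw_devices())` on the device and pasting the result here should be enough to compare models.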
Apart from that, the biggest difference seems to be i7-6600 vs. i5-6300, but I'm not sure if that's the cause. They should be the same architecture, which means that it's likely they have the same GuC implementation and all (I'd assume the silicon differences are in the cores and not the peripherals). Also, the microcode version seems to be different (`0xcc` vs. `0xc6`), but that might be due to the processors being different; I'm not sure if, or how much, Intel re-uses its microcode.
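To compare these values across machines without pasting the whole dump, something like the following could pull just the model name and microcode revision out of `/proc/cpuinfo`. This is only a rough sketch: `cpu_summary` and `read_cpu_summary` are hypothetical helper names, and it assumes the usual `key : value` layout with a blank line between processors.

```python
def cpu_summary(text):
    """Return the set of (model name, microcode) pairs found in cpuinfo text."""
    pairs = set()
    # Each processor is described by one block; blocks are separated by blank lines.
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "model name" in fields:
            pairs.add((fields["model name"], fields.get("microcode", "unknown")))
    return pairs

def read_cpu_summary(path="/proc/cpuinfo"):
    """Read the given cpuinfo file and summarize it."""
    with open(path) as f:
        return cpu_summary(f.read())
```

On a machine where all cores report the same values, `read_cpu_summary()` returns a single pair, which is easy to diff between two devices.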
I have MSHW0076:00.
Another thing that may be relevant is the UEFI firmware. I have all my firmware up to date, although it is worth noting that (according to fwupdmgr) I have touch firmware version 105.0.24069 when the newest is 58.2.24087.
Let's continue the discussion from gitter linux-surface/community here.
My port of the jakeday patches to Linux 5.2 is here, and the IPTS patch changes from 5.1 to 5.2 are also available there as README.md.
`to_intel_context()` is not available anymore, so I used `intel_context_pin_lock()` instead. I actually don't know whether `intel_context_pin_lock()` is an appropriate replacement for that function.