jakeday / linux-surface

Linux Kernel for Surface Devices

Porting patches to Linux 5.2 (ipts) / Proper IPTS remove implementation #544

Closed kitakar5525 closed 5 years ago

kitakar5525 commented 5 years ago

Let's continue discussing from gitter linux-surface/community here.

My port of the jakeday patches to Linux 5.2 is here, and the IPTS patch changes from 5.1 to 5.2 are also available there in README.md.

qzed commented 5 years ago

The problem is that we don't officially have access to the mei bus device, which means I have to include "../mei/mei_dev.h" in ipts-gfx.c.

We can actually use the parent device, that's a bit better (https://elixir.bootlin.com/linux/v5.2.2/source/drivers/misc/mei/bus.c#L912).

With that, suspend works with a3a3ed3 applied and https://github.com/qzed/linux-surface/commit/03b9074ef5affe5cef4783f0e57253f070e600d4 reverted. I still get the warning from https://github.com/jakeday/linux-surface/issues/544#issuecomment-514658312 though, so we'll need to find another way to fix that.

Edit: I have rebased/updated the commits in https://github.com/qzed/linux-surface/tree/v5.2, specifically, I have dropped https://github.com/qzed/linux-surface/commit/03b9074ef5affe5cef4783f0e57253f070e600d4 and fixed the device link to use the parent/bus device.

qzed commented 5 years ago

Okay, so it seems that I've missed a path before, and i915_drm_prepare also sets guc->send = intel_guc_send_nop. Given that the runtime-suspend function shouldn't get called during a normal suspend (but the prepare does) and the prepare shouldn't get called during runtime-suspend, we should probably be safe to move intel_ipts_suspend over there.

Let's hope this doesn't break the suspend order fixed by the device link.

qzed commented 5 years ago

Yeah, breaks it. I need to unload the module again...

At this point I propose we revert a3a3ed3 and deal with problems that arise from this directly. I've been running without that commit for a while now on my main kernel and I haven't experienced any issues, so I guess it's working fine on the SB2. You also mentioned you were running without it on the SB1, right? So Pros and Laptops are currently unknown.

I also suggest we keep the device link as that may fix any of those issues and I think it's a good idea to have the dependency ensured on the PM side.

kitakar5525 commented 5 years ago

At this point I propose we revert a3a3ed3 and deal with problems that arise from this directly.

Yes, I think reverting the commit is the simplest option. If we will revert it, I suggest including my debugfs patches (https://github.com/jakeday/linux-surface/issues/544#issuecomment-514665973) to manually call ipts_stop/ipts_start and intel_ipts_cleanup/intel_ipts_init so we can add it to systemd/system-sleep/sleep script.

Thank you for your work!

You also mentioned you were running without it on the SB1, right?

Yes, without the commit and without removing intel_ipts module and no particular issue for a long time at least on suspend. (I still occasionally encounter an entire system freeze maybe caused by ipts_stop or something, this is another issue.)

qzed commented 5 years ago

If we will revert it, I suggest including my debugfs patches (#544 (comment)) to manually call ipts_stop/ipts_start and intel_ipts_cleanup/intel_ipts_init so we can add it to systemd/system-sleep/sleep script.

Adding the debugfs patches seems like a good idea, this should allow prototyping workarounds as user-space scripts in case that becomes necessary. I wouldn't add it to the systemd/system-sleep/sleep script for everyone, only keep this as option/recommendation for the individual users in case there are any issues until it is fixed in kernel.

Thank you for your work!

Thank you for helping out and testing!

(I still occasionally encounter an entire system freeze maybe caused by ipts_stop or something, this is another issue.)

If you encounter this during suspend/resume you could try to unload/load the intel_ipts module (to try and reproduce the issue), as that's basically what's happening on the module side. Also: Have you had this happen with the fixed device link implementation? I kind of hoped that this would be something like the intel_ipts module accessing the already suspended i915 driver, which should be fixed with the device link.

qzed commented 5 years ago

I've added your patches to my 5.2 branch and I've also set ENABLE_IPTS_DEBUG.

Eventually, we should back-port the changes to 4.19, but I think we should focus on #374 before we do that, might save us some work. Also 5.3 seems like it's going to take a bit of work again.

kitakar5525 commented 5 years ago

Also: Have you had this happen with the fixed device link implementation? I kind of hoped that this would be something like the intel_ipts module accessing the already suspended i915 driver, which should be fixed with the device link.

tested with

I did a stress test for intel_ipts module removing and system suspend.

ipts-stress-test-remove-insert.sh

- intel_ipts module remove/insert 30 times

```bash
#!/bin/bash

loop=0
while true; do
    modprobe -r intel_ipts
    echo "ipts removed" > /dev/kmsg
    sleep 3 # [1]
    modprobe intel_ipts
    echo "ipts inserted" > /dev/kmsg

    echo "loop: $((loop+=1)) done" > /dev/kmsg
    if [ $loop -eq 30 ]; then
        break
    fi
    echo "interrupt here to stop this stress-test"; sleep 5
done
```

suspend-stress-test.sh

- suspend/resume 30 times

```bash
#!/bin/bash

PATH_CPULPI_US=/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
PATH_SLPS0_US=/sys/kernel/debug/pmc_core/slp_s0_residency_usec

loop=0
while true; do
    bash /usr/lib/systemd/system-sleep/* pre
    rtcwake -m freeze -s 15
    bash /usr/lib/systemd/system-sleep/* post

    echo "$(basename $PATH_CPULPI_US): $(cat $PATH_CPULPI_US) usec" > /dev/kmsg
    echo "$(basename $PATH_SLPS0_US): $(cat $PATH_SLPS0_US) usec" > /dev/kmsg

    echo "loop: $((loop+=1)) done" > /dev/kmsg
    if [ $loop -eq 30 ]; then
        break
    fi
    echo "interrupt here to stop this stress-test"; sleep 5
done
```

I ran each test twice, and both passed without freezing the system. I noticed that commenting out the first sleep 3 [1] in ipts-stress-test-remove-insert.sh causes a system freeze, with journal output similar to that in my previous comment https://github.com/jakeday/linux-surface/issues/544#issuecomment-513531566.

I used to use this one-liner to reload intel_ipts module:

```bash
# bad example
sudo modprobe -r intel_ipts && sudo modprobe intel_ipts
```

and I think re-loading the module too fast may be the cause. I should instead insert some sleep:

```bash
# good example
sudo modprobe -r intel_ipts && sleep 3 && sudo modprobe intel_ipts
```
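The "good example" can also be wrapped in a small helper so the delay is easy to vary while testing. This is only a sketch: `reload_ipts`, `RELOAD_DELAY` and `MODPROBE` are names made up for illustration, and 3 seconds is simply the value that worked in the stress test above, not a known-safe minimum.

```bash
#!/bin/bash
# Reload intel_ipts with a pause between removal and insertion.
# RELOAD_DELAY and MODPROBE are hypothetical knobs, not part of any existing script.
RELOAD_DELAY="${RELOAD_DELAY:-3}"   # seconds; 3 is the value used in this thread
MODPROBE="${MODPROBE:-modprobe}"    # can be replaced by a stub for dry runs

reload_ipts() {
    $MODPROBE -r intel_ipts || return 1
    sleep "$RELOAD_DELAY"
    $MODPROBE intel_ipts
}
```

Source the file as root and call `reload_ipts`. Setting `RELOAD_DELAY=0` reproduces the "bad example" for comparison.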

For suspend, I used to set resume_delay to 0 by sysctl kernel.resume_delay=0. On the other hand, the default value on jakeday kernel is 3000. https://github.com/jakeday/linux-surface/blob/3d0abed6c461fd269694b66b9bb6372be230fa20/patches/5.1/0002-suspend.patch#L101

I will see what happens when I set resume_delay to 3000 (although the test has already passed twice).

qzed commented 5 years ago

@kitakar5525 Awesome work!

I noticed that commenting out the first sleep 3 [1] in ipts-stress-test-remove-insert.sh causes a system freeze, with journal output similar to that in my previous comment #544 (comment).

That's interesting! I'll try to find the cause of that.

For suspend, I used to set resume_delay to 0 by sysctl kernel.resume_delay=0. On the other hand, the default value on jakeday kernel is 3000.

I've actually reverted that change at the moment. I don't think that the delay should cause any issues with IPTS, I'd rather expect it to make things a bit more stable. Nevertheless, testing both with and without it is probably a good idea.

kitakar5525 commented 5 years ago

ipts_stop && intel_ipts_cleanup on 4.19

I added destroy_doorbell() into i915_guc_ipts_submission_disable(). Now ipts_stop && intel_ipts_cleanup is working without causing GPU hang also on 4.19.

0001-i915-ipts-add-destroy_doorbell-when-disabling-ipts-g.patch

```diff
From dcbb2d2c3a3d4f22f91bbbc006989338fd556ab9 Mon Sep 17 00:00:00 2001
From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com>
Date: Fri, 2 Aug 2019 08:00:10 +0900
Subject: [PATCH] i915: ipts: add destroy_doorbell() when disabling ipts guc submission

Adding destroy_doorbell() into i915_guc_ipts_submission_disable() corresponds to __guc_client_disable() on 5.0+
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 32a9d29c5..4cbc75fe3 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -1408,6 +1408,7 @@ void i915_guc_ipts_submission_disable(struct drm_i915_private *dev_priv)
 	if (!guc->ipts_client)
 		return;
 
+	destroy_doorbell(guc->ipts_client);
 	guc_client_free(guc->ipts_client);
 	guc->ipts_client = NULL;
 }
-- 
2.22.0
```

I noticed that calling intel_ipts_cleanup too early tends to cause the two WARN_ONs (WARN_ON(i915_vma_unbind(vma)) and WARN_ON(i915_gem_object_has_pinned_pages(obj))) I mentioned before (https://github.com/jakeday/linux-surface/issues/544#issuecomment-514985562). I need to insert a sleep here, too:

```bash
sudo su -c "echo 1 > /sys/kernel/debug/ipts/ipts_stop"
sleep 3
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_cleanup"
```
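One caveat with the `/sys/kernel/debug/dri/*/...` form: if the glob matches more than one directory, bash rejects the redirection as ambiguous. A minimal sketch of a loop-based alternative follows. `trigger_debugfs_nodes` is a hypothetical helper name, and the optional second argument exists only so the function can be tried against a scratch directory instead of the real debugfs.

```bash
#!/bin/bash
# Write "1" to every debugfs node with the given name below a DRI debugfs root.
# trigger_debugfs_nodes is a hypothetical helper; the node names themselves
# come from the debugfs patches discussed in this thread.
trigger_debugfs_nodes() {
    local name="$1"
    local root="${2:-/sys/kernel/debug/dri}"
    local node found=1
    for node in "$root"/*/"$name"; do
        [ -e "$node" ] || continue   # an unmatched glob stays literal; skip it
        echo 1 > "$node" || return 1
        found=0
    done
    return "$found"                  # non-zero if no node was found at all
}
```

Run as root, the cleanup step would then read `trigger_debugfs_nodes i915_intel_ipts_cleanup`, still preceded by the sleep.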

I will update my old comments. https://github.com/jakeday/linux-surface/issues/544#issuecomment-514665973 https://github.com/jakeday/linux-surface/issues/544#issuecomment-514985562 https://github.com/jakeday/linux-surface/issues/544#issuecomment-515511237


Porting to 5.3-rc

Also 5.3 seems like it's going to take a bit of work again.

Yes, I cannot make IPTS work on 5.3-rc yet. My port is here. I cannot find an appropriate replacement for intel_context_pin.

qzed commented 5 years ago

@kitakar5525 Sorry for taking so long to answer this. The change for 4.19 makes sense, nice work spotting that! In 5.2 that's apparently handled by the __guc_client_disable function, so this explains why it's working there.

I'm not sure when I'm able to look at 5.3, but I'll try to before the official release.

qzed commented 5 years ago

@kitakar5525

I cannot make IPTS work on 5.3-rc yet. My port is here. I cannot find an appropriate replacement for intel_context_pin.

I think the replacements for create_ipts_context should be i915_gem_context_get_engine and intel_context_pin. For intel_context_lookup in destroy_ipts_context I think it should be i915_gem_context_lookup_engine. Also we need to add intel_context_put due to reference counting.

I've got another problem now: ida_simple_get in guc_client_alloc fails with -ENOSPC. Due to this i915_guc_ipts_submission_enable fails.

Here's the patch for v5.2 directly ported to v5.3 (warning: with compilation issues), and below are my current changes:

0002-IPTS-Fix-compile-issues.patch

```diff
From 815eda14e709abae9aa6a83f5e184f920efd494c Mon Sep 17 00:00:00 2001
From: qzed
Date: Mon, 5 Aug 2019 06:17:33 +0200
Subject: [PATCH 2/2] IPTS: Fix compile issues

---
 drivers/gpu/drm/i915/intel_ipts.c | 37 +++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_ipts.c b/drivers/gpu/drm/i915/intel_ipts.c
index 3d3c353986f7..d9654ca53c36 100644
--- a/drivers/gpu/drm/i915/intel_ipts.c
+++ b/drivers/gpu/drm/i915/intel_ipts.c
@@ -29,6 +29,7 @@
 
 #include "intel_guc_submission.h"
 #include "i915_drv.h"
+#include "gem/i915_gem_context.h"
 
 #define SUPPORTED_IPTS_INTERFACE_VERSION 1
 
@@ -86,7 +87,7 @@ static intel_ipts_object_t *ipts_object_create(size_t size, u32 flags)
 	}
 
 	/* Allocate the new object */
-	gem_obj = i915_gem_object_create(dev_priv, size);
+	gem_obj = i915_gem_object_create_shmem(dev_priv, size);
 	if (gem_obj == NULL) {
 		ret = -ENOMEM;
 		goto err_out;
@@ -136,8 +137,8 @@ static int ipts_object_pin(intel_ipts_object_t* obj,
 	struct drm_i915_private *dev_priv = to_i915(intel_ipts.dev);
 	int ret = 0;
 
-	if (ipts_ctx->ppgtt) {
-		vm = &ipts_ctx->ppgtt->vm;
+	if (ipts_ctx->vm) {
+		vm = ipts_ctx->vm;
 	} else {
 		vm = &dev_priv->ggtt.vm;
 	}
@@ -173,7 +174,6 @@ static void ipts_object_unmap(intel_ipts_object_t* obj)
 static int create_ipts_context(void)
 {
 	struct i915_gem_context *ipts_ctx = NULL;
-	struct drm_i915_private *dev_priv = to_i915(intel_ipts.dev);
 	struct intel_context *ce = NULL;
 	int ret = 0;
 
@@ -192,12 +192,19 @@ static int create_ipts_context(void)
 		goto err_unlock;
 	}
 
-	ce = intel_context_pin(ipts_ctx, dev_priv->engine[RCS0]);
+	ce = i915_gem_context_get_engine(ipts_ctx, RCS0);
 	if (IS_ERR(ce)) {
 		DRM_ERROR("Failed to create intel context (error %ld)\n",
 			  PTR_ERR(ce));
 		ret = PTR_ERR(ce);
-		goto err_unlock;
+		goto err_ctx;
+	}
+
+	ret = intel_context_pin(ce);
+	if (ret) {
+		DRM_ERROR("Failed to pin intel context (error %d)\n", ret);
+		ret = PTR_ERR(ce);
+		goto err_ctx;
 	}
 
 	ret = execlists_context_deferred_alloc(ce, ce->engine);
@@ -223,7 +230,9 @@ static int create_ipts_context(void)
 	return 0;
 
 err_ctx:
-	if (ipts_ctx)
+	if (!IS_ERR_OR_NULL(ce))
+		intel_context_put(ce);
+	if (!IS_ERR_OR_NULL(ipts_ctx))
 		i915_gem_context_put(ipts_ctx);
 
 err_unlock:
@@ -235,23 +244,27 @@ static int create_ipts_context(void)
 static void destroy_ipts_context(void)
 {
 	struct i915_gem_context *ipts_ctx = NULL;
-	struct drm_i915_private *dev_priv = to_i915(intel_ipts.dev);
 	struct intel_context *ce = NULL;
 	int ret = 0;
 
 	ipts_ctx = intel_ipts.ipts_context;
 
-	ce = intel_context_lookup(ipts_ctx, dev_priv->engine[RCS0]);
+	ce = i915_gem_context_lookup_engine(ipts_ctx, RCS0);
+	if (IS_ERR(ce)) {
+		DRM_ERROR("i915_gem_context_lookup_engine failed: %ld\n", PTR_ERR(ce));
+		return;
+	}
 
 	/* Initialize the context right away.*/
 	ret = i915_mutex_lock_interruptible(intel_ipts.dev);
 	if (ret) {
-		DRM_ERROR("i915_mutex_lock_interruptible failed \n");
+		DRM_ERROR("i915_mutex_lock_interruptible failed\n");
 		return;
 	}
 
 	execlists_context_unpin(ce);
 	intel_context_unpin(ce);
+	intel_context_put(ce);
 	i915_gem_context_put(ipts_ctx);
 
 	mutex_unlock(&intel_ipts.dev->struct_mutex);
@@ -452,8 +465,8 @@ static int intel_ipts_map_buffer(u64 gfx_handle, intel_ipts_mapbuffer_t *mapbuf)
 		obj->cpu_addr = ipts_object_map(obj);
 	}
 
-	if (ipts_ctx->ppgtt) {
-		vm = &ipts_ctx->ppgtt->vm;
+	if (ipts_ctx->vm) {
+		vm = ipts_ctx->vm;
 	} else {
 		vm = &dev_priv->ggtt.vm;
 	}
-- 
2.22.0
```

qzed commented 5 years ago

@kitakar5525

I noticed that calling intel_ipts_cleanup too early tends to cause the two WARN_ONs (WARN_ON(i915_vma_unbind(vma)) and WARN_ON(i915_gem_object_has_pinned_pages(obj))) I mentioned before (#544 (comment)). I need to insert a sleep here, too:

I think the issue here could be that the ME communication has not finished, i.e. ipts_stop initiates ME communication but does not wait for it to finish. So we need to wait for communication to finish before calling intel_ipts_cleanup. Unloading the module does that, as it waits until ipts_mei_cl_event_thread exits.

kitakar5525 commented 5 years ago

Regarding i915_guc_ipts_submission_enable, I (and you as well) noticed that the function does not return the true reason for a guc_client_alloc failure.

0001-i915-ipts-return-a-true-reason-of-guc_client_alloc-f.patch

This change corresponds to `guc_clients_create`

```diff
From 6a4c9384e3e7040c2982d6219b2c666c98e50f9b Mon Sep 17 00:00:00 2001
From: kitakar5525 <34676735+kitakar5525@users.noreply.github.com>
Date: Fri, 2 Aug 2019 05:02:14 +0900
Subject: [PATCH] i915: ipts: return a true reason of guc_client_alloc failure

---
 drivers/gpu/drm/i915/intel_guc_submission.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 072fea44c..0a9a23715 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -1502,7 +1502,7 @@ int i915_guc_ipts_submission_enable(struct drm_i915_private *dev_priv,
 				  ctx);
 	if (IS_ERR(client)) {
 		DRM_ERROR("Failed to create normal GuC client!\n");
-		return -ENOMEM;
+		return PTR_ERR(client);
 	}
 
 	guc->ipts_client = client;
-- 
2.22.0
```

qzed commented 5 years ago

Yeah, not sure why they did that with -ENOMEM. We should probably change that.

qzed commented 5 years ago

For some reason intel_guc_submission_init is not getting called. Because of this, guc->stage_ids does not get initialized, so naturally ida_simple_get with that as argument fails.

Edit: Seems like IPTS will not work on 5.3 at all... (via https://github.com/torvalds/linux/blob/v5.3-rc3/drivers/gpu/drm/i915/intel_uc.c#L313-L314):

```c
int intel_uc_init(struct drm_i915_private *i915)
{
    // ...

    /* XXX: GuC submission is unavailable for now */
    GEM_BUG_ON(USES_GUC_SUBMISSION(i915));

    // ...
}
```

So that's kind of disappointing...
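As a quick user-space check, the i915 `enable_guc` module parameter shows whether GuC submission was requested at all. In the 5.x kernels discussed here it is a bitmask in which bit 0 requests GuC submission and bit 1 requests HuC loading, with -1 meaning "auto". The sketch below only decodes a value that has already been read. `guc_submission_requested` is a hypothetical helper name, and it is an assumption that the parameter is readable as root from `/sys/module/i915/parameters/enable_guc` on a given build.

```bash
#!/bin/bash
# Succeed if bit 0 (GuC submission) is set in an enable_guc value.
# guc_submission_requested is a hypothetical helper name.
guc_submission_requested() {
    local value="$1"
    # -1 means "auto": submission was not requested explicitly.
    [ "$value" -ge 0 ] && [ $(( value & 1 )) -eq 1 ]
}
```

For example, `guc_submission_requested "$(sudo cat /sys/module/i915/parameters/enable_guc)" && echo "GuC submission requested"`.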

qzed commented 5 years ago

I've also pushed a Linux repo where I'll be working on IPTS and Surface related stuff, if you're interested: https://github.com/qzed/linux-surface-kernel. My current plan is to have the ...-surface branches to directly reflect the linux-surface repo structure and other branches for development/customization.

kitakar5525 commented 5 years ago

I didn't even come to think that GuC submission will not be allowed on 5.3. No wonder IPTS is not working.

qzed commented 5 years ago

Thanks for the links, good to know the corresponding commits! I also didn't think that at first, just followed the calls for intel_guc_submission_init wondering why it wasn't getting called. Guess that makes sense if the FW changes are that significant.

kitakar5525 commented 5 years ago

note: how to use intel_ipts_cleanup/intel_ipts_init on sleep script (not recommended now)

Try this only if you have suspend/hibernate issue

We can still call intel_ipts_cleanup/intel_ipts_init on suspend/hibernate using systemd/system-sleep/sleep if you want:

```bash
case $1/$2 in
  pre/*)
    # Remove IPTS from ME side
    modprobe -r intel_ipts
    modprobe -r mei_hdcp
    modprobe -r mei_me
    modprobe -r mei
    # Remove IPTS from i915 side
    for i in $(find /sys/kernel/debug/dri -name i915_intel_ipts_cleanup); do echo 1 > $i; done
    ;;
  post/*)
    # Load IPTS from i915 side
    for i in $(find /sys/kernel/debug/dri -name i915_intel_ipts_init); do echo 1 > $i; done
    # Load IPTS from ME side
    modprobe mei
    modprobe mei_me
    modprobe mei_hdcp
    modprobe intel_ipts
    ;;
esac
```
qzed commented 5 years ago

Right, although I'd vote that we drop IPTS/MEI from the sleep script for now. If there are any problems we can gradually add them back in until we fix the real problem.

kitakar5525 commented 5 years ago

Right, although I'd vote that we drop IPTS/MEI from the sleep script for now. If there are any problems we can gradually add them back in until we fix the real problem.

Yes, I think so, too. That was just an investigation into how to use intel_ipts_cleanup/intel_ipts_init.


I noticed that the mei_hdcp module exists on 5.2 and depends on mei. However, the current systemd/system-sleep/sleep script does not include the mei_hdcp module, so removing mei will fail.

We have to properly remove every module that depends on mei if we want to use intel_ipts_cleanup/intel_ipts_init.

qzed commented 5 years ago

Right, good points! Thanks! I'll update the systemd-sleep script in my 5.2 branch with your findings!

kitakar5525 commented 5 years ago

We may want to automatically detect modules which are using mei. However, I could not find an easier way. What should we do?

```bash
# Using /proc/modules, grep, awk and sed to detect the modules which are using mei
MEI_USED_BY=$(cat /proc/modules | grep -w mei | awk '{print $4}' | sed "s/,/ /g")

# Also, we have to consider this order
# We have to remove mei_hdcp before mei_me
echo $MEI_USED_BY
# prints: intel_ipts mei_me mei_hdcp
```
qzed commented 5 years ago

Hmm, I don't know if there's any better solution than hard-coding. Other than that your solution via /proc/modules seems good, but as you've mentioned, we'd need to follow the dependencies and order them correctly, which would make it much more complex. Also I don't think this will work if the modules are already unloaded, so it won't work for re-loading them.
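For reference, the /proc/modules lookup can be done in a single awk call that matches the module name exactly. In /proc/modules the fourth field lists the users of a module as a comma-terminated list, or `-` if there are none. This is just a sketch: `module_users` is a hypothetical helper, and the optional second argument exists only so the parser can be pointed at a saved copy of /proc/modules. It does not solve the ordering problem mentioned above.

```bash
#!/bin/bash
# Print the modules that currently use a given module, one per line.
# module_users is a hypothetical helper name.
module_users() {
    local module="$1"
    local file="${2:-/proc/modules}"
    awk -v m="$module" '
        $1 == m && $4 != "-" {
            n = split($4, users, ",")
            for (i = 1; i <= n; i++)
                if (users[i] != "") print users[i]
        }' "$file"
}
```

`module_users mei` prints the same names as `$MEI_USED_BY`, and prints nothing once mei is unloaded.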

We could do something like modprobe mei_hdcp || true to explicitly ignore errors. There's also a --remove-dependencies option for modprobe which will automatically remove all dependencies, so modprobe --remove-dependencies mei should remove mei, mei_me, mei_hdcp and intel_ipts. But then we need a way to restore them on resume.
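One possible way to get the restore step while staying with a hard-coded order is to record which modules were actually loaded before suspend and reload exactly those, in reverse, on resume. The following is a minimal sketch under that assumption. `STATE_FILE`, `PROC_MODULES`, `MODPROBE` and the function names are all made up for illustration, and none of this is part of the existing sleep script.

```bash
#!/bin/bash
# Unload a hard-coded, dependency-ordered module list and remember which
# modules were actually loaded, so that resume restores exactly those.
# STATE_FILE, PROC_MODULES and MODPROBE are hypothetical knobs for this sketch.
MODULES="intel_ipts mei_hdcp mei_me mei"   # removal order used in this thread
STATE_FILE="${STATE_FILE:-/run/ipts-unloaded-modules}"
PROC_MODULES="${PROC_MODULES:-/proc/modules}"
MODPROBE="${MODPROBE:-modprobe}"

is_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "$PROC_MODULES"
}

unload_modules() {
    local m
    : > "$STATE_FILE"
    for m in $MODULES; do
        if is_loaded "$m"; then
            # Only record modules whose removal succeeded.
            $MODPROBE -r "$m" && echo "$m" >> "$STATE_FILE"
        fi
    done
}

restore_modules() {
    local m
    [ -f "$STATE_FILE" ] || return 0
    # Reload in reverse removal order: mei first, intel_ipts last.
    for m in $(tac "$STATE_FILE"); do
        $MODPROBE "$m"
    done
    rm -f "$STATE_FILE"
}
```

In a system-sleep hook, `unload_modules` would go into the `pre/*` branch and `restore_modules` into `post/*`. A module that is absent (for example mei_hdcp on older kernels) is simply skipped, which avoids the need for `|| true`.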

kitakar5525 commented 5 years ago

note: reload IPTS completely

I think the issue here could be that the ME communication has not finished, i.e. ipts_stop initiates ME communication but does not wait for it to finish. So we need to wait for communication to finish before calling intel_ipts_cleanup. Unloading the module does that, as it waits until ipts_mei_cl_event_thread exits.

Yes, it seems that this is partially true. It is still not very stable on 4.19, though. Anyway, we should not use ipts_stop and intel_ipts_cleanup except for debugging purposes.

```bash
# Remove IPTS from both ME side and i915 side
sudo modprobe -r intel_ipts
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_cleanup"

# Load IPTS from both i915 side and ME side
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_init"
sudo modprobe intel_ipts
```

Also removing the mei modules here makes reloading more stable. I will use this when I have to completely reload IPTS:

```bash
# Remove IPTS from both ME side and i915 side
sudo modprobe -r intel_ipts
sudo modprobe -r mei_hdcp
sudo modprobe -r mei_me
sudo modprobe -r mei
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_cleanup"
sudo su -c 'echo "ipts removed from both ME side and i915 side" > /dev/kmsg'

# Load IPTS from both i915 side and ME side
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_init"
sudo modprobe mei
sudo modprobe mei_me
sudo modprobe mei_hdcp
sudo modprobe intel_ipts
sudo su -c 'echo "ipts inserted from both i915 side and ME side" > /dev/kmsg'
```
kitakar5525 commented 5 years ago

modprobe --remove-dependencies mei should remove mei, mei_me, mei_hdcp and intel_ipts.

Unfortunately, it is not working on my side anyway. mei modules remain loaded.

I will share if I find a better way.

qzed commented 5 years ago

Alright, then I'm out of ideas. Interesting that this doesn't work.

tmarkov commented 5 years ago

I have had problems with the 5.2.5 IPTS on resume from suspend if I don't unload the modules. I have Debian Buster with Cinnamon on SB1. It is not a consistent issue and doesn't happen every time, but when it does it works as follows:

Upon resume, the system is partially unresponsive. I see the lock screen, and I can move the cursor using the touchpad. However, touch and keyboard input don't work (I can't type in the password). After a time, the mouse may freeze, too. Eventually something (I guess X) crashes, and I am sent to the login screen, logged out. When I log in again, the system is in software rendering mode. A restart fixes it.

Here's the syslog from the point suspend starts: journal.txt

kitakar5525 commented 5 years ago

The GPU hang occurred at 10:59:16, but I suspect the mwifiex_pcie module is blocking suspend (and something that got messed up there then caused the GPU hang?).

mwifiex_pcie 0000:03:00.0: adapter is not valid
mwifiex_pcie 0000:03:00.0: adapter structure is not valid

First, check whether low_power_idle_cpu_residency_us (PC10 residency) increased across the suspend to see if your device actually entered suspend. Edit the system-sleep/sleep script like this to add counters (they will be printed to dmesg or journalctl):

```bash
#!/bin/sh

PATH_CPULPI_US=/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
PATH_SLPS0_US=/sys/kernel/debug/pmc_core/slp_s0_residency_usec

case $1/$2 in
  pre/*)
    echo "pre-suspend state" > /dev/kmsg
    echo "$(basename $PATH_CPULPI_US): $(cat $PATH_CPULPI_US) usec" > /dev/kmsg
    echo "$(basename $PATH_SLPS0_US): $(cat $PATH_SLPS0_US) usec" > /dev/kmsg
    ;;
  post/*)
    echo "post-suspend state" > /dev/kmsg
    echo "$(basename $PATH_CPULPI_US): $(cat $PATH_CPULPI_US) usec" > /dev/kmsg
    echo "$(basename $PATH_SLPS0_US): $(cat $PATH_SLPS0_US) usec" > /dev/kmsg
    ;;
esac
```
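To avoid comparing the two log lines by eye, the difference can also be computed directly. The sketch below assumes the pre-suspend value was saved somewhere (for example in a file under /run) and is passed in together with the post-suspend value. `residency_report` is a hypothetical helper name. A counter that did not grow means the corresponding low-power state was not reached during that cycle.

```bash
#!/bin/bash
# Report how much a residency counter grew across one suspend cycle.
# residency_report is a hypothetical helper; "before" and "after" are the
# counter values (in microseconds) read in the pre and post hooks.
residency_report() {
    local name="$1" before="$2" after="$3"
    local delta=$(( after - before ))
    if [ "$delta" -gt 0 ]; then
        echo "$name: +$delta usec (low-power state was entered)"
    else
        echo "$name: unchanged (low-power state was NOT entered)"
        return 1
    fi
}
```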
tmarkov commented 5 years ago

I forgot to mention, but this issue doesn't happen if I unload ipts, mei_me and mei before suspend. On the other hand, it happens regardless of whether I unload mwifiex or not. So I'm pretty sure it's IPTS causing it.

Also notice that the GPU hang is 16 seconds after "10:59:00 surface kernel: PM: suspend exit".

kitakar5525 commented 5 years ago

I don't have any idea now. Can you run this script without GPU Hang?

```bash
# Remove IPTS from both ME side and i915 side
sudo modprobe -r intel_ipts
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_cleanup"

sleep 3

# Load IPTS from both i915 side and ME side
sudo su -c "echo 1 > /sys/kernel/debug/dri/*/i915_intel_ipts_init"
sudo modprobe intel_ipts
```
tmarkov commented 5 years ago

That script works fine. I ran it 100 times with a 1 s sleep between each run. Then I ran it without any sleep between the runs, and my system froze. Not sure if it's a GPU hang, as it never recovered, and there's nothing in syslog.

I'll try without the sleep 3 in between.

Edit: Works fine without the sleep 3 (still with sleep 1 between runs). So it only freezes if there's no sleep between script runs.

kitakar5525 commented 5 years ago

OK then, anyway, try adding those counters to the system-sleep/sleep script and increase the debug output.

Enable some debug output:

```bash
sudo su -c "echo 1 > /sys/power/pm_debug_messages"
sudo su -c "echo 1 > /sys/module/printk/parameters/ignore_loglevel"
sudo su -c "echo 1 > /sys/kernel/debug/clear_warn_once"
```

and please post a log again.

kitakar5525 commented 5 years ago

Ah... I tend to get a GPU hang when I use rtcwake to wake up the device:

```bash
sync && sync && sync
sudo bash /usr/lib/systemd/system-sleep/* pre
sudo rtcwake -m freeze -s 10
sudo bash /usr/lib/systemd/system-sleep/* post
```

Regarding the occasional system freeze after suspend (https://github.com/jakeday/linux-surface/issues/544#issuecomment-513531566): this is not a GPU hang, but it may be related. When the system freezes after resume, it always happens right after ipts_send_sensor_clear_mem_window_cmd gets called.

EDIT: The function ipts_send_sensor_clear_mem_window_cmd will be called from case TOUCH_SENSOR_GET_DEVICE_INFO_RSP in ipts_handle_resp

EDIT2: On suspend/resume, ipts_mei_cl_remove/ipts_mei_cl_probe will be directly called. This may be a problem.

kitakar5525 commented 5 years ago

I haven't come up with a good idea yet. What I can do is to insert some sleep before ipts_send_sensor_clear_mem_window_cmd in case TOUCH_SENSOR_GET_DEVICE_INFO_RSP.

```diff
diff --git a/drivers/misc/ipts/ipts-msg-handler.c b/drivers/misc/ipts/ipts-msg-handler.c
index 87144778a..f114607df 100644
--- a/drivers/misc/ipts/ipts-msg-handler.c
+++ b/drivers/misc/ipts/ipts-msg-handler.c
@@ -1,4 +1,5 @@
 #include <linux/mei_cl_bus.h>
+#include <linux/delay.h>

 #include "ipts.h"
 #include "ipts-hid.h"
@@ -264,6 +265,8 @@ int ipts_handle_resp(ipts_info_t *ipts, touch_sensor_msg_m2h_t *m2h_msg,
                    break;
            }

+           pr_alert("DEBUG: sleeping for 1000 ms\n");
+           msleep (1000);
            cmd_status = ipts_send_sensor_clear_mem_window_cmd(ipts);

            break;
```

I think the better approach is to somehow avoid ipts_mei_cl_remove/ipts_mei_cl_probe getting called at all.

kitakar5525 commented 5 years ago

If I remove intel_ipts right after loading the module, it still causes weird behavior like the one @tmarkov reported.

```bash
sudo modprobe -r intel_ipts
sudo modprobe intel_ipts # no problem here
sudo modprobe -r intel_ipts # remove too early
```

No matter how long the sleep is (same result even with 10 sec)

kern  :info  : [   44.789712] IPTS ipts_mei_cl_exit() is called # sudo modprobe -r intel_ipts
kern  :info  : [   44.789728] DEBUG: ipts_mei_cl_remove called
kern  :info  : [   44.789729] DEBUG: ipts_stop called
kern  :info  : [   44.789730] DEBUG: ipts_send_sensor_quiesce_io_cmd called
kern  :info  : [   44.801421] DEBUG: ipts_send_sensor_clear_mem_window_cmd called
kern  :err   : [   44.934124] ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: error in reading m2h msg
kern  :info  : [   44.934165] IPTS removed
kern  :info  : [   44.976476] IPTS ipts_mei_cl_init() is called # sudo modprobe intel_ipts # no problem here
kern  :info  : [   44.976492] probing Intel Precise Touch & Stylus
kern  :info  : [   44.976493] IPTS using DMA_BIT_MASK(64)
kern  :info  : [   44.976639] DEBUG: ipts_start called
kern  :info  : [   44.982083] input: ipts 1B96:005E UNKNOWN as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:1B96:005E.0004/input/input50
kern  :info  : [   44.982349] input: ipts 1B96:005E as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:1B96:005E.0004/input/input52
kern  :info  : [   44.982633] input: ipts 1B96:005E Touchscreen as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:1B96:005E.0004/input/input53
kern  :info  : [   44.982902] input: ipts 1B96:005E Mouse as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:1B96:005E.0004/input/input54
kern  :info  : [   44.983166] input: ipts 1B96:005E UNKNOWN as /devices/pci0000:00/0000:00:16.4/mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F/0044:1B96:005E.0004/input/input57
kern  :info  : [   44.983337] hid-multitouch 0044:1B96:005E.0004: input,hidraw0: <UNKNOWN> HID v16900.00 Mouse [ipts 1B96:005E] on heci3
kern  :alert : [   44.983401] DEBUG: sleeping for 1000 ms
kern  :info  : [   44.996003] IPTS ipts_mei_cl_exit() is called # sudo modprobe -r intel_ipts # remove too early
kern  :info  : [   44.996020] DEBUG: ipts_mei_cl_remove called
kern  :info  : [   44.996021] DEBUG: ipts_stop called
kern  :info  : [   44.996022] DEBUG: ipts_send_sensor_quiesce_io_cmd called
kern  :info  : [   44.996326] DEBUG: ipts_send_sensor_clear_mem_window_cmd called
kern  :info  : [   45.993840] DEBUG: ipts_send_sensor_clear_mem_window_cmd called
kern  :err   : [   45.993853] ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: mei_cldev_send() error 0x7:-19
kern  :info  : [   45.993855] DEBUG: ipts_stop called
kern  :info  : [   45.993856] DEBUG: ipts_send_sensor_quiesce_io_cmd called
kern  :err   : [   45.993860] ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: mei_cldev_send() error 0x4:-19
kern  :info  : [   45.993862] DEBUG: ipts_send_sensor_clear_mem_window_cmd called
kern  :err   : [   45.993865] ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: mei_cldev_send() error 0x7:-19
kern  :info  : [   45.993866] DEBUG: ipts_send_sensor_quiesce_io_cmd called
kern  :err   : [   45.993869] ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: mei_cldev_send() error 0x4:-19
kern  :info  : [   45.993909] IPTS removed

It seems that ipts_mei_cl_exit is called before ipts_send_sensor_clear_mem_window_cmd gets called. The order should instead look like this:

```
kern  :alert : [   44.983401] DEBUG: sleeping for 1000 ms
DEBUG: ipts_send_sensor_clear_mem_window_cmd called
ipts mei::3e8d0870-271a-4208-8eb5-9acb9402ae04:0F: touch enabled 4
IPTS ipts_mei_cl_exit() is called
[...]
```
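The -19 in the mei_cldev_send() errors of the failing log is -ENODEV, which fits the picture of commands being sent after the client device has already gone away. When repeating this experiment, a small filter can count such errors in a captured log, for example with `dmesg | count_enodev_send_errors`. This is only a sketch: `count_enodev_send_errors` is a hypothetical helper name, and the pattern is taken from the log lines quoted above.

```bash
#!/bin/bash
# Count mei_cldev_send() failures with -ENODEV (-19) in a kernel log on stdin.
# count_enodev_send_errors is a hypothetical helper name.
count_enodev_send_errors() {
    # grep -c exits non-zero when the count is 0; the count is still printed.
    grep -c 'mei_cldev_send() error 0x[0-9a-fA-F]*:-19' || true
}
```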
kitakar5525 commented 5 years ago

Sadly, even after inserting sleep (https://github.com/jakeday/linux-surface/issues/544#issuecomment-523868834), it still occasionally freezes right after the debug print (DEBUG: sleeping for 1000 ms) on resuming from suspend.

No further log available.

kitakar5525 commented 5 years ago

Note:

tmarkov commented 5 years ago

Touch no longer works for me after the latest patch (qzed's 5.2.14 release). After reloading the touch modules, I can do one single touch and then touch breaks. Here's some logs starting with reloading touch modules:

journal.txt

qzed commented 5 years ago

@tmarkov Surface Book 1? Seems like the workaround by @kitakar5525 that disables IPTS feedback for #374 causes some problems. I've had the same issue on the SB2, which is why we decided to DMI-match and only apply the workaround on SB1 and SP4. It's weird though that it doesn't work for you but works for @kitakar5525.

tmarkov commented 5 years ago

Not so weird, there've been multiple issues where different SB1 units behave differently, and the touch dropout is one of them.

qzed commented 5 years ago

@tmarkov Interesting, I didn't think the differences were that big, thanks! I'll change the workaround defaults later; for now you can set intel_ipts.no_feedback=0 as a kernel option. This should deactivate the workaround.

qzed commented 5 years ago

@tmarkov Could this have anything to do with processor differences (e.g. a different generation)? If so, it would be better to match against that for the workaround.

tmarkov commented 5 years ago

I posted my cpuinfo here https://github.com/jakeday/linux-surface/issues/374#issuecomment-497770888, but I don't have any to compare with. It could also be GPU related, so from glxinfo: Device: Mesa DRI Intel(R) HD Graphics 520 (Skylake GT2) (0x1916)

I'm pretty sure it's all Skylake for SB1, but are there any variations within the Skylake generation? Or if not, it could be i3 vs i5 vs i7.

qzed commented 5 years ago

Ah right, I remember. @kitakar5525 could you send me the output of cat /proc/cpuinfo and the GPU line from glxinfo?

From what I can see on Wikipedia, the CPUs are all 6xxx series with the same Intel HD Graphics 520, specifically i5-6300U and i7-6600U. So maybe i5 vs i7? The SP4, which should also have this issue, has m3-6Y30, i5-6300U, and i7-6650U. It would also be interesting to know if there's different IPTS firmware for the different models. It seems that in ACPI, there is at least the capability for it (AFAIK TSML is the touch firmware provider):

Device (TSML)
{
    Method (_HID, 0, NotSerialized)  // _HID: Hardware ID
    {
        If ((OMBR < 0x04))
        {
            Return ("MSHW0075")
        }
        Else
        {
            Return ("MSHW0076")
        }
    }
}
tmarkov commented 5 years ago

@qzed I put intel_ipts.no_feedback=0 in /etc/sysctl.d/local.conf but it doesn't seem to do anything. sudo sysctl -a | grep ipts also prints nothing.

EDIT: I see, so it only works as a boot parameter.

kitakar5525 commented 5 years ago

@tmarkov

EDIT: I see, so it only works as a boot parameter.

Does it work for you when you add the parameter to your bootloader?

Or you can do the same thing after boot:

sudo su -c "echo 0 > /sys/module/intel_ipts/parameters/no_feedback"
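
For context on why `sysctl` shows nothing: `sysctl` reads `/proc/sys`, whereas module parameters are passed as `module.param=value` on the kernel command line and, when the module exports them, appear under `/sys/module/<module>/parameters/`. A minimal sketch for checking both places follows; the helper names are made up for illustration, and it assumes the parameter is exported to sysfs as the `echo` command above suggests.

```python
from pathlib import Path


def module_params_from_cmdline(cmdline: str, module: str) -> dict:
    """Collect `module.param=value` entries from a kernel command line string."""
    params = {}
    prefix = module + "."
    for token in cmdline.split():
        if token.startswith(prefix) and "=" in token:
            key, value = token[len(prefix):].split("=", 1)
            params[key] = value
    return params


def read_module_param(module: str, param: str, sysfs_root: str = "/sys/module"):
    """Return the current value from sysfs, or None if it is not exported."""
    path = Path(sysfs_root) / module / "parameters" / param
    return path.read_text().strip() if path.exists() else None
```

Calling `module_params_from_cmdline(Path("/proc/cmdline").read_text(), "intel_ipts")` shows what was requested at boot, and `read_module_param("intel_ipts", "no_feedback")` shows what the loaded module currently uses.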
kitakar5525 commented 5 years ago

could you send me the output of cat /proc/cpuinfo and the GPU line from glxinfo?

cat /proc/cpuinfo

```
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0xcc
cpu MHz : 2042.955
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 5618.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0xcc
cpu MHz : 2087.971
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 5618.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0xcc
cpu MHz : 2019.788
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 5618.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 78
model name : Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
stepping : 3
microcode : 0xcc
cpu MHz : 2032.509
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 2
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 5618.00
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
```

cpuinfo.txt for download

$ glxinfo | grep Device
    Device: Mesa DRI Intel(R) HD Graphics 520 (Skylake GT2)  (0x1916)

I can't see much difference.

Would also be interesting to know if there's different IPTS firmware for the different models. It seems that in ACPI, there is at least the capability for it (AFAIK TSML is the touch firmware provider):

I cannot see firmware for MSHW0075 on my Windows installation:

ls -lh Windows/INF/PreciseTouch/Intel

```
total 8.7M
-rwxrwxrwx 2 root root   85 Jul 15  2016 iaPreciseTouchDescriptor.bin
-rwxrwxrwx 5 root root 2.0K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0076.bin
-rwxrwxrwx 4 root root 2.0K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0078.bin
-rwxrwxrwx 4 root root 2.7K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0079.bin
-rwxrwxrwx 4 root root 2.0K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0101.bin
-rwxrwxrwx 4 root root 2.0K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0102.bin
-rwxrwxrwx 4 root root 2.0K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0103.bin
-rwxrwxrwx 4 root root 2.0K Sep 22  2017 SurfaceTouchServicingDescriptorMSHW0137.bin
-rwxrwxrwx 2 root root 1.1M Sep 22  2017 SurfaceTouchServicingKernelMSHW0079.bin
-rwxrwxrwx 4 root root  516 Sep 22  2017 SurfaceTouchServicingKernelMSHW0079.bin.sig
-rwxrwxrwx 2 root root 1.3M Sep 22  2017 SurfaceTouchServicingKernelMSHW0101.bin
-rwxrwxrwx 4 root root  516 Sep 22  2017 SurfaceTouchServicingKernelMSHW0101.bin.sig
-rwxrwxrwx 2 root root 1.3M Sep 22  2017 SurfaceTouchServicingKernelMSHW0102.bin
-rwxrwxrwx 4 root root  516 Sep 22  2017 SurfaceTouchServicingKernelMSHW0102.bin.sig
-rwxrwxrwx 2 root root 1.3M Sep 22  2017 SurfaceTouchServicingKernelMSHW0137.bin
-rwxrwxrwx 4 root root  516 Sep 22  2017 SurfaceTouchServicingKernelMSHW0137.bin.sig
-rwxrwxrwx 5 root root 1.3M Sep 22  2017 SurfaceTouchServicingKernelSKLMSHW0076.bin
-rwxrwxrwx 4 root root 1.3M Sep 22  2017 SurfaceTouchServicingKernelSKLMSHW0078.bin
-rwxrwxrwx 4 root root 1.3M Sep 22  2017 SurfaceTouchServicingKernelSKLMSHW0103.bin
-rwxrwxrwx 5 root root  12K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0076.bin
-rwxrwxrwx 4 root root  12K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0078.bin
-rwxrwxrwx 4 root root  11K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0079.bin
-rwxrwxrwx 4 root root  12K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0101.bin
-rwxrwxrwx 4 root root  12K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0102.bin
-rwxrwxrwx 4 root root  12K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0103.bin
-rwxrwxrwx 4 root root  12K Sep 22  2017 SurfaceTouchServicingSFTConfigMSHW0137.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0076.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0078.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0079.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0101.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0102.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0103.bin
-rwxrwxrwx 4 root root  256 Sep 22  2017 SurfaceTouchServicingTouchBlobMSHW0137.bin
```

I'm using firmware for MSHW0076.

qzed commented 5 years ago

It might be that MSHW0075 was for a pre-production/testing model or something. @tmarkov, @kitakar5525 Can you both nevertheless check which of those is present in /sys/bus/acpi/devices/?
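
If it helps, that check can be scripted. This is just a sketch: the function name `find_acpi_devices` is made up, and it assumes entries in that directory are named `<HID>:<instance>` (e.g. `MSHW0076:00`).

```python
from pathlib import Path


def find_acpi_devices(prefixes=("MSHW0075", "MSHW0076"),
                      root: str = "/sys/bus/acpi/devices") -> list:
    """List ACPI device entries whose name starts with one of the given HIDs."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(entry.name for entry in base.iterdir()
                  if entry.name.startswith(tuple(prefixes)))
```

Running `print(find_acpi_devices())` on the device should list whichever of the two HIDs the firmware actually exposes.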

Apart from that, the biggest difference seems to be i7-6600U vs. i5-6300U, but I'm not sure if that's the cause. They should be the same architecture, which means it's likely they have the same GuC implementation and all (I'd assume the silicon differences are in the cores and not the peripherals). The microcode version also seems to be different (0xcc vs. 0xc6), but that might just be due to the processors being different; I'm not sure if, or by how much, Intel re-uses their microcode.

tmarkov commented 5 years ago

I have MSHW0076:00.

Another thing that may be relevant is UEFI firmware. I have all my firmware up to date, although it is worth noting that (according to fwupdmgr) I have touch firmware version 105.0.24069, while the newest is 58.2.24087.
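
To make the oddity with the two version strings concrete: under a plain component-wise numeric comparison, the installed version sorts higher than the "newest" one, even though its last component is lower. The sketch below only illustrates that comparison; it is an assumption that this resembles how the versions are ordered, and fwupd's own comparison logic may differ.

```python
def parse_version(text: str) -> tuple:
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


installed = parse_version("105.0.24069")
newest = parse_version("58.2.24087")

# Tuples compare element by element, so the leading 105 vs. 58 decides it.
installed_is_newer = installed > newest
```

Here `installed_is_newer` evaluates to `True`, which would be consistent with the updater treating the device as up to date.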