linux-surface / surface-pro-x

Tracking and meta repository for Surface Pro X support.

Latest kernel causes issues with the ath10k driver (SQ1/8/512 - potential regression?) #42

Open Deinsti opened 1 year ago

Deinsti commented 1 year ago

Greetings!

I have been testing Arch Linux on my SQ1 Surface Pro X for a while now and have experimented with a variety of areas such as GPU and CPU behaviour. Recently, however, the latest kernel seems to have a regression that causes it to consistently panic after connecting to a Wi-Fi network. The panic is not instant: it takes a random amount of time, anywhere from a single minute up to 20 minutes.

SError Interrupt on CPU0, code 0x00000000be000011 ... pc : ath10k_snoc_napi_poll+0x84/0x150 [ath10k_snoc]

This suggests that the Wi-Fi driver in the kernel is acting up, but I wonder why?
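For anyone comparing several panic logs, the `pc :` field can be pulled apart mechanically. The following is a minimal sketch, assuming the log line has the same shape as the one quoted above; the helper name `parse_pc` and the regular expression are illustrative and not part of any kernel tooling.

```python
import re

# Matches the "pc :" field of an arm64 oops line, for example:
#   pc : ath10k_snoc_napi_poll+0x84/0x150 [ath10k_snoc]
# The trailing "[module]" part is optional (absent for built-in code).
PC_RE = re.compile(
    r"pc\s*:\s*(?P<symbol>[\w.]+)"
    r"\+(?P<offset>0x[0-9a-fA-F]+)"
    r"/(?P<size>0x[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_pc(line):
    """Return symbol, offset, size and module from a 'pc :' field, or None."""
    m = PC_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),  # byte offset into the function
        "size": int(m.group("size"), 16),      # total size of the function
        "module": m.group("module"),           # None if no [module] suffix
    }


if __name__ == "__main__":
    line = (
        "SError Interrupt on CPU0, code 0x00000000be000011 ... "
        "pc : ath10k_snoc_napi_poll+0x84/0x150 [ath10k_snoc]"
    )
    print(parse_pc(line))
```

The `symbol+offset` pair recovered this way is the same form of address that `./scripts/faddr2line` takes as its second argument.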

As far as I know, this issue was not present in the previous kernel versions... Any ideas as to the cause? I can add extra details if you wish, just ask!

[System Specs]
- Surface Pro X
- CPU: SQ1
- RAM: 8GB
- SSD: 512GB
- Kernel Ver: linux-surface 6.0.3-1
- Boot method: USB with DTB

[Attached image: IMG_20221024_220232704]

qzed commented 1 year ago

Thanks for reporting this. I already had some spurious issues with Wi-Fi in 6.0.1. In particular, it froze the desktop, but I assume it's the same problem, as I was able to confirm your log by dropping to a tty in 6.0.3 (which also has these issues). Unfortunately I haven't had the time yet to dig into this more.

qzed commented 1 year ago

Okay, so unfortunately I can't get ./scripts/faddr2line to print the line where it fails:

$ ./scripts/faddr2line drivers/net/wireless/ath/ath10k/ath10k_snoc.ko ath10k_snoc_napi_poll+0x84
ath10k_snoc_napi_poll+0x84/0x150:
ath10k_snoc_napi_poll at snoc.c:?

Looks like we'll have to debug it the old-fashioned way.

qzed commented 1 year ago

I'm having some difficulties debugging this... It seems that since the SError is asynchronous, we can't rely on the program counter or stack trace to point us to the specific place where it's failing. I do get some variation with the PC, but ath10k_snoc_napi_poll is always in the backtrace, so I'm assuming it's got something to do with that.
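The `code` value in the original report can be decoded to check the asynchronous reading. This is a rough sketch, assuming the value is the arm64 ESR (exception syndrome register) and using the field layout as I understand it from the Arm Architecture Reference Manual: EC in bits [31:26], IL in bit 25, ISS in bits [24:0], and, for SError, IDS in ISS bit 24 and DFSC in ISS bits [5:0]. The function names and the small DFSC table are illustrative and should be checked against the manual.

```python
EC_SERROR = 0x2F  # exception class for an SError interrupt (assumed from the Arm ARM)

# Partial DFSC table for SError syndromes (assumption, verify against the manual).
SERROR_DFSC = {
    0x00: "uncategorized error",
    0x11: "asynchronous SError interrupt",
}


def decode_esr(esr):
    """Split a 32-bit ESR value into its top-level fields."""
    esr &= 0xFFFFFFFF
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class
        "il": (esr >> 25) & 0x1,    # instruction length bit
        "iss": esr & 0x1FFFFFF,     # instruction specific syndrome
    }


def describe_serror(esr):
    """Describe an SError syndrome, or return None if EC is not SError."""
    fields = decode_esr(esr)
    if fields["ec"] != EC_SERROR:
        return None
    ids = (fields["iss"] >> 24) & 0x1  # 1 means implementation defined syndrome
    dfsc = fields["iss"] & 0x3F
    if ids:
        return "implementation defined syndrome"
    return SERROR_DFSC.get(dfsc, "unknown DFSC 0x%02x" % dfsc)


if __name__ == "__main__":
    code = 0x00000000BE000011  # value from the panic message in this issue
    print(decode_esr(code))
    print(describe_serror(code))
```

Under these assumptions, the reported code decodes to EC 0x2f with DFSC 0x11, which would match the observation that the PC at the time of the panic does not necessarily point at the faulting access.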

qzed commented 1 year ago

@Deinsti Can you still reproduce this on the latest kernel?

Deinsti commented 1 year ago

@qzed I currently do not have an Arch install ready to test, but either tomorrow or on the weekend I'll re-install it and give it a go.