samveen / mt7902-dkms

Cloning the MT7921 driver into the MT7902 driver in the hopes of getting something running.
GNU General Public License v3.0

Trying out the `mt7921` as the base driver #8

Closed DarkMatter-999 closed 5 months ago

DarkMatter-999 commented 6 months ago

Copied all the mt792x structs and functions into mt7902_mt792x-prefixed copies in every header under ./mt76/*.h, but these new definitions aren't used anywhere yet.

Build is successful.

DarkMatter-999 commented 6 months ago

Running sudo dmesg | grep mt7902 results in:

[    6.619950] mt7902 0000:01:00.0: enabling device (0000 -> 0002)
[    6.626061] mt7902 0000:01:00.0: ASIC revision: 79020000
[    9.730413] mt7902 0000:01:00.0: Message 00000010 (seq 1) timeout
[    9.730427] mt7902 0000:01:00.0: Failed to get patch semaphore
[   12.930343] mt7902 0000:01:00.0: Message 00000010 (seq 2) timeout
[   12.930354] mt7902 0000:01:00.0: Failed to get patch semaphore
[   16.130308] mt7902 0000:01:00.0: Message 00000010 (seq 3) timeout
[   16.130318] mt7902 0000:01:00.0: Failed to get patch semaphore
[   19.330605] mt7902 0000:01:00.0: Message 00000010 (seq 4) timeout
[   19.330621] mt7902 0000:01:00.0: Failed to get patch semaphore
[   22.530576] mt7902 0000:01:00.0: Message 00000010 (seq 5) timeout
[   22.530592] mt7902 0000:01:00.0: Failed to get patch semaphore
[   25.730583] mt7902 0000:01:00.0: Message 00000010 (seq 6) timeout
[   25.730599] mt7902 0000:01:00.0: Failed to get patch semaphore

and lsmod | grep mt7902 gives:

mt7902                 24576  0
mt7902_common          86016  1 mt7902
mt792x_lib             73728  2 mt7902,mt7902_common
mt76_connac_lib       102400  3 mt792x_lib,mt7902,mt7902_common
mt76                  135168  4 mt792x_lib,mt7902,mt7902_common,mt76_connac_lib
mac80211             1576960  5 mt792x_lib,mt76,mt7601u,mt7902_common,mt76_connac_lib
cfg80211             1351680  5 mt76,mt7601u,mt7902_common,mac80211,mt76_connac_lib
samveen commented 5 months ago

@DarkMatter-999, the new names need to replace the existing struct names rather than being added alongside them. I'll see if I can give you an example commit in another branch.

DarkMatter-999 commented 5 months ago

Building now results in:

make -C /lib/modules/6.8.2-arch2-1/build M=/usr/src/mt7902-dkms/mt76/mt7902 modules
make[1]: Entering directory '/usr/lib/modules/6.8.2-arch2-1/build'
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/debugfs.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/pci.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/pci_mac.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/pci_mcu.o
  LD [M]  /usr/src/mt7902-dkms/mt76/mt7902/mt7902.o
  LD [M]  /usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.o
  MODPOST /usr/src/mt7902-dkms/mt76/mt7902/Module.symvers
ERROR: modpost: "mt7902_mt792x_tx_stats_show" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "mt7902_mt792x_set_wakeup" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "mt7902_mt792x_pm_wake_work" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "mt7902_mt792x_tx" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "mt7902_mt792x_init_acpi_sar" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "mt7902_mt792x_mac_init_band" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "mt7902_mt792x_get_et_strings" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "mt7902_mt792x_init_wiphy" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "mt7902_mt792x_init_wcid" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "mt7902_mt792x_conf_tx" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
WARNING: modpost: suppressed 49 unresolved symbol warnings because there were too many)
make[3]: *** [scripts/Makefile.modpost:145: /usr/src/mt7902-dkms/mt76/mt7902/Module.symvers] Error 1
make[2]: *** [/usr/lib/modules/6.8.2-arch2-1/build/Makefile:1873: modpost] Error 2
make[1]: *** [Makefile:240: __sub-make] Error 2
make[1]: Leaving directory '/usr/lib/modules/6.8.2-arch2-1/build'
make: *** [Makefile:19: modules] Error 2

I'm looking for direction on how to proceed now.

I think we need to create a new target for mt7902.

samveen commented 5 months ago
DarkMatter-999 commented 5 months ago

I think mt792x_core.c is not enough by itself; there's an interdependency between all the *.c files in the mt76 folder.

I started by downloading the mt792x_core.c file and replacing function names. The compile failed because of undefined functions, so I then had to download and patch mt792x_debugfs.c, then mt792x_mac.c, and so on.

I don't think that is what is supposed to happen; please correct me if I'm doing something wrong here.

samveen commented 5 months ago

> I think mt792x_core.c is not enough by itself; there's an interdependency between all the *.c files in the mt76 folder.
>
> I started by downloading the mt792x_core.c file and replacing function names. The compile failed because of undefined functions, so I then had to download and patch mt792x_debugfs.c, then mt792x_mac.c, and so on.

This is correct. The aim here is to remove any dependency on functions inside the mt76 module and instead keep a local copy under a different name. This duplicates the functions, but because everything is now local, issues are easier to debug.

DarkMatter-999 commented 5 months ago

Okay, I duplicated and patched all the necessary files from the mt76 module in the Linux kernel repo. The code now compiles successfully.

samveen commented 5 months ago

Great!! :tada:

When you load the module, what messages appear in the kernel log?

While reviewing, I see that the base mt76_ structures and functions need the same treatment, renaming them to mt7902_mt76_ names; otherwise mt7902 is still not completely independent of mt76.

DarkMatter-999 commented 5 months ago

Rebuilding the whole mt76 might not be the best idea IMHO.

Also, building right now results in:

 ~/p/m/m/mt7902 (rename)> make clean; make -j12
make -C /lib/modules/6.8.4-arch1-1/build M=/usr/src/mt7902-dkms/mt76/mt7902 clean
make[1]: Entering directory '/usr/lib/modules/6.8.4-arch1-1/build'
make[1]: Leaving directory '/usr/lib/modules/6.8.4-arch1-1/build'
make -C /lib/modules/6.8.4-arch1-1/build M=/usr/src/mt7902-dkms/mt76/mt7902 modules
make[1]: Entering directory '/usr/lib/modules/6.8.4-arch1-1/build'
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/mac.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/mcu.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/main.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/init.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/debugfs.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../mt792x_debugfs.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../mt792x_core.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../mt792x_mac.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../mt792x_acpi_sar.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../mt792x_dma.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../mac80211.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../mt76_connac_mcu.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../tx.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../agg-rx.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../util.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../eeprom.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../debugfs.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../dma.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../mt76_connac_mac.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../mcu.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../pci.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/../mmio.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/pci.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/pci_mac.o
  CC [M]  /usr/src/mt7902-dkms/mt76/mt7902/pci_mcu.o
  LD [M]  /usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.o
  LD [M]  /usr/src/mt7902-dkms/mt76/mt7902/mt7902.o
  MODPOST /usr/src/mt7902-dkms/mt76/mt7902/Module.symvers
ERROR: modpost: "__SCT__tp_func_reg_wr" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "__tracepoint_reg_wr" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "__SCK__tp_func_reg_rr" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "__SCK__tp_func_reg_wr" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "__SCT__tp_func_reg_rr" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
ERROR: modpost: "__tracepoint_reg_rr" [/usr/src/mt7902-dkms/mt76/mt7902/mt7902-common.ko] undefined!
make[3]: *** [scripts/Makefile.modpost:145: /usr/src/mt7902-dkms/mt76/mt7902/Module.symvers] Error 1
make[2]: *** [/usr/lib/modules/6.8.4-arch1-1/build/Makefile:1873: modpost] Error 2
make[1]: *** [Makefile:240: __sub-make] Error 2
make[1]: Leaving directory '/usr/lib/modules/6.8.4-arch1-1/build'
make: *** [Makefile:19: modules] Error 2

__SCT__ and __SCK__ look like the prefixes that the kernel's static-call machinery generates (trampoline and key), which tracepoints rely on, rather than anything defined by the driver itself.

PS: I think I have an insight into the problem. It looks like Linux versions >= 6.8 introduced some changes to the mt76 stack, which might be the cause of the semaphore timeout. So it would be better to start from scratch on the latest mt76 module.

samveen commented 5 months ago

The tracing framework is the issue here. Please apply the following patch, and the build completes successfully:

diff --git a/mt76/mmio.c b/mt76/mmio.c
index 0d36640..0434eea 100644
--- a/mt76/mmio.c
+++ b/mt76/mmio.c
@@ -12,14 +12,14 @@ static u32 mt7902_mt76_mmio_rr(struct mt7902_mt76_dev *dev, u32 offset)
        u32 val;

        val = readl(dev->mmio.regs + offset);
-       trace_reg_rr(dev, offset, val);
+       //trace_reg_rr(dev, offset, val);

        return val;
 }

 static void mt7902_mt76_mmio_wr(struct mt7902_mt76_dev *dev, u32 offset, u32 val)
 {
-       trace_reg_wr(dev, offset, val);
+       //trace_reg_wr(dev, offset, val);
        writel(val, dev->mmio.regs + offset);
 }

I'll need a bit of time to figure out what we're missing here to enable the creation of the required macros.

DarkMatter-999 commented 5 months ago

Applied the patch; now it compiles.

Running sudo dmesg | grep mt7902 now results in:

[    6.154023] mt7902 0000:01:00.0: enabling device (0000 -> 0002)
[    6.162305] mt7902 0000:01:00.0: ASIC revision: 79020000
[    9.300312] mt7902 0000:01:00.0: Message 00000010 (seq 1) timeout
[    9.300325] mt7902 0000:01:00.0: Failed to get patch semaphore
[   12.503636] mt7902 0000:01:00.0: Message 00000010 (seq 2) timeout
[   12.503643] mt7902 0000:01:00.0: Failed to get patch semaphore
[   15.700723] mt7902 0000:01:00.0: Message 00000010 (seq 3) timeout
[   15.700733] mt7902 0000:01:00.0: Failed to get patch semaphore
[   18.900307] mt7902 0000:01:00.0: Message 00000010 (seq 4) timeout
[   18.900318] mt7902 0000:01:00.0: Failed to get patch semaphore
[   22.100306] mt7902 0000:01:00.0: Message 00000010 (seq 5) timeout
[   22.100314] mt7902 0000:01:00.0: Failed to get patch semaphore
[   25.300294] mt7902 0000:01:00.0: Message 00000010 (seq 6) timeout
[   25.300300] mt7902 0000:01:00.0: Failed to get patch semaphore
[   28.503619] mt7902 0000:01:00.0: Message 00000010 (seq 7) timeout
[   28.503630] mt7902 0000:01:00.0: Failed to get patch semaphore
[   31.700289] mt7902 0000:01:00.0: Message 00000010 (seq 8) timeout
[   31.700298] mt7902 0000:01:00.0: Failed to get patch semaphore
[   34.903631] mt7902 0000:01:00.0: Message 00000010 (seq 9) timeout
[   34.903641] mt7902 0000:01:00.0: Failed to get patch semaphore
[   38.100281] mt7902 0000:01:00.0: Message 00000010 (seq 10) timeout
[   38.100298] mt7902 0000:01:00.0: Failed to get patch semaphore
[   38.181120] mt7902 0000:01:00.0: hardware init failed
[  969.509199] mt7902 0000:01:00.0: Message 00020007 (seq 11) timeout
[  969.509211] mt7902 0000:01:00.0: PM: pci_pm_suspend(): mt7902_pci_suspend+0x0/0x240 [mt7902] returns -110
[  969.509228] mt7902 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -110
[  969.509243] mt7902 0000:01:00.0: PM: failed to suspend async: error -110
[  973.349137] mt7902 0000:01:00.0: Message 00020007 (seq 12) timeout
[  973.349150] mt7902 0000:01:00.0: PM: pci_pm_suspend(): mt7902_pci_suspend+0x0/0x240 [mt7902] returns -110
[  973.349167] mt7902 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -110
[  973.349182] mt7902 0000:01:00.0: PM: failed to suspend async: error -110
[ 1877.864087] mt7902 0000:01:00.0: Message 00020007 (seq 13) timeout
[ 1877.864102] mt7902 0000:01:00.0: PM: pci_pm_suspend(): mt7902_pci_suspend+0x0/0x240 [mt7902] returns -110
[ 1877.864121] mt7902 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -110
[ 1877.864138] mt7902 0000:01:00.0: PM: failed to suspend async: error -110
[ 1881.703906] mt7902 0000:01:00.0: Message 00020007 (seq 14) timeout
[ 1881.703918] mt7902 0000:01:00.0: PM: pci_pm_suspend(): mt7902_pci_suspend+0x0/0x240 [mt7902] returns -110
[ 1881.703935] mt7902 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -110
[ 1881.703950] mt7902 0000:01:00.0: PM: failed to suspend async: error -110
[ 2786.017232] mt7902 0000:01:00.0: Message 00020007 (seq 15) timeout
[ 2786.017244] mt7902 0000:01:00.0: PM: pci_pm_suspend(): mt7902_pci_suspend+0x0/0x240 [mt7902] returns -110
[ 2786.017261] mt7902 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -110
[ 2786.017277] mt7902 0000:01:00.0: PM: failed to suspend async: error -110
[ 2789.857191] mt7902 0000:01:00.0: Message 00020007 (seq 1) timeout
[ 2789.857203] mt7902 0000:01:00.0: PM: pci_pm_suspend(): mt7902_pci_suspend+0x0/0x240 [mt7902] returns -110
[ 2789.857220] mt7902 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -110
[ 2789.857237] mt7902 0000:01:00.0: PM: failed to suspend async: error -110
[ 3274.761453] mt7902 0000:01:00.0: Message 00020007 (seq 2) timeout
[ 3274.761465] mt7902 0000:01:00.0: PM: pci_pm_suspend(): mt7902_pci_suspend+0x0/0x240 [mt7902] returns -110
[ 3274.761482] mt7902 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -110
[ 3274.761497] mt7902 0000:01:00.0: PM: failed to suspend async: error -110
[ 3278.601407] mt7902 0000:01:00.0: Message 00020007 (seq 3) timeout
[ 3278.601420] mt7902 0000:01:00.0: PM: pci_pm_suspend(): mt7902_pci_suspend+0x0/0x240 [mt7902] returns -110
[ 3278.601438] mt7902 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -110
[ 3278.601453] mt7902 0000:01:00.0: PM: failed to suspend async: error -110
[ 3515.187218] mt7902 0000:01:00.0: Message 00020007 (seq 4) timeout
[ 3515.187234] mt7902 0000:01:00.0: PM: pci_pm_suspend(): mt7902_pci_suspend+0x0/0x240 [mt7902] returns -110
[ 3515.187255] mt7902 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -110
[ 3515.187272] mt7902 0000:01:00.0: PM: failed to suspend async: error -110
[ 3519.240585] mt7902 0000:01:00.0: Message 00020007 (seq 5) timeout
[ 3519.240601] mt7902 0000:01:00.0: PM: pci_pm_suspend(): mt7902_pci_suspend+0x0/0x240 [mt7902] returns -110
[ 3519.240622] mt7902 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -110
[ 3519.240641] mt7902 0000:01:00.0: PM: failed to suspend async: error -110
samveen commented 5 months ago

The function that fails is mt7902_mt76_connac2_load_patch, which depends on a semaphore obtained through a call to mt7902_mt76_connac_mcu_patch_sem_ctrl, which in turn calls other functions.

If you add tracing messages along this chain of functions, we can see the call stack and then try to zero in on the failure.

DarkMatter-999 commented 5 months ago

Calling dump_stack() after the call to mt7902_mt76_connac_mcu_patch_sem_ctrl gives:

[    5.933731] mt7902 0000:01:00.0: enabling device (0000 -> 0002)
[    5.940288] mt7902 0000:01:00.0: ASIC revision: 79020000
[    6.020796] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[    6.020801] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
[    9.090294] mt7902 0000:01:00.0: Message 00000010 (seq 1) timeout
[    9.090315] Workqueue: events mt7902_init_work [mt7902_common]
[    9.090368]  mt7902_mt76_connac2_load_patch+0xab/0x390 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[    9.090386]  mt7902_mt792x_load_firmware+0x42/0x160 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[    9.090400]  mt7902_run_firmware+0x2f/0x500 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[    9.090423]  ? ____mt7902_mt76_poll_msec+0x75/0xb0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[    9.090438]  mt7902e_mcu_init+0x4e/0x80 [mt7902 f53c80c43370a8bddecd2a69c8f618341c863fda]
[    9.090443]  mt7902_init_work+0x51/0x1d0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[    9.090493] mt7902 0000:01:00.0: Failed to get patch semaphore
[    9.168298] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[    9.168304] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
[   12.286958] mt7902 0000:01:00.0: Message 00000010 (seq 2) timeout
[   12.286978] Workqueue: events mt7902_init_work [mt7902_common]
[   12.287034]  mt7902_mt76_connac2_load_patch+0xab/0x390 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   12.287056]  mt7902_mt792x_load_firmware+0x42/0x160 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   12.287070]  mt7902_run_firmware+0x2f/0x500 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   12.287091]  ? ____mt7902_mt76_poll_msec+0x75/0xb0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   12.287107]  mt7902e_mcu_init+0x4e/0x80 [mt7902 f53c80c43370a8bddecd2a69c8f618341c863fda]
[   12.287114]  mt7902_init_work+0x51/0x1d0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   12.287163] mt7902 0000:01:00.0: Failed to get patch semaphore
[   12.367931] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[   12.367954] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
[   15.486955] mt7902 0000:01:00.0: Message 00000010 (seq 3) timeout
[   15.486968] Workqueue: events mt7902_init_work [mt7902_common]
[   15.487008]  mt7902_mt76_connac2_load_patch+0xab/0x390 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   15.487036]  mt7902_mt792x_load_firmware+0x42/0x160 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   15.487056]  mt7902_run_firmware+0x2f/0x500 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   15.487083]  ? ____mt7902_mt76_poll_msec+0x75/0xb0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   15.487105]  mt7902e_mcu_init+0x4e/0x80 [mt7902 f53c80c43370a8bddecd2a69c8f618341c863fda]
[   15.487111]  mt7902_init_work+0x51/0x1d0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   15.487163] mt7902 0000:01:00.0: Failed to get patch semaphore
[   15.564258] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[   15.564264] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1

PS: Is there a way to reload the module without a reboot? If so, please let me know.

samveen commented 5 months ago

The function that actually does the semaphore work is mt7902_mt76_mcu_skb_send_and_get_msg. Would you add dump_stack() to it and trace from there?

DarkMatter-999 commented 5 months ago

Here you go. Same output:

[    6.361104] mt7902 0000:01:00.0: enabling device (0000 -> 0002)
[    6.369006] mt7902 0000:01:00.0: ASIC revision: 79020000
[    6.450653] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[    6.450658] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
[    9.513902] mt7902 0000:01:00.0: Message 00000010 (seq 1) timeout
[    9.513927] Workqueue: events mt7902_init_work [mt7902_common]
[    9.513996]  mt7902_mt76_connac2_load_patch+0xab/0x390 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[    9.514037]  mt7902_mt792x_load_firmware+0x42/0x160 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[    9.514066]  mt7902_run_firmware+0x2f/0x500 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[    9.514101]  mt7902e_mcu_init+0x4e/0x80 [mt7902 f53c80c43370a8bddecd2a69c8f618341c863fda]
[    9.514111]  mt7902_init_work+0x51/0x1d0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[    9.514200] mt7902 0000:01:00.0: Failed to get patch semaphore
[    9.591589] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[    9.591596] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
[   12.713665] mt7902 0000:01:00.0: Message 00000010 (seq 2) timeout
[   12.713680] Workqueue: events mt7902_init_work [mt7902_common]
[   12.713711]  mt7902_mt76_connac2_load_patch+0xab/0x390 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   12.713733]  mt7902_mt792x_load_firmware+0x42/0x160 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   12.713747]  mt7902_run_firmware+0x2f/0x500 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   12.713768]  ? ____mt7902_mt76_poll_msec+0x75/0xb0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   12.713784]  mt7902e_mcu_init+0x4e/0x80 [mt7902 f53c80c43370a8bddecd2a69c8f618341c863fda]
[   12.713789]  mt7902_init_work+0x51/0x1d0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   12.713836] mt7902 0000:01:00.0: Failed to get patch semaphore
[   12.791032] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[   12.791037] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
[   15.913736] mt7902 0000:01:00.0: Message 00000010 (seq 3) timeout
[   15.913768] Workqueue: events mt7902_init_work [mt7902_common]
[   15.913873]  mt7902_mt76_connac2_load_patch+0xab/0x390 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   15.913963]  mt7902_mt792x_load_firmware+0x42/0x160 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   15.914033]  mt7902_run_firmware+0x2f/0x500 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   15.914122]  ? ____mt7902_mt76_poll_msec+0x75/0xb0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   15.914198]  mt7902e_mcu_init+0x4e/0x80 [mt7902 f53c80c43370a8bddecd2a69c8f618341c863fda]
[   15.914215]  mt7902_init_work+0x51/0x1d0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   15.914380] mt7902 0000:01:00.0: Failed to get patch semaphore
[   15.991237] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[   15.991246] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
[   19.113677] mt7902 0000:01:00.0: Message 00000010 (seq 4) timeout
[   19.113700] Workqueue: events mt7902_init_work [mt7902_common]
[   19.113758]  mt7902_mt76_connac2_load_patch+0xab/0x390 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   19.113796]  mt7902_mt792x_load_firmware+0x42/0x160 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   19.113823]  mt7902_run_firmware+0x2f/0x500 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   19.113860]  ? ____mt7902_mt76_poll_msec+0x75/0xb0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   19.113889]  mt7902e_mcu_init+0x4e/0x80 [mt7902 f53c80c43370a8bddecd2a69c8f618341c863fda]
[   19.113900]  mt7902_init_work+0x51/0x1d0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   19.113983] mt7902 0000:01:00.0: Failed to get patch semaphore
[   19.191161] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[   19.191168] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
[   22.317121] mt7902 0000:01:00.0: Message 00000010 (seq 5) timeout
[   22.317142] Workqueue: events mt7902_init_work [mt7902_common]
[   22.317199]  mt7902_mt76_connac2_load_patch+0xab/0x390 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   22.317245]  mt7902_mt792x_load_firmware+0x42/0x160 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   22.317280]  mt7902_run_firmware+0x2f/0x500 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   22.317327]  ? ____mt7902_mt76_poll_msec+0x75/0xb0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   22.317365]  mt7902e_mcu_init+0x4e/0x80 [mt7902 f53c80c43370a8bddecd2a69c8f618341c863fda]
[   22.317375]  mt7902_init_work+0x51/0x1d0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   22.317462] mt7902 0000:01:00.0: Failed to get patch semaphore
[   22.394293] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[   22.394299] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
[   25.513634] mt7902 0000:01:00.0: Message 00000010 (seq 6) timeout
[   25.513653] Workqueue: events mt7902_init_work [mt7902_common]
[   25.513701]  mt7902_mt76_connac2_load_patch+0xab/0x390 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   25.513737]  mt7902_mt792x_load_firmware+0x42/0x160 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   25.513752]  mt7902_run_firmware+0x2f/0x500 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   25.513774]  ? ____mt7902_mt76_poll_msec+0x75/0xb0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   25.513800]  mt7902e_mcu_init+0x4e/0x80 [mt7902 f53c80c43370a8bddecd2a69c8f618341c863fda]
[   25.513807]  mt7902_init_work+0x51/0x1d0 [mt7902_common 9c79798e9b590456621361f22c20d23b852a2ffb]
[   25.513862] mt7902 0000:01:00.0: Failed to get patch semaphore
[   25.593722] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[   25.593734] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
samveen commented 5 months ago

> Same output

That's not good. this means that

Would you add more tracing calls across the functions?

DarkMatter-999 commented 5 months ago

As it turns out, in the mt7902_mt76_mcu_skb_send_and_get_msg function (mcu.c), the lines:

skb = mt7902_mt76_mcu_get_response(dev, expires);
ret = dev->mcu_ops->mcu_parse_response(dev, cmd, skb, seq);

return -110 (-ETIMEDOUT), which causes the problem and drops out of the loop.

PS - is there a way to reload/reinitiate the module without a reboot?

[    6.048436] mt7902 0000:01:00.0: enabling device (0000 -> 0002)
[    6.054931] mt7902 0000:01:00.0: ASIC revision: 79020000
[    6.134146] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[    6.134150] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
[    6.134152] mt7902 0000:01:00.0: Entering mt7902_mt76_mcu_send_and_get_msg cmd: 16
[    6.134154] mt7902 0000:01:00.0: mt7902_mt76_mcu_msg_alloc 00000000629ec109
[    6.134162] Workqueue: events mt7902_init_work [mt7902_common]
[    6.134189]  mt7902_mt76_mcu_skb_send_and_get_msg+0x42/0x1b0 [mt7902_common d42f70a08464291a9821ca3786ddc3333014cec7]
[    6.134204]  mt7902_mt76_connac2_load_patch+0xa3/0x390 [mt7902_common d42f70a08464291a9821ca3786ddc3333014cec7]
[    6.134219]  mt7902_mt792x_load_firmware+0x42/0x160 [mt7902_common d42f70a08464291a9821ca3786ddc3333014cec7]
[    6.134233]  mt7902_run_firmware+0x2f/0x500 [mt7902_common d42f70a08464291a9821ca3786ddc3333014cec7]
[    6.134254]  ? ____mt7902_mt76_poll_msec+0x75/0xb0 [mt7902_common d42f70a08464291a9821ca3786ddc3333014cec7]
[    6.134268]  mt7902e_mcu_init+0x4e/0x80 [mt7902 f53c80c43370a8bddecd2a69c8f618341c863fda]
[    6.134273]  mt7902_init_work+0x51/0x1d0 [mt7902_common d42f70a08464291a9821ca3786ddc3333014cec7]
[    6.134318] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg acquiring mutex
[    6.134326] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg cmd: 16, ret: 0, wait_resp: 1
[    6.134328] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg expires: 4294880014
[    9.300627] mt7902 0000:01:00.0: Message 00000010 (seq 1) timeout
[    9.300637] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg > mcu_parse_response cmd: 16, ret: -110
[    9.300642] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg unlocking mutex
samveen commented 5 months ago

I believe unloading the module (rmmod) and then reloading it (insmod) should do the deinitialization and reinitialization as expected.

modprobe loads dependencies too. In the case of the mt7902, this PR is trying to remove that dependency, so try insmod instead of modprobe.

Would you move the dump_stack() from mt7902_mt76_mcu_skb_send_and_get_msg into mt7902_mt76_mcu_get_response?

Another possibility is that the response parsing is failing as the parser isn't customized to the mt7902. Is it possible to dump the skb structure?

DarkMatter-999 commented 5 months ago

Moved it like you said; here's the log. Sadly, I couldn't get the skb to print and got kernel panics instead.

If you have any ideas on how to dump the skb, please let me know. Right now I used the code below to try to print it. I guess some null pointer problem is leading to the kernel panic. I don't know the exact layout of the skb struct.

unsigned char *data_ptr;
data_ptr = skb->data;
// dev_info(dev->dev, "mt7902_mt76_mcu_get_response Response : %p\n", skb->data);

for (int i = 0; i < skb->len; i++) {
    dev_info(dev->dev, "%02X ", data_ptr[i]);
}
dev_info(dev->dev, "\n");
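A possible cause of the panic, stated here as an assumption rather than something the log confirms, is that `skb` is NULL when no response arrived, so `skb->data` is read through a null pointer. Separately, every `dev_info()` call emits its own log record, so printing one byte per call scatters the dump. The user-space sketch below shows both guards in isolation: check the buffer before touching it, and format a whole row into one string before logging it. `hex_row` is a hypothetical helper written for illustration; in kernel code the existing `print_hex_dump()` helper serves a similar purpose.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format up to `len` bytes of `data` as "AA BB CC"
 * into `out`. Returns the number of bytes formatted, or -1 if `data` or
 * `out` is NULL or `out_sz` is 0, so the caller can skip the dump instead
 * of dereferencing a null pointer.
 */
static int hex_row(const unsigned char *data, size_t len,
                   char *out, size_t out_sz)
{
    size_t i, pos = 0;

    if (!data || !out || out_sz == 0)
        return -1;

    out[0] = '\0';
    for (i = 0; i < len; i++) {
        /* "XX" for the first byte, " XX" afterwards, plus the final NUL. */
        size_t need = (i ? 3 : 2) + 1;

        if (pos + need > out_sz)
            break;  /* stop cleanly when the row buffer is full */
        pos += (size_t)snprintf(out + pos, out_sz - pos,
                                i ? " %02X" : "%02X", data[i]);
    }
    return (int)i;
}
```

In the driver, the equivalent would be to return early when `skb` is NULL and then log one formatted row per call rather than one byte per call.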

Log is as follows:

[   10.469477] mt7902 0000:01:00.0: enabling device (0000 -> 0002)
[   10.476139] mt7902 0000:01:00.0: ASIC revision: 79020000
[   10.553892] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[   10.553896] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
[   10.553898] mt7902 0000:01:00.0: Entering mt7902_mt76_mcu_send_and_get_msg cmd: 16
[   10.553901] mt7902 0000:01:00.0: mt7902_mt76_mcu_msg_alloc 00000000b71be876
[   10.553904] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg acquiring mutex
[   10.553908] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg cmd: 16, ret: 0, wait_resp: 1
[   10.553910] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg expires: 4294881339
[   10.553921] Workqueue: events mt7902_init_work [mt7902_common]
[   10.553955]  mt7902_mt76_mcu_get_response+0x2e/0x150 [mt7902_common 0dae977abb3b5716fcfa2c6043afa0e16f74a940]
[   10.553983]  mt7902_mt76_mcu_skb_send_and_get_msg+0xf7/0x1a0 [mt7902_common 0dae977abb3b5716fcfa2c6043afa0e16f74a940]
[   10.554014]  mt7902_mt76_connac2_load_patch+0xa3/0x390 [mt7902_common 0dae977abb3b5716fcfa2c6043afa0e16f74a940]
[   10.554040]  mt7902_mt792x_load_firmware+0x42/0x160 [mt7902_common 0dae977abb3b5716fcfa2c6043afa0e16f74a940]
[   10.554063]  mt7902_run_firmware+0x2f/0x500 [mt7902_common 0dae977abb3b5716fcfa2c6043afa0e16f74a940]
[   10.554096]  ? ____mt7902_mt76_poll_msec+0x75/0xb0 [mt7902_common 0dae977abb3b5716fcfa2c6043afa0e16f74a940]
[   10.554120]  mt7902e_mcu_init+0x4e/0x80 [mt7902 f53c80c43370a8bddecd2a69c8f618341c863fda]
[   10.554128]  mt7902_init_work+0x51/0x1d0 [mt7902_common 0dae977abb3b5716fcfa2c6043afa0e16f74a940]
[   13.570335] mt7902 0000:01:00.0: Message 00000010 (seq 1) timeout
[   13.570347] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg > mcu_parse_response cmd: 16, ret: -110
[   13.570353] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg unlocking mutex
DarkMatter-999 commented 5 months ago

Upon further exploration, in the function mt7902_mt76_mcu_skb_send_and_get_msg, the following line returns a NULL skb pointer:

skb = mt7902_mt76_mcu_get_response(dev, expires);

and the following line in mt7902_mt76_mcu_get_response is responsible for this:

struct sk_buff * ret = skb_dequeue(&dev->mcu.res_q);
dev_info(dev->dev, "mt7902_mt76_mcu_get_response ret: %p\n", ret);
return ret; 
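The failure path described here can be modelled in user space: dequeuing from an empty response queue yields NULL, and a parser that treats NULL as "no reply" reports a timeout. The types and function names below (`resp`, `resp_queue`, `queue_enqueue`, `queue_dequeue`, `parse_response`) are hypothetical stand-ins rather than the driver's own code, and the sequence-number check is an assumption about how a reply is matched to its request. The real driver also waits on the queue with a timeout before dequeuing, which this sketch leaves out.

```c
#include <stddef.h>

/* Errno values from Linux's asm-generic headers, hard-coded for the sketch. */
#define K_EAGAIN     11
#define K_ETIMEDOUT 110

struct resp {
    struct resp *next;
    int seq;                /* sequence number echoed back in the reply */
};

struct resp_queue {
    struct resp *head;
    struct resp *tail;
};

/* Append a response at the tail of the queue. */
static void queue_enqueue(struct resp_queue *q, struct resp *r)
{
    r->next = NULL;
    if (q->tail)
        q->tail->next = r;
    else
        q->head = r;
    q->tail = r;
}

/* Pop the oldest response, or return NULL when the queue is empty. */
static struct resp *queue_dequeue(struct resp_queue *q)
{
    struct resp *r = q->head;

    if (!r)
        return NULL;
    q->head = r->next;
    if (!q->head)
        q->tail = NULL;
    r->next = NULL;
    return r;
}

/* A NULL response means nothing arrived before the deadline. */
static int parse_response(const struct resp *r, int seq)
{
    if (!r)
        return -K_ETIMEDOUT;
    if (r->seq != seq)
        return -K_EAGAIN;   /* reply belongs to a different request */
    return 0;
}
```

Seen this way, the -110 is a symptom: the parser is behaving as designed, and the open question is why nothing ever lands on the response queue.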

Also, I was wondering why we don't try using the mt7925 driver as the base instead of mt7921, since the mt7902 is much closer to it.

Logs are:


[  961.908829] mt7902 0000:01:00.0: Entering mt7902_mt76_connac_mcu_patch_sem_ctrl function
[  961.908834] mt7902 0000:01:00.0: Calling mt7902_mt76_mcu_send_message with params 1
[  961.908836] mt7902 0000:01:00.0: Entering mt7902_mt76_mcu_send_and_get_msg cmd: 16
[  961.908839] mt7902 0000:01:00.0: mt7902_mt76_mcu_msg_alloc 000000003cce50b9
[  961.908841] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg acquiring mutex
[  961.908844] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg cmd: 16, ret: 0, wait_resp: 1
[  961.908846] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg expires: 4295166748
[  961.908847] mt7902 0000:01:00.0: mt7902_mt76_mcu_get_response expires: 4295166748
[  961.908849] mt7902 0000:01:00.0: mt7902_mt76_mcu_get_response timeout: 900
[  965.028026] mt7902 0000:01:00.0: mt7902_mt76_mcu_get_response ret: 0000000000000000
[  965.028053] Workqueue: events mt7902_init_work [mt7902_common]
[  965.028104]  mt7902_mt76_mcu_get_response+0xb2/0x190 [mt7902_common 23919710e07b6d4c7c52dc8c3d06c97e09472ddc]
[  965.028131]  mt7902_mt76_mcu_skb_send_and_get_msg+0xf9/0x1c0 [mt7902_common 23919710e07b6d4c7c52dc8c3d06c97e09472ddc]
[  965.028152]  mt7902_mt76_connac2_load_patch+0xa3/0x380 [mt7902_common 23919710e07b6d4c7c52dc8c3d06c97e09472ddc]
[  965.028178]  mt7902_mt792x_load_firmware+0x42/0x160 [mt7902_common 23919710e07b6d4c7c52dc8c3d06c97e09472ddc]
[  965.028199]  mt7902_run_firmware+0x2f/0x500 [mt7902_common 23919710e07b6d4c7c52dc8c3d06c97e09472ddc]
[  965.028223]  ? ____mt7902_mt76_poll_msec+0x75/0xb0 [mt7902_common 23919710e07b6d4c7c52dc8c3d06c97e09472ddc]
[  965.028246]  mt7902e_mcu_init+0x4e/0x80 [mt7902 1860a80f72a77cbd81eaa26de089087e455d1ba3]
[  965.028251]  mt7902_init_work+0x51/0x1d0 [mt7902_common 23919710e07b6d4c7c52dc8c3d06c97e09472ddc]
[  965.028304] mt7902 0000:01:00.0: SKB info: 0000000000000000
[  965.028307] mt7902 0000:01:00.0: Message 00000010 (seq 10) timeout
[  965.028310] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg > mcu_parse_response cmd: 16, ret: -110
[  965.028312] mt7902 0000:01:00.0: mt7902_mt76_mcu_skb_send_and_get_msg unlocking mutex
[  965.028314] mt7902 0000:01:00.0: Failed to get patch semaphore
[  965.105789] mt7902 0000:01:00.0: hardware init failed
samveen commented 5 months ago

> Also, I was wondering why we don't try using the mt7925 driver as the base instead of mt7921, since the mt7902 is much closer to it.

That's a valid point. I remember looking at some documentation and deciding that the mt7921 looked quite promising. However, I cannot recall why now. If you want, I'll merge this PR, move it aside as the old attempt, and introduce the mt7925 code as the base.

If that works for you, just remove the WIP prefix from this PR, and I'll proceed with the merge, rename, and recreate.

DarkMatter-999 commented 5 months ago

Okay, sounds good then.

There are two main reasons I believe the mt7925 can work. Firstly, it's a more recent driver. Secondly, according to the Wi-Fi specs of the mt7925 and mt7902, the only major difference between them is the Wi-Fi 7 capability of the mt7925, which would need to be patched.

Since the mt7902's MCU appears to be different and the mt7921 driver doesn't work, I think the mt7925 is worth a try. ✌️