Open ValdikSS opened 1 year ago
Hi @ValdikSS
Ouch. That is bad behavior.
As much as I use usb wifi adapters, I never find the need to use that specific command so I can see why I would have missed this. May I ask what you are trying to accomplish?
Nick
I never find the need to use that specific command so I can see why I would have missed this. May I ask what you are trying to accomplish?
I'm trying to use rtl8811cu on OpenWrt 22.03.3, and this is what it does internally.
It locks here: https://github.com/morrownr/8821cu-20210916/blob/8cb0ee17015ed4453d9d9652004802745895f7a8/os_dep/linux/os_intfs.c#L2219
I'm not sure which exact mutex/semaphore should not be acquired for unregister_netdevice.
I took a look at where it locks today. I also tried to find out when that code was last modified but could not determine it. It appears this code has not been touched in a long time, and the same code is in other WiFi 5 generation Realtek drivers, so the search has to continue for a hint as to what could be causing this.
I use OpenWRT but only with adapters that use in-kernel drivers. I have no experience trying to use Realtek drivers on OpenWRT.
Have you tried the command on other platforms? x86_64 or ARM64?
I am hesitant to start playing with code until such time as we can narrow down whether this is an issue that affects more than just OpenWRT.
I am testing on Debian 11.6 x86_64 now. I just happened to be creating a VM, so I figured I would try to replicate the issue. @ValdikSS, what specific release of OpenWRT are you using? I happen to be familiar with OpenWRT. What hardware? Might be worth testing on OpenWRT for x86 as well.
@Jibun-no-Kage, OpenWrt 22.03.3, x86 "generic" VM. https://downloads.openwrt.org/releases/22.03.3/targets/x86/generic/
Cool, that is what I needed to know, I will test on x86 generic OpenWRT as well. As for using the specific command, we pretty much all used NetworkManager in testing. Even when I did use iw, I don't believe I have ever deleted an interface that way. Usually I just replaced the entire configuration from scratch.
So x86_64, Debian 11.6, test results...
# ifconfig wlx00e032816834
wlx00e032816834: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
        ether 66:da:6d:75:0d:5a txqueuelen 1000 (Ethernet)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
# iw dev wlx00e032816834 del
# ifconfig wlx00e032816834
wlx00e032816834: error fetching interface information: Device not found
I followed the basic core steps for setup and configuration, i.e. install-driver.sh with DKMS for Debian, and options file defaults, plus US as region, then rebooted after installation. I was able to delete the interface without error. I tried both with the interface just created, and with it created and explicitly associated via nmcli. Same result, no error.
So, might be specific to OpenWRT? Was there a different sequence of steps done?
Tomorrow I will set up another distribution and validate... something not Debian/Ubuntu based. Maybe x86_64 Manjaro? And I will try OpenWRT generic x86_64 as well, just as a sanity test.
I have the same issue on Debian 12 and on Fedora 37.
The exact steps are:
iw dev _device_ del
That's it.
For the record: https://github.com/aircrack-ng/rtl8812au/issues/514 https://github.com/aircrack-ng/rtl8812au/issues/374 https://github.com/diederikdehaas/rtl8812AU/issues/56 https://github.com/brektrou/rtl8821CU/issues/126
If you're going to debug the issue, you might need to add printk's to every lock in the driver, and to notification functions, as I assume the deadlock is somewhere there (unregister_netdevice sends a notification during the unregister process, which is handled by the driver).
The other way is to use lockdep kernel subsystem.
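Short of instrumenting the driver, one low-effort way to see where the hung process is stuck is to list the tasks in uninterruptible sleep (state D) and read their kernel stacks. The sketch below assumes a procps-style ps (the BusyBox ps on OpenWrt may not accept these options). list_blocked_tasks is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -eo pid,stat,comm` output on stdin and
# prints "PID COMMAND" for every task whose state starts with D
# (uninterruptible sleep, which is why kill -9 has no effect).
list_blocked_tasks() {
    # NR > 1 skips the header line; $2 is the STAT column.
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Usage (as root, after reproducing the hang):
#   ps -eo pid,stat,comm | list_blocked_tasks
#   cat /proc/<pid>/stack          # kernel stack; needs CONFIG_STACKTRACE
#   echo w > /proc/sysrq-trigger   # dumps blocked tasks to the kernel log
#   dmesg | tail -n 100
```

The stack of the stuck iw process should show which lock it is waiting on, which narrows down where to put the printk calls.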
I was able to recreate the hang on Manjaro x86_64. So at least we know this might have a greater scope than just OpenWRT. It did not crash the entire instance, but it definitely hung up the network stack. I could not reference the interface via a terminal session or via NetworkManager in the UI. Manjaro has a pretty new kernel, much newer than Debian 11.6. I don't think that is material per se, as yet, but it was a significant difference I noticed.
Wonder why Debian 11.6 did not hang but Debian 12 did? That might be an important clue.
@ValdikSS @Jibun-no-Kage
When I get to this issue, I think I will test 88x2bu to see what happens. That might expand the hunt enough to help.
@ValdikSS
You might want to join the conversation at:
https://github.com/morrownr/8821au-20210708/issues/70
It seems related.
@ValdikSS
I am still working the related issue of interface renaming and I found that the messy log is related to persistent interface names, as I turned it off and everything is clean. Still working on that.
Back to the original issue here: iw dev wlan0 del.
I need more info. Can I get you to give me a step by step list of exactly what commands you are using? I need to duplicate this.
A thought I had is that using iw ... del on a Realtek out-of-kernel driver would be very rare, because they don't support adding another interface to the same adapter, so why would del need to be used? Well, they support concurrent mode but that is a different animal.
Let me back up a minute. Something I noticed when I started this site around 3 years ago is people would show up posting problems using a guide that was built for in-kernel drivers expecting the Realtek driver to be able to do the same thing the in-kernel driver can do. That is not the way it is and here is why:
The in-kernel drivers such as the ar9271, rt5370, mt7610u, mt7612u and mt7921u are built using the mac80211 and stack capability per Linux Standards. Realtek and their out-of-kernel drivers are built to one degree or another on the old deprecated Wireless Extensions (WEXT) technology. I get asked on a regular basis why I don't submit the Realtek drivers here to the kernel maintainers to have them be in-kernel. Well, I can't, because they would not be accepted: the Realtek drivers are based on old, deprecated technology. The submission would be denied. In fact, with WiFi 7, Realtek has no choice but to either drop Linux support or start doing the drivers in accordance with current standards, as the old technology will no longer work with WiFi 7. Period.
We maintain these drivers to help Linux users but if you look around, I push users to seek out adapters that use in-kernel drivers because they will operate in accordance with the guides that are posted in many places.
@morrownr
Step-by-step list of the actions to reproduce:
Run iw dev wlan0 del as root, where wlan0 is the interface name. The command would hang indefinitely.
To get the iw docs, I did the following:
iw --help > iw.txt
Then I compressed it and attached it.
From the iw docs:
dev <devname> del
Remove this virtual interface
So, we need a virtual interface. Your wlan0 is not a virtual interface. How do I know this?
Run:
iw list
Look for:
valid interface combinations
You won't find it. You will find:
interface combinations are not supported
None of the Realtek out-of-kernel drivers here or anywhere else support interface combinations.
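The check described above can be scripted. This is a minimal sketch that only looks for the two strings quoted above in the output of iw list; supports_iface_combinations is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads `iw list` output on stdin.
# Exit status 0 if the driver advertises interface combinations,
# non-zero if it does not (e.g. it prints
# "interface combinations are not supported" instead).
supports_iface_combinations() {
    # The "not supported" message does not contain the word "valid",
    # so it cannot produce a false match here.
    grep -q 'valid interface combinations'
}

# Usage:
#   if iw list | supports_iface_combinations; then
#       echo "virtual interfaces are advertised"
#   else
#       echo "no interface combinations: avoid 'iw dev <devname> del'"
#   fi
```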
I was able to duplicate your result and I must say that hanging the terminal is not a good thing, but this has probably not come up because anyone working on iw probably knows not to use this command on a Realtek driver.
There is a reason that I put a statement in the docs encouraging Linux users to seek out adapters with in-kernel drivers. You have found one reason but there are many.
If, for example, you get an Alfa ACM:
valid interface combinations:
total <= 2, #channels <= 1, STA/AP BI must match
If you go to the USB-WiFi Main Menu:
https://github.com/morrownr/USB-WiFi
You can select menu item 4 and get a list of many adapters and the capabilities. You have to pick out an adapter with the virtual interface capabilities you want because different drivers/chipsets mean different capabilities.
Sorry about not pointing this out right up front but I started to test and noticed the problem renaming an interface and I got focused on that... which is also a Realtek only problem.
I'm going to show this issue as can't fix.
@morrownr
Just FYI, rtw88 mainline kernel driver (and its backport https://github.com/lwfinger/rtw88) is currently in a state which allows for 8811cu dongles to work properly in both 2.4 and 5 GHz, client/ap/monitor/injection mode without any serious issues.
@ValdikSS
I've been testing rtw88 8812bu support with kernel 6.3 lately. It is coming along nicely. I'll test 8811cu starting soon. My hope is that support for both of those chipsets is in very good shape within maybe a year from now so that I can look at discontinuing support for this driver and the 88x2bu out-of-kernel drivers. That would give me more time to try to keep the older drivers supporting the 8812au and 8811au chipsets in good shape.
I got the same problem on OpenWrt. The following script calls iw dev $wlan del when the network service is restarted:
https://github.com/openwrt/openwrt/blob/openwrt-21.02/package/kernel/mac80211/files/lib/netifd/wireless/mac80211.sh
Hi @bGN4
I do not support OpenWRT with this driver in this repo. I do use OpenWRT. What I recommend OpenWRT users do if they want to use a usb wifi adapter is they use the adapters based on chipsets that are well supported in OpenWRT. That includes:
mt7921au mt7612u mt7610u
The Main Menu for this site includes a Plug and Play list that includes many adapters based on those chipsets:
@ValdikSS did you find any solution/workaround? I wonder why rtw88 doesn't have this problem.
found this: https://forums.developer.nvidia.com/t/fail-to-unload-the-nic-driver-for-realtek/255580/2 but probably unrelated
did you find any solution/workaround?
I did not, didn't try to find it really.
I wonder why rtw88 doesn't have this problem.
Because this is a driver bug, not a hardware issue. rtw88 doesn't have this bug. It's better in many ways, I don't see any point in using this Realtek driver if we have rtw88.
@ntzb
I wonder why rtw88 doesn't have this problem.
The driver in this repo and the rtw88 driver are totally different drivers. Realtek does not support USB very well at all. Even the USB support for rtw88 is provided by the community.
Make sure you are using kernel 6.9 to pick up 5 important patches that went in earlier this year. If you want to test and report on the downstream dev version of rtw88:
https://github.com/morrownr/8821cu-20210916/issues/115
That is right here in this repo. The first message gives the link to the downstream rtw88 that we are using to do work.
Don't expect anything other than API maintenance on this out-of-kernel driver here in this repo and I plan to take it down later this year as long as the in-kernel rtw88 is in good shape.
@morrownr
Because this is a driver bug, not a hardware issue. rtw88 doesn't have this bug. It's better in many ways, I don't see any point in using this Realtek driver if we have rtw88.
The driver in this repo and the rtw88 driver are totally different drivers. Realtek does not support USB very well at all. Even the USB support for rtw88 is provided by the community.
well, my device is a rtw8852cu (0x0bda, 0xc832), and the drivers at lwfinger/rtw8852 suffer from the same issue (apparently, same symptoms on Openwrt, I didn't debug the issue as mentioned by @ValdikSS in one of the posts above). I don't see support in rtw88 unfortunately
@ntzb
I don't see support in rtw88 unfortunately
You won't. If usb support is going to happen for the 8852cu or any other Realtek WiFi 6 chips, it will happen in rtw89. There is work ongoing to support the remaining unsupported chips in rtw88 but it may take a long time to get to rtw89.
Tell me what you are trying to do and what hardware you have (like a RasPi4B for example) and I'll offer you some options.
I've come across a rtw8832cu device, that I could've used in my openwrt x86 machine. I tried to use the current linux drivers for it, but it suffers from the same problem that's described in this post. I'm not looking to replace it with anything, just wanted to utilize it, since the drivers already exist (just happen to be that they are bad)
I've come across a rtw8832cu device, that I could've used in my openwrt x86 machine.
Using any Realtek based adapter or card with OpenWRT is a challenge. This might change in the future but it is a frustrating thing for now.
I'm not looking to replace it with anything, just wanted to utilize it, since the drivers already exist (just happen to be that they are bad)
The Realtek drivers for WiFi 6 usb wifi adapters are bad. Really bad.
Maybe I have been too subtle with my advice for Linux users to avoid WiFi 6 usb wifi adapters with Realtek chips. I'll see about adding some warnings around the site. Not only is the driver support out-of-kernel, which is not good, but it is really bad even for out-of-kernel drivers and that is not likely to change anytime soon. Then there is the issue that all of the WiFi 6 Realtek based adapters that I have seen are multi-state. Linux users can do so much better than these adapters. You are aware of the site Main Menu?
https://github.com/morrownr/USB-WiFi
Reading menu items 1 and 2 is a really good idea. You have a choice to make. Get a good adapter that works well with OpenWRT now or wait a few years to see what Realtek does.
Apparently netdev (un)registration changed since Linux 5.12, so (un)register_netdevice should be replaced by cfg80211_(un)register_netdevice inside cfg80211 callback. I have no 8821cu device on hand, but those who have one could try to apply a patch similar to https://github.com/morrownr/88x2bu-20210702/pull/223
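Since the change is tied to a kernel version, a quick check of the running kernel tells you whether a system is on the affected side of 5.12. The sketch below is a plain major.minor comparison; kernel_at_least is a hypothetical helper name, and it assumes the version string begins with numeric major.minor as uname -r normally prints. Debian 11 ships a 5.10 kernel, which may be why the Debian 11.6 test earlier in the thread did not hang, though that is only a guess.

```shell
#!/bin/sh
# Hypothetical helper: kernel_at_least <version> <major.minor>
# Exit status 0 if <version> (e.g. "$(uname -r)") is >= <major.minor>.
kernel_at_least() {
    ver=${1%%-*}                 # drop suffixes such as "-21-amd64"
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    minor=${minor%%[!0-9]*}      # drop trailing non-digits such as "+"
    min_major=${2%%.*}
    min_minor=${2#*.}
    [ "$major" -gt "$min_major" ] ||
        { [ "$major" -eq "$min_major" ] && [ "$minor" -ge "$min_minor" ]; }
}

# Usage:
#   if kernel_at_least "$(uname -r)" 5.12; then
#       echo "kernel has the newer cfg80211 netdev registration path"
#   fi
```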
I can confirm this helps with the hang.
I have been testing this code change for a few days already:
https://github.com/ntzb/rtw8852cu/commit/891e3db8f525a6fb2d65e3ad928fd4a046e8d40a
it seems fine, but at least on a 8832cu device, it introduces regd exceptions, which do not interfere with the "simple" operation of the device
@ntzb
it introduces regd exceptions
Can I get you to go into more detail? I have merged @alex3d 's pr but am hesitant to use it on additional repos until I have more info.
the changes I found to help with the lock of rtnl, i.e. changing from *register_netdevice to cfg80211_*register_netdevice, are only a workaround, really (at least that's what I understood reading the torvalds/linux commit (which I linked in mine) and the mailing list about the "new" cfg80211 registration functions).
the driver should have worked with the regular registration functions just fine. it's just the driver is bad (digging through it you can clearly see it, as I mentioned here https://github.com/lwfinger/rtw8852cu/issues/15)
here's one of the exceptions I have when using the workaround (with some of my previous debugging prints starting with ***):
[ 2029.512998] RTW: rtw_wiphy_register(phy1)
[ 2029.513559] RTW: Register RTW cfg80211 vendor cmd(0x67) interface
[ 2029.514526] *** at rtw_update_wiphy_regd / before rtnl_lock
[ 2029.515287] *** at rtw_update_wiphy_regd / after rtnl_lock
[ 2029.516057] ------------[ cut here ]------------
[ 2029.516680] WARNING: CPU: 0 PID: 63 at regulatory_set_wiphy_regd+0xa87/0xb80 [cfg80211]
[ 2029.517580] Modules linked in: rtw88_8821ce rtw88_8821c pppoe ppp_async nft_fib_inet nf_flow_table_inet mt7921u mt7921_common iwldvm 8852cu iwlmvm rtw88_pci rtw88_core rtl8xxxu pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt792x_usb mt792x_lib mt76_usb mt76_connac_lib mt76 mac80211 lzo iwlwifi cfg80211 ax88179_178a usbnet slhc r8169 nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 lzo_rle lzo_decompress lzo_compress libcrc32c igc forcedeth e1000e crc_ccitt compat bnx2 i2c_dev dwmac_intel dwmac_generic stmmac_platform stmmac ixgbe e1000 amd_xgbe mdio nls_utf8 pcs_xpcs ena sha512_ssse3 sha512_generic sha3_generic seqiv jitterentropy_rng drbg hmac cmac crypto_acompress nls_iso8859_1 nls_cp437 igb vfat fat button_hotplug tg3
[ 2029.517733] realtek mii
[ 2029.527119] CPU: 0 PID: 63 Comm: kworker/0:2 Not tainted 6.1.89 #0
[ 2029.527992] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 2029.529298] Workqueue: usb_hub_wq hub_event
[ 2029.530019] RIP: 0010:regulatory_set_wiphy_regd+0xa87/0xb80 [cfg80211]
[ 2029.530945] Code: 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc 49 8d 7d 70 be ff ff ff ff e8 d1 5d 6e e1 85 c0 0f 85 d4 fe ff ff <0f> 0b e9 cd fe ff ff 0f 0b 80 3d d1 0c 04 00 00 0f 85 b0 fe ff ff
[ 2029.533228] RSP: 0000:ffffc9000060b5b0 EFLAGS: 00010246
[ 2029.534073] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[ 2029.535076] RDX: 0000000000000000 RSI: ffff888006ade810 RDI: ffff8880046609f0
[ 2029.536068] RBP: ffffc9000060b630 R08: 0000000017243562 R09: 000000000000005d
[ 2029.537093] R10: 0000000000000000 R11: 0000000000000001 R12: ffff888006ade7a0
[ 2029.538108] R13: ffff888006ade7a0 R14: 000000000000006a R15: ffffc90005b91000
[ 2029.539154] FS: 0000000000000000(0000) GS:ffff88800f600000(0000) knlGS:0000000000000000
[ 2029.540263] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2029.541192] CR2: 00007f7416b6a3a8 CR3: 0000000002612000 CR4: 0000000000350ef0
[ 2029.542237] Call Trace:
[ 2029.542919] <TASK>
[ 2029.543585] ? show_regs.part.0+0x1e/0x24
[ 2029.544406] ? show_regs.cold+0x8/0xd
[ 2029.545166] ? __warn+0x6c/0xc0
[ 2029.545925] ? regulatory_set_wiphy_regd+0xa87/0xb80 [cfg80211]
[ 2029.546895] ? report_bug+0xbb/0x120
[ 2029.547688] ? handle_bug+0x44/0x90
[ 2029.548472] ? exc_invalid_op+0x18/0x70
[ 2029.549307] ? asm_exc_invalid_op+0x1b/0x20
[ 2029.550153] ? regulatory_set_wiphy_regd+0xa87/0xb80 [cfg80211]
[ 2029.551100] ? reg_get_max_bandwidth+0x277/0x2d0 [cfg80211]
[ 2029.552026] regulatory_set_wiphy_regd_sync+0x2f/0x90 [cfg80211]
[ 2029.552981] rtw_update_wiphy_regd+0x4a3/0x7a8 [8852cu]
[ 2029.553953] rtw_regd_change_complete_sync+0x4f/0x52f [8852cu]
[ 2029.554947] ? rtw_regd_change_complete_sync+0x4f/0x52f [8852cu]
[ 2029.555959] rtw_wiphy_register+0x10a/0x10f [8852cu]
[ 2029.556902] rtw_cfg80211_dev_res_register+0x10/0x1e [8852cu]
[ 2029.557906] rtw_os_ndevs_register+0x1c/0x104 [8852cu]
[ 2029.558869] rtw_os_ndevs_init+0x2b/0x3f [8852cu]
[ 2029.559898] rtw_usb_primary_adapter_init+0xa97/0xb79 [8852cu]
[ 2029.560927] usb_probe_interface+0xe3/0x240
[ 2029.561797] really_probe+0xd6/0x290
[ 2029.562588] __driver_probe_device+0x73/0xf0
[ 2029.563426] driver_probe_device+0x1f/0xf0
[ 2029.564252] __device_attach_driver+0x86/0x110
[ 2029.565091] ? driver_allows_async_probing+0x70/0x70
[ 2029.565963] bus_for_each_drv+0x6c/0xa0
[ 2029.566740] __device_attach+0xb6/0x1b0
[ 2029.567505] device_initial_probe+0xe/0x20
[ 2029.568278] bus_probe_device+0x9f/0xb0
[ 2029.569012] device_add+0x3d4/0x830
[ 2029.569746] usb_set_configuration+0x5e8/0x840
[ 2029.570513] usb_generic_driver_probe+0x50/0x70
[ 2029.571277] usb_probe_device+0x32/0xd0
[ 2029.571978] really_probe+0xd6/0x290
[ 2029.572656] __driver_probe_device+0x73/0xf0
[ 2029.573375] driver_probe_device+0x1f/0xf0
[ 2029.574072] __device_attach_driver+0x86/0x110
[ 2029.574778] ? driver_allows_async_probing+0x70/0x70
[ 2029.575497] bus_for_each_drv+0x6c/0xa0
[ 2029.576130] __device_attach+0xb6/0x1b0
[ 2029.576763] device_initial_probe+0xe/0x20
[ 2029.577412] bus_probe_device+0x9f/0xb0
[ 2029.578030] device_add+0x3d4/0x830
[ 2029.578626] usb_new_device+0x1c7/0x3c0
[ 2029.579266] hub_event+0xc64/0x1560
[ 2029.579854] ? lock_acquire.part.0.isra.0+0x59/0xb0
[ 2029.580538] ? process_one_work+0x21a/0x430
[ 2029.581164] ? process_one_work+0x21a/0x430
[ 2029.581775] process_one_work+0x256/0x430
[ 2029.582381] ? process_one_work+0x21a/0x430
[ 2029.582993] ? process_one_work+0x21a/0x430
[ 2029.583610] worker_thread+0x4a/0x3e0
[ 2029.584189] ? rescuer_thread+0x370/0x370
[ 2029.584785] kthread+0xbf/0xe0
[ 2029.585315] ? kthread_complete_and_exit+0x20/0x20
[ 2029.585969] ret_from_fork+0x1f/0x30
[ 2029.586546] </TASK>
[ 2029.587008] ---[ end trace 0000000000000000 ]---
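For anyone collecting similar traces to report, the warning block can be cut out of a longer kernel log by its delimiters. A minimal sketch, assuming the log uses the "cut here" and "end trace" marker lines seen in the trace above; extract_warnings is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads a kernel log on stdin and prints only the
# lines from each "cut here" marker through the matching "end trace" line.
extract_warnings() {
    sed -n '/cut here/,/end trace/p'
}

# Usage:
#   dmesg | extract_warnings > regd-warning.txt
```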
to sum it up, in my view, merely replacing the functions is not the real solution to the hang, but a workaround that should be used after testing the effects it might have, in a case by case basis.
as you very rightfully wrote in your usb-wifi repo, inherently, these realtek drivers suck.
in a regular usage scenario, there's hardly any use for iw dev wlan0 del, and other methods can be used to avoid the lock, but in OpenWrt's case, it is mandatory, and the drivers should be in a better shape for any serious usage to be considered.
I don't mind digging into it some more, but so far have found no way of getting good speeds from my 8832cu adapter (mentioned here https://github.com/lwfinger/rtw8852cu/issues/15), so any more attempts at this stage, seem futile
Deletion of the wireless interface results in an endless hang of the iw process, which cannot be killed even with kill -9, and leaves the device unable to operate normally because any operation related to network interface enumeration hangs. The kernel is also unable to reboot properly.
How to reproduce (do not do that on a production system):
iw dev wlan0 del, where wlan0 is the card's interface name
I suppose there's a lock which is not getting released somewhere.