lwfinger / rtl8723bu

Driver for RTL8723BU

machine locks up when setting up an ad-hoc access point in linux-4.4.0 #21

Open ghost opened 8 years ago

ghost commented 8 years ago

When I set up an ad-hoc access point using wicd on slackware64-current (x64, kernel linux-4.4.0), the machine locks up completely. What info can I provide to debug this issue?

lwfinger commented 8 years ago

You need to switch to the logging console before you initiate the action that causes the lockup, and then take and post a photo of the screen. You could also set up network logging, but that is much more complicated and requires a second network device.

ghost commented 8 years ago

ok, I managed to get a lockup from the console:

iw dev wlan0 scan passive

ghost commented 8 years ago

seems like the attachments didn't get here via the mail... uploading them via the web interface

img_20160202_215304 img_20160202_215312 img_20160202_215322 img_20160202_215255

iw-log1.txt

lwfinger commented 8 years ago

Thanks for the picture. I think I found the place where it fails and changed the code to log a message. Please pull these changes and tell me (a) if it still crashes, and (b) if you see a message with "****" logged if it doesn't.

ghost commented 8 years ago

still looks the same after the upgrade... where should the "*****" be logged?

img_20160204_201403 img_20160204_201417 img_20160204_201426 img_20160204_201431

lwfinger commented 8 years ago

I "fixed" the wrong place. Do a new pull and try again. The printout of the stars has been removed as it was not useful.

ghost commented 8 years ago

it's still crashing, but in a different place it seems...

[ 273.406289] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 273.407792] IP: [] cfg80211_rtw_scan+0xa7/0x67c [8723bu]
[ 273.409257] PGD 252ca4067 PUD 256cd5067 PMD 0
[ 273.410664] Oops: 0000 [#1] SMP
[ 273.411512] Modules linked in: 8723bu(O) i2c_dev appletalk ax25 ipx p8023 p8022 psnap llc ipv6 hid_generic ums_realtek cfg80211 uas btusb usb_storage usbhid btrtl fuse snd_hda_codec_hdmi nvidia(PO) snd_hda_codec_realtek snd_hda_codec_generic i915 evdev coretemp hwmon drm_kms_helper intel_rapl intel_gtt iosf_mbi i2c_algo_bit fb_sys_fops x86_pkg_temp_thermal syscopyarea sysfillrect intel_powerclamp sysimgblt kvm_intel snd_hda_intel kvm snd_hda_codec irqbypass snd_hda_core snd_hwdep drm crct10dif_pclmul snd_pcm crc32_pclmul crc32c_intel efi_pstore snd_timer agpgart snd r8169 hci_uart i2c_i801 serio_raw efivars soundcore btbcm mii btqca btintel bluetooth wmi fan thermal pinctrl_sunrisepoint battery i2c_hid rfkill hid video pinctrl_intel i2c_core intel_lpss_acpi intel_lpss tpm_crb xhci_pci mei_me tpm_tis
[ 273.415409] xhci_hcd mei shpchp tpm acpi_pad button acpi_als kfifo_buf industrialio fjes processor loop
[ 273.416428] CPU: 0 PID: 2568 Comm: iw Tainted: P W O 4.4.0 #2
[ 273.416941] Hardware name: MEDION Akoya P5320 E MD8875/2436/H110H4-CM2, BIOS 110H4W0X.105 09/08/2015
[ 273.417965] task: ffff880248579a80 ti: ffff88024e63c000 task.ti: ffff88024e63c000
[ 273.418490] RIP: 0010:[] [] cfg80211_rtw_scan+0xa7/0x67c [8723bu]
[ 273.419553] RSP: 0018:ffff88024e63f7b8 EFLAGS: 00010246
[ 273.420090] RAX: 0000000000000001 RBX: ffffc90000089000 RCX: 0000000010624dd3
[ 273.420636] RDX: 000000000001f400 RSI: 0000000000000000 RDI: 00000000000007d0
[ 273.421178] RBP: ffff88024e63fae0 R08: 0000000000017da0 R09: ffffffff81a8f360
[ 273.421720] R10: ffffea0009a42c00 R11: 0000000000000000 R12: ffff880256c83400
[ 273.422262] R13: 0000000000000000 R14: ffffc9000008d948 R15: 0000000000000000
[ 273.422805] FS: 00007fb030940700(0000) GS:ffff880276400000(0000) knlGS:0000000000000000
[ 273.423355] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 273.423904] CR2: 0000000000000020 CR3: 0000000265014000 CR4: 00000000003406f0
[ 273.424461] Stack:
[ 273.425013] ffffffff810b92a5 ffff88024e63f7e8 ffff880200000000 ffff880276414ce8
[ 273.425591] ffff880276414ce8 ffff880276414ce8 ffff880248729ae0 ffff88024e63f860
[ 273.426165] ffffffff810bb195 ffff8800836e4000 00000000003e7b0b ffff880000000000
[ 273.426740] Call Trace:
[ 273.427296] [] ? update_curr+0xc5/0x130
[ 273.427866] [] ? dequeue_entity+0x415/0x980
[ 273.428439] [] ? ata_qc_issue+0x172/0x380
[ 273.429003] [] ? getblk_gfp+0x2b/0x60
[ 273.429557] [] ? __alloc_pages_nodemask+0x172/0xac0
[ 273.430108] [] ? get_page_from_freelist+0x3a9/0x910
[ 273.430659] [] ? get_page_from_freelist+0x3a9/0x910
[ 273.431193] [] ? get_page_from_freelist+0x3a9/0x910
[ 273.431710] [] ? nlmsg_put+0x6e/0x80
[ 273.432212] [] ? nla_reserve+0x41/0x50
[ 273.432702] [] ? nla_put+0x20/0x30
[ 273.433176] [] ? nla_put+0x36/0x40
[ 273.433635] [] ? skb_queue_tail+0x43/0x50
[ 273.434083] [] ? kmalloc+0x210/0x230
[ 273.434520] [] ? nl80211_trigger_scan+0x16b/0x730 [cfg80211]
[ 273.434952] [] nl80211_trigger_scan+0x476/0x730 [cfg80211]
[ 273.435372] [] genl_family_rcv_msg+0x18a/0x340
[ 273.435788] [] ? netlink_sendskb+0x16c/0x250
[ 273.436193] [] ? genl_family_rcv_msg+0x340/0x340
[ 273.436588] [] genl_rcv_msg+0x76/0xb0
[ 273.436971] [] netlink_rcv_skb+0xa4/0xc0
[ 273.437345] [] genl_rcv+0x28/0x40
[ 273.437704] [] netlink_unicast+0x108/0x190
[ 273.438060] [] netlink_sendmsg+0x487/0x5d0
[ 273.438409] [] sock_sendmsg+0x38/0x50
[ 273.438756] [] _sys_sendmsg+0x289/0x2a0
[ 273.439102] [] ? destroy_inode+0x38/0x60
[ 273.439447] [] ? mem_cgroup_try_charge+0x81/0x1b0
[ 273.439792] [] ? lru_cache_add_active_or_unevictable+0x27/0x90
[ 273.440137] [] ? handle_mm_fault+0x131c/0x13f0
[ 273.440476] [] ? dput+0x1c6/0x200
[ 273.440808] [] sys_sendmsg+0x42/0x80
[ 273.441134] [] SyS_sendmsg+0x12/0x20
[ 273.441451] [] entry_SYSCALL_64_fastpath+0x16/0x6e
[ 273.441771] Code: d0 07 00 00 48 89 df e8 c0 92 fd ff 85 c0 0f 84 ba 05 00 00 c7 85 e8 fc ff ff 00 00 00 00 80 bb 2c 45 00 00 02 0f 85 e0 00 00 00 <41> 80 7d 20 07 0f 86 d5 00 00 00 ba 07 00 00 00 48 c7 c6 10 bc
[ 273.442886] RIP [] cfg80211_rtw_scan+0xa7/0x67c [8723bu]
[ 273.443250] RSP
[ 273.443600] CR2: 0000000000000020
[ 273.443951] ---[ end trace 38fc60436070b46d ]---
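
One way to read the trace above, offered as a sketch and not as a confirmed diagnosis: the faulting bytes `<41> 80 7d 20 07` decode to `cmpb $0x7,0x20(%r13)`, R13 is zero, and CR2 is 0x20. That pattern is what a one-byte field at offset 32 of a NULL structure pointer would produce. The struct below is a hypothetical stand-in (`ssid_entry`, `MAX_SSID_LEN` and `fault_address` are names invented here) for a 32-byte SSID buffer followed by a length byte, the layout the kernel's `struct cfg80211_ssid` uses; whether the driver really reads that field at this point is an assumption.

```c
#include <stddef.h>

/* Assumed layout: 32-byte SSID buffer followed by a one-byte length. */
#define MAX_SSID_LEN 32

/* Hypothetical stand-in type; not taken from the driver source. */
struct ssid_entry {
    unsigned char ssid[MAX_SSID_LEN];
    unsigned char ssid_len;
};

/*
 * Address a load of `ssid_len` would touch if the structure lived at
 * `base`. With base == 0 (a NULL pointer) this is just the field offset.
 */
static size_t fault_address(size_t base)
{
    return base + offsetof(struct ssid_entry, ssid_len);
}
```

Calling `fault_address(0)` gives 0x20, which matches the CR2 value in the oops.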

lwfinger commented 8 years ago

I guess we are gaining; however, the Makefile was stripping the object files, so I could not use gdb to find the bad code location, and I could not figure it out without that info.

Please pull and try again. Do a "make clean" before the make. It will crash again, but I should be able to interpret the traceback.

ghost commented 8 years ago

as far as I can see you did not push anything...

ghost commented 8 years ago

is this any help? I added a "-g" and removed the "--strip-debug" but I'm not sure if that was what you wanted...

[ 112.919964] RTL871X: module init start
[ 112.919967] RTL871X: rtl8723bu v4.3.6.11_12942.20141204_BTCOEX20140507-4E40
[ 112.919968] RTL871X: rtl8723bu BT-Coex version = BTCOEX20140507-4E40
[ 112.973705] RTL871X: rtw_ndev_init(wlan0)
[ 112.973975] usbcore: registered new interface driver rtl8723bu
[ 112.973977] RTL871X: module init ret=0
[ 130.314419] RTL871X: RTW_ADAPTIVITY_EN_AUTO, chplan:0x21, Regulation:0,0
[ 130.314421] RTL871X: RTW_ADAPTIVITY_MODE_NORMAL
[ 130.834745] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[ 132.843118] RTL871X: nolinked power save enter
[ 144.975880] RTL871X: RTW_ADAPTIVITY_EN_AUTO, chplan:0x21, Regulation:0,0
[ 144.975882] RTL871X: RTW_ADAPTIVITY_MODE_NORMAL
[ 145.586925] RTL871X: nolinked power save leave
[ 145.789942] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 145.791186] IP: [] memcmp+0x9/0x40
[ 145.792405] PGD 265717067 PUD 26b6f9067 PMD 0
[ 145.793631] Oops: 0000 [#1] SMP
[ 145.794856] Modules linked in: 8723bu(O) i2c_dev appletalk ax25 ipx p8023 p8022 psnap llc ipv6 hid_generic cfg80211 uas btusb btrtl usb_storage usbhid fuse snd_hda_codec_hdmi nvidia(PO) snd_hda_codec_realtek snd_hda_codec_generic evdev i915 coretemp hwmon intel_rapl iosf_mbi x86_pkg_temp_thermal drm_kms_helper intel_powerclamp intel_gtt kvm_intel i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt kvm efi_pstore snd_hda_intel irqbypass snd_hda_codec crct10dif_pclmul crc32_pclmul crc32c_intel snd_hda_core hci_uart efivars snd_hwdep serio_raw r8169 btbcm snd_pcm i2c_i801 drm btqca btintel snd_timer mii bluetooth snd thermal agpgart soundcore wmi fan i2c_hid rfkill hid battery video pinctrl_sunrisepoint i2c_core pinctrl_intel intel_lpss_acpi intel_lpss mei_me tpm_crb xhci_pci mei tpm_tis shpchp
[ 145.798992] xhci_hcd tpm acpi_pad acpi_als button kfifo_buf industrialio fjes processor loop [last unloaded: 8723bu]
[ 145.799981] CPU: 0 PID: 2240 Comm: iw Tainted: P W O 4.4.0 #2
[ 145.800483] Hardware name: MEDION Akoya P5320 E MD8875/2436/H110H4-CM2, BIOS 110H4W0X.105 09/08/2015
[ 145.801518] task: ffff88026718b500 ti: ffff88024d1b8000 task.ti: ffff88024d1b8000
[ 145.802056] RIP: 0010:[] [] memcmp+0x9/0x40
[ 145.802597] RSP: 0018:ffff88024d1bb728 EFLAGS: 00010202
[ 145.803136] RAX: 0000000000000001 RBX: ffffc90000030000 RCX: 0000000010624dd3
[ 145.803684] RDX: 0000000000000007 RSI: ffffffffc061ab18 RDI: 0000000000000000
[ 145.804232] RBP: ffff88024d1bb728 R08: 0000000000017da0 R09: ffffffff81a8f360
[ 145.804779] R10: ffffea0009a5de80 R11: 0000000000000000 R12: ffff88026b694000
[ 145.805321] R13: ffffc900000349b0 R14: ffff88024f271800 R15: 0000000000000000
[ 145.805866] FS: 00007f3e5bf76700(0000) GS:ffff880276400000(0000) knlGS:0000000000000000
[ 145.806422] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 145.806975] CR2: 0000000000000000 CR3: 0000000268cab000 CR4: 00000000003406f0
[ 145.807534] Stack:
[ 145.808085] ffff88024d1bb738 ffffffffc05c5264 ffff88024d1bbae0 ffffffffc05d8334
[ 145.808659] ffffffff815d5ecc 0000000081769090 0000000000000000 ffff88008590f400
[ 145.809234] ffff880083781100 0000000001752b40 ffffea000952f380 ffff88026a7c8a10
[ 145.809806] Call Trace:
[ 145.810379] [] _rtw_memcmp+0x10/0x1a [8723bu]
[ 145.810958] [] cfg80211_rtw_scan+0xc6/0x697 [8723bu]
[ 145.811528] [] ? blk_rq_map_sg+0x1dc/0x440
[ 145.812093] [] ? update_curr+0xc5/0x130
[ 145.812648] [] ? radix_tree_lookup_slot+0x13/0x30
[ 145.813201] [] ? find_get_entry+0x1e/0xa0
[ 145.813752] [] ? bh_lru_install+0x158/0x180
[ 145.814303] [] ? mempool_alloc_slab+0x15/0x20
[ 145.814844] [] ? mempool_alloc_slab+0x15/0x20
[ 145.815375] [] ? mempool_alloc+0x5f/0x150
[ 145.815881] [] ? getblk_gfp+0x2b/0x60
[ 145.816374] [] ? alloc_request_struct+0x17/0x20
[ 145.816857] [] ? sg_init_table+0x1a/0x40
[ 145.817325] [] ? blk_rq_map_sg+0x1dc/0x440
[ 145.817782] [] ? scsi_init_cmd_errh+0x90/0x90
[ 145.818227] [] ? scsi_init_sgtable+0x44/0x80
[ 145.818658] [] ? swiotlb_map_sg_attrs+0x6d/0x130
[ 145.819078] [] ? nlmsg_put+0x6e/0x80
[ 145.819492] [] ? nla_reserve+0x41/0x50
[ 145.819894] [] ? nla_put+0x20/0x30
[ 145.820280] [] ? nla_put+0x36/0x40
[ 145.820655] [] ? skb_queue_tail+0x43/0x50
[ 145.821024] [] ? kmalloc+0x210/0x230
[ 145.821389] [] ? nl80211_trigger_scan+0x16b/0x730 [cfg80211]
[ 145.821757] [] nl80211_trigger_scan+0x476/0x730 [cfg80211]
[ 145.822125] [] genl_family_rcv_msg+0x18a/0x340
[ 145.822490] [] ? d_alloc+0x25/0x170
[ 145.822853] [] ? genl_family_rcv_msg+0x340/0x340
[ 145.823217] [] genl_rcv_msg+0x76/0xb0
[ 145.823573] [] netlink_rcv_skb+0xa4/0xc0
[ 145.823921] [] genl_rcv+0x28/0x40
[ 145.824262] [] netlink_unicast+0x108/0x190
[ 145.824598] [] netlink_sendmsg+0x487/0x5d0
[ 145.824927] [] sock_sendmsg+0x38/0x50
[ 145.825248] [] _sys_sendmsg+0x289/0x2a0
[ 145.825568] [] ? destroy_inode+0x38/0x60
[ 145.825881] [] ? mem_cgroup_try_charge+0x81/0x1b0
[ 145.826192] [] ? lru_cache_add_active_or_unevictable+0x27/0x90
[ 145.826508] [] ? handle_mm_fault+0x131c/0x13f0
[ 145.826820] [] ? dput+0x1c6/0x200
[ 145.827132] [] sys_sendmsg+0x42/0x80
[ 145.827442] [] SyS_sendmsg+0x12/0x20
[ 145.827750] [] entry_SYSCALL_64_fastpath+0x16/0x6e
[ 145.828061] Code: 5d c3 3c 30 74 0b 3c 31 75 f1 31 c0 c6 06 01 5d c3 31 c0 c6 06 00 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 85 d2 48 89 e5 74 33 <0f> b6 07 0f b6 0e 29 c8 75 22 48 83 ea 01 31 c9 eb 15 44 0f b6
[ 145.829193] RIP [] memcmp+0x9/0x40
[ 145.829553] RSP
[ 145.829907] CR2: 0000000000000000
[ 145.830259] ---[ end trace 1f6d74110eb2b996 ]---

2120 #ifdef CONFIG_P2P
2121 if( pwdinfo->driver_interface == DRIVER_CFG80211 )
0x000000000004930f <+161>: cmpb $0x2,0x4594(%rbx)
0x0000000000049316 <+168>: jne 0x493f5 <cfg80211_rtw_scan+391>

2122 {
2123 if (_rtw_memcmp(ssids->ssid, "DIRECT-", 7) &&
0x000000000004931c <+174>: mov $0x7,%edx
0x0000000000049321 <+179>: mov $0x0,%rsi
0x0000000000049328 <+186>: mov -0x388(%rbp),%rdi
0x000000000004932f <+193>: callq 0x49334 <cfg80211_rtw_scan+198>
0x000000000004933e <+208>: test %eax,%eax
0x0000000000049340 <+210>: je 0x493f5 <cfg80211_rtw_scan+391>
0x000000000004935f <+241>: test %rax,%rax
0x0000000000049362 <+244>: je 0x493eb <cfg80211_rtw_scan+381>

lwfinger commented 8 years ago

It seems that "ssids" in line 2123 of os_dep/ioctl_cfg80211.c is NULL. The new version protects against that.
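
The kind of protection described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the change that was actually committed: `ssid_entry`, `MAX_SSID_LEN` and `first_ssid_is_p2p_wildcard` are hypothetical names, the struct is a stand-in for the SSID entries passed with a scan request, and the standard `memcmp` is used in place of the driver's `_rtw_memcmp` wrapper. A passive scan such as `iw dev wlan0 scan passive` presumably supplies no SSIDs at all, which is the case the guard handles.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SSID_LEN 32

/* Hypothetical stand-in for one requested SSID; not the driver's real type. */
struct ssid_entry {
    unsigned char ssid[MAX_SSID_LEN];
    unsigned char ssid_len;
};

/*
 * Returns true only when at least one SSID was supplied and the first one
 * starts with the Wi-Fi Direct prefix "DIRECT-". A NULL array or a
 * non-positive count is treated as "no match" instead of being dereferenced.
 */
static bool first_ssid_is_p2p_wildcard(const struct ssid_entry *ssids,
                                       int n_ssids)
{
    static const char prefix[] = "DIRECT-";
    const size_t prefix_len = sizeof(prefix) - 1;

    if (ssids == NULL || n_ssids <= 0)
        return false;               /* nothing to inspect */
    if (ssids[0].ssid_len < prefix_len)
        return false;               /* too short to hold the prefix */
    return memcmp(ssids[0].ssid, prefix, prefix_len) == 0;
}
```

Checking the pointer before both the length test and the comparison covers the two faults seen in this thread: the read at offset 0x20 and the NULL first argument to `memcmp`.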

ghost commented 8 years ago

yep, that fixed the oops, but the machine still locks up when using "wicd" from X. I'll need to try some more things to get info on that lockup

ghost commented 8 years ago

this is all I can get, no scrollback unfortunately since it's all locked up.

img_20160206_180008

lwfinger commented 8 years ago

Not seeing the location that actually crashed is a problem. The location in rtw_free_network_nolock() calls rtw_cfg80211_unlink_bss(), which is as far back as I can trace. I added some checking to the latter routine. Perhaps for once I got lucky.

lwfinger commented 8 years ago

I just sent another trial.

ghost commented 8 years ago

that's more of the previous oops

img_20160206_231057

ghost commented 8 years ago

sorry, but I don't see the code you submitted 20 min ago

ghost commented 8 years ago

I added some more printfs and ifs of my own, and it looks like the "pnetwork" is null when rtw_cfg80211_unlink_bss is called

lwfinger commented 8 years ago

I just protected the unlink bss routine from arg 2 being NULL.