linux4wilc / driver

DEPRECATED: Updated Linux drivers for the ATWILC1000/ATWILC3000 products are located at https://github.com/linux4microchip/linux/tree/master/drivers/net/wireless/microchip/wilc1000. To simplify development, the legacy Linux4WILC was merged into the Linux4Microchip repo where driver development continues (Please refer latest ATWILC1000/ATWILC 3000 Wi-Fi Link Controller Linux User Guide) Driver code for Microchip ATWILC Wireless Devices (ATWILC1000 & ATWILC3000)
https://www.microchip.com/wwwproducts/en/ATWILC1000
32 stars 19 forks source link

Wilc1000 driver memory leak after connected to AP and disconnected #93

Open xiebluelike opened 4 years ago

xiebluelike commented 4 years ago

I used the development board sama5d27-som1-ek1 of microchip to test the Wilc1000-SD module; I compiled wilc's driver as a ko file, loaded and ran it with a shell script like below:

#!/bin/sh
index=0;
echo "Start to Test"
#wifi reset
wifi_reset_action()
{
    echo "WIFI Reset"
    #remove wilc-sdio driver
    sleep 5
    rmmod wilc-sdio

    #install wilc-sdio driver
    insmod /root/wilc-sdio.ko
    sleep 3
}

wifi_reset_action

#wifi runing loop
while true
do
        echo "ctrl_interface=/var/run/wpa_supplicant" > /tmp/wilc_wpa_supplicant.conf
        echo "ap_scan=1" >> /tmp/wilc_wpa_supplicant.conf

        wpa_supplicant -iwlan0 -Dnl80211 -f /tmp/ice1001_wpa_log -c /tmp/wilc_wpa_supplicant.conf & echo $! > /tmp/ice_wifi_scan.pid
        sleep 3
        wpa_cli -i wlan0 scan
        sleep 2
        wpa_cli scan_result | grep xie_router
        if [ $? -ne 0 ];then
           echo "can't scan ap Now................., Try reset wifi module"
           wifi_reset_action
           continue
        fi

        wpa_cli -p/var/run/wpa_supplicant add_network
        wpa_cli -p/var/run/wpa_supplicant set_network 0 ssid '"xie_router"'
        wpa_cli -p/var/run/wpa_supplicant set_network 0 psk '"123456789"'
        wpa_cli -p/var/run/wpa_supplicant select_network 0
        while true
        do
            #connect failed because wrong password
            cat /tmp/ice1001_wpa_log | grep "4-Way Handshake failed"
            if [ $? -ne 0 ];then
               echo "Waitting"
            else
               echo "Get Wifi Connected failed because psk key!!!!!"
               wpa_cli -i wlan0 remove_network 0
               break
            fi

            #Connection is associated
            cat /tmp/ice1001_wpa_log | grep "CTRL-EVENT-CONNECTED"
            if [ $? -ne 0 ];then
                echo "Waitting"
            else
                echo "Connected to AP Successful"
                wpa_cli -i wlan0 disconnect
                wpa_cli -i wlan0 remove_network 0
                break
            fi

            if [ $index -lt 10 ];then
               echo "waitting"
            else
               echo "Try Connectting to AP TimeOut"
               wpa_cli -i wlan0 remove_network 0
               break;
            fi
            index=`expr $index + 1`;
            sleep 1
        done

        sleep 1
        #quit wpa_supplicant
        kill `cat /tmp/ice_wifi_scan.pid` > /dev/null 2>&1

        rm /tmp/ice_wifi_scan.pid
        rm /tmp/ice1001_wpa_log
        rm /tmp/wilc_wpa_supplicant.conf -rf
        sleep 20
        free >> /wilc_test/test_info.txt
        cat /proc/meminfo | grep SUnreclaim >> /wilc_test/test_info.txt
        index=0;
done

After running for 12 hours and found that every time the connection to the AP was initiated and then disconnected, there would be a memory leak;

Some record from /wilc_test/test_info.txt like below:

Start----------------->
             total       used       free     shared    buffers     cached
Mem:        119956      26084      93872        156       2464       9628
-/+ buffers/cache:      13992     105964
Swap:            0          0          0
SUnreclaim:         4308 kB
             total       used       free     shared    buffers     cached
Mem:        119956      26112      93844        172       2472       9648
-/+ buffers/cache:      13992     105964
Swap:            0          0          0
SUnreclaim:         4304 kB

......

             total       used       free     shared    buffers     cached
Mem:        119956      34216      85740        344       6468      10168
-/+ buffers/cache:      17580     102376
Swap:            0          0          0
SUnreclaim:         7384 kB
             total       used       free     shared    buffers     cached
Mem:        119956      34200      85756        356       6468      10180
-/+ buffers/cache:      17552     102404
Swap:            0          0          0
SUnreclaim:         7376 kB
             total       used       free     shared    buffers     cached
Mem:        119956      34236      85720        372       6468      10196
-/+ buffers/cache:      17572     102384
Swap:            0          0          0
SUnreclaim:         7384 kB

......

SUnreclaim:        10260 kB
             total       used       free     shared    buffers     cached
Mem:        119956      37596      82360        416       6516      10520
-/+ buffers/cache:      20560      99396
Swap:            0          0          0
SUnreclaim:        10272 kB
             total       used       free     shared    buffers     cached
Mem:        119956      37624      82332        432       6516      10536
-/+ buffers/cache:      20572      99384
Swap:            0          0          0
SUnreclaim:        10280 kB
             total       used       free     shared    buffers     cached
Mem:        119956      37652      82304        448       6516      10552
-/+ buffers/cache:      20584      99372
Swap:            0          0          0
SUnreclaim:        10272 kB
END------>

Please let me know if there have any solution for this problem. Thanks!

ajaykathat commented 4 years ago

Hi @xiebluelike,

The memory usage can increase/decrease based on the usage and caching scheme. Now the driver doesn't maintain any scan results or connect information. All that information is maintained by the cfg80211 subsystem.

To isolate if this issue observed because of continuous connection and disconnection with AP, it is better to check memory usage with the individual operation. In the above sample script, the wpa_supplicant is also killed for each test iteration. Do you observe this same behavior if the wpa_supplicant is not killed during the operation? i.e performing continuous scan/connect/disconnect without killing wpa_supplicant.

xiebluelike commented 4 years ago

Hi @ajaykathat,

Thank you for your reply. I try to scan loop without killing wpa_supplicant, and there is still a memory leak; My testing script just like below:

#!/bin/sh
index=0;
echo "Start to Test"
#wifi reset
wifi_reset_action()
{
    echo "WIFI Reset"
    #remove wilc-sdio driver
    sleep 5
    rmmod wilc-sdio

    #install wilc-sdio driver
    insmod /root/wilc-sdio.ko
    sleep 3
}

wifi_reset_action
echo "ctrl_interface=/var/run/wpa_supplicant" > /tmp/wilc_wpa_supplicant.conf
echo "ap_scan=1" >> /tmp/wilc_wpa_supplicant.conf

wpa_supplicant -iwlan0 -Dnl80211 -f /tmp/ice1001_wpa_log -c /tmp/wilc_wpa_supplicant.conf & echo $! > /tmp/ice_wifi_scan.pid
sleep 3
#wifi runing loop
while true
do
        wpa_cli -i wlan0 scan
        sleep 10
        date >> /wilc_test/test_info.txt
    free >> /wilc_test/test_info.txt
    cat /proc/meminfo | grep SUnreclaim >> /wilc_test/test_info.txt
        index=0;
done

Looking forward to your reply. Thanks!

ajaykathat commented 4 years ago

Hi @xiebluelike,

Do you observe any difference in the rate at which memory is getting consumed between the modified and previous script for the same test time.

From the script, it looks like it's just performing the scan without any connect operation. The same can be simulated without wpa_supplicant with changes like below and it will also help to rule out its involvement.

......
#wpa_supplicant -iwlan0 -Dnl80211 -f /tmp/ice1001_wpa_log -c /tmp/wilc_wpa_supplicant.conf &
ifconfig wlan0 up
echo $! > /tmp/ice_wifi_scan.pid
sleep 3
#wifi runing loop
while true
do
        iw wlan0 scan
        sleep 10
        ......
done

Actually, during scan operation, the driver doesn't keep any copy of the scan result. It may be worth validating the system memory usage without the scan operation too.

mgkiller7 commented 4 years ago

Dear @ajaykathat

I am xiebluelike's teammate. Recently with slabtop i find that the main memory leak is from kmalloc-512 . so i run a test.sh and then turn on /sys/kernel/slab/kmalloc-512/trace and save the trace log. 15 mintes later i analyze the log and find there is a very high probability(142-not-free / 263--total-malloc) of memory leak in cfg_scan_result()->cfg80211_inform_bss_frame()->__kmalloc(), because i can't find its free operate, see below:

(memory leak in 15 mintes is 142x512=71KB, almost match the leak 4M per 13 hours in the test early ) [ 773.390000] TRACE kmalloc-512 alloc 0xc1e61400 inuse=8 fp=0x (null) [ 773.390000] CPU: 0 PID: 6 Comm: kworker/u2:0 Tainted: G WC O 3.10.108 #40 [ 773.390000] Workqueue: WILC_wq handle_rcvd_ntwrk_info [wilc_sdio] [ 773.390000] Backtrace: [ 773.390000] [] (dump_backtrace+0x0/0x104) from [] (show_stack+0x18/0x1c) [ 773.390000] r7:c03219b0 r6:c1e61400 r5:c3801c00 r4:c0b32c20 [ 773.390000] [] (show_stack+0x0/0x1c) from [] (dump_stack+0x20/0x28) [ 773.390000] [] (dump_stack+0x0/0x28) from [] (alloc_debug_processing+0xfc/0x17c) [ 773.390000] [] (alloc_debug_processing+0x0/0x17c) from [] (slab_alloc.constprop.13+0x538/0x598) [ 773.390000] r7:c0b32c20 r6:c1e61400 r5:c3801c00 r4:c0b833c0 [ 773.390000] [] (__slab_alloc.constprop.13+0x0/0x598) from [] (kmalloc+0x9c/0x154) [ 773.390000] [] (__kmalloc+0x0/0x154) from [] (cfg80211_inform_bss_frame+0x98/0x178) [ 773.390000] [] (cfg80211_inform_bss_frame+0x0/0x178) from [] (cfg_scan_result+0xb4/0x180 [wilc_sdio]) [ 773.390000] [] (cfg_scan_result+0x0/0x180 [wilc_sdio]) from [] (handle_rcvd_ntwrk_info+0xb0/0x19c [wilc_sdio]) [ 773.390000] r5:000001b1 r4:c1eb1edc [ 773.390000] [] (handle_rcvd_ntwrk_info+0x0/0x19c [wilc_sdio]) from [] (process_one_work+0x200/0x350) [ 773.390000] r9:c3868000 r8:c1e17200 r7:00000000 r6:c3850600 r5:c1eb1edc r4:c384f300 [ 773.390000] [] (process_one_work+0x0/0x350) from [] (worker_thread+0x1fc/0x3a0) [ 773.390000] [] (worker_thread+0x0/0x3a0) from [] (kthread+0xb4/0xc0) [ 773.390000] [] (kthread+0x0/0xc0) from [] (ret_from_fork+0x14/0x3c) [ 773.390000] r7:00000000 r6:00000000 r5:c0038ef0 r4:c3857dac

Also kmalloc-256 leak memory too, which also mainly from cfg80211_inform_bss_frame, which leak 415-not-free / 2175--total-malloc in 50 mintes, see below:

[28697.450000] TRACE kmalloc-256 alloc 0xc1ef3100 inuse=16 fp=0x (null)^M [28697.450000] CPU: 0 PID: 19762 Comm: kworker/u2:0 Tainted: G WC O 3.10.108 #40^M [28697.450000] Workqueue: WILC_wq handle_rcvd_ntwrk_info [wilc_sdio]^M [28697.450000] Backtrace: ^M [28697.450000] [] (dump_backtrace+0x0/0x104) from [] (show_stack+0x18/0x1c)^M [28697.450000] r7:c03219b0 r6:c1ef3100 r5:c3801c80 r4:c0b33e60^M [28697.450000] [] (show_stack+0x0/0x1c) from [] (dump_stack+0x20/0x28)^M [28697.450000] [] (dump_stack+0x0/0x28) from [] (alloc_debug_processing+0xfc/0x17c)^M [28697.450000] [] (alloc_debug_processing+0x0/0x17c) from [] (slab_alloc.constprop.13+0x538/0x598)^M [28697.450000] r7:c0b33e60 r6:c1ef3100 r5:c3801c80 r4:c0b83348^M [28697.450000] [] (__slab_alloc.constprop.13+0x0/0x598) from [] (kmalloc+0x9c/0x154)^M [28697.450000] [] (__kmalloc+0x0/0x154) from [] (cfg80211_inform_bss_frame+0x98/0x178)^M [28697.450000] [] (cfg80211_inform_bss_frame+0x0/0x178) from [] (cfg_scan_result+0xb4/0x180 [wilc_sdio])^M [28697.450000] [] (cfg_scan_result+0x0/0x180 [wilc_sdio]) from [] (handle_rcvd_ntwrk_info+0xb0/0x19c [wilc_sdio])^M [28697.450000] r5:000000cb r4:c1db279c^M [28697.450000] [] (handle_rcvd_ntwrk_info+0x0/0x19c [wilc_sdio]) from [] (process_one_work+0x200/0x350)^M [28697.450000] r9:c1e9c000 r8:c1cb4000 r7:00000000 r6:c3850600 r5:c1db279c^M r4:c1edd960^M [28697.450000] [] (process_one_work+0x0/0x350) from [] (worker_thread+0x1fc/0x3a0)^M [28697.450000] [] (worker_thread+0x0/0x3a0) from [] (kthread+0xb4/0xc0)^M [28697.450000] [] (kthread+0x0/0xc0) from [] (ret_from_fork+0x14/0x3c)^M [28697.450000] r7:00000000 r6:00000000 r5:c0038ef0 r4:c3879e7c^M

so i think this found may help for you to deal with wilc driver memory leak issue.

Best Regard, mgkiller

ajaykathat commented 4 years ago

Actually the memory allocated by cfg80211_inform_bss_frame() is maintained by cfg80211 subsystem. cfg8011 uses a reference count mechanism to keep track of allocated memory. There is no direct free for that allocated memory. cfg80211_put_bss() call is used to decrement the reference count which is already existed in the driver. Cfg80211 subsystem free the allocated memory based on an aging algorithm which invalidates the entire and frees the expired entries on the fly. The allocated memory may not be freed immediately but should be gradual.

mgkiller7 commented 4 years ago

Thank you for reply @ajaykathat Actually i am sure the memory leak from cfg80211_inform_bss_frame() at least 50 mintes not be freed, is that normal? enven thougt i think this leak can not be freed at all. AS you say those allocated memory are freed so gradually , any advise when use wilc module in a MCU with only 64MB internal DRAM?

ajaykathat commented 4 years ago

I don't think cfg80211_inform_bss_frame() will take more then 50mins to free a single BSS entry if that network is not found in a continuous scan. cfg80211_inform_bss_frame() is used to buildup the scan results based on the received beacons and probe response information during scan. For every scan, if a new network is found it will add an entry otherwise update an existing entry. It is a very common code which is used by all the cfg80211 based driver. The best approach to identify the time taken by an entry to be removed is using iw commands

after loading the module try these commands $ ifconfig wlan0 up $ iw wlan0 scan $ iw wlan0 scan dump -u

Giving "iw wlan0 scan dump -u" command again after some time will not give the results when the entries are expired.