nkolban / esp32-snippets

Sample ESP32 snippets and code fragments
https://leanpub.com/kolban-ESP32
Apache License 2.0
2.36k stars 710 forks source link

BLE crashes while scanning MACs in a dense environment #464

Open cyberman54 opened 6 years ago

cyberman54 commented 6 years ago

I'm using the code pasted below on an ESP-32 board to scan & hash Bluetooth MACs. This seems to work fine, as long as there are not too many bluetooth devices in range. But in a dense environment with a few hundred Bluetooth devices (beacons) in range the code crashes and the ESP32-device resets. During reset the device gives only a backtrace message on the serial console, this makes it difficult to see the root cause which triggers the crash.

I inserted some logging in my code. This way i can see that the crash happenes in the below pasted routine.

I'm lost at this point, any hints how i can debug this further?

class MyAdvertisedDeviceCallbacks: public BLEAdvertisedDeviceCallbacks {
    void onResult(BLEAdvertisedDevice advertisedDevice) {
        int lastcount = (int) macs.size();
        uint8_t *p = (uint8_t *) advertisedDevice.getAddress().getNative();

        // Current devices seen on this scan session
        currentScanDevice++;
        // add this device and refresh display if it was not previously added
        if ( mac_add(p, advertisedDevice.getRSSI(), MAC_SNIFF_BLE) ) {
            char buff[16];
            snprintf(buff, sizeof(buff), "PAX:%d", (int) macs.size()); // convert 16-bit MAC counter to decimal counter value
        }
    }
};
void BLECount() {
    ESP_LOGI(TAG, "BLE scan started");
    currentScanDevice = 0; 
    BLEDevice::init(PROGNAME);
    BLEScan* pBLEScan = BLEDevice::getScan(); 
    pBLEScan->setAdvertisedDeviceCallbacks(new MyAdvertisedDeviceCallbacks());
    pBLEScan->setActiveScan(true);
    BLEScanResults foundDevices = pBLEScan->start(cfg.blescantime);
    int blenum=foundDevices.getCount();
    ESP_LOGI(TAG, "BLE scan done, seen %d device(s)", blenum);
}
chegewara commented 6 years ago

You can use gdb to check backtrace log. This can show you where code is broken.

cyberman54 commented 6 years ago

I read from gdb, but unfortunately i'm not using the ESP-IDF, but Arduino environment on Visual Studio Code with PlatformIO. I did not find any tool for PlatformIO to debug with gdb.

chegewara commented 6 years ago

With arduino you can use this tool. I am using it even when i have bin created and compiled with esp-idf: https://github.com/me-no-dev/EspExceptionDecoder

cyberman54 commented 6 years ago

Thanks for the link, but i am not on Arduino IDE, i am using PlatformIO. But okay, i see can upload the .elf from outside Arduino IDE. Will try it.

cyberman54 commented 6 years ago

I wrote a log of the exception and decoded it, see below. Looks like the crash happenes in BLEscan.h?

abort() was called at PC 0x40162673 on core 0

Backtrace: 0x4008af28:0x3ffe2e70 0x4008b027:0x3ffe2e90 0x40162673:0x3ffe2eb0 0x401626ba:0x3ffe2ed0 0x401501fb:0x3ffe2ef0 0x40150422:0x3ffe2f10 0x400d68d5:0x3ffe2f30 0x400d6a19:0x3ffe2f50 0x400d4451:0x3ffe2f90 0x400d4c44:0x3ffe2fe0

Rebooting...

Decoding stack results
0x4008af28: invoke_abort at .../ESP32/esp-idf-public/components/esp32/./panic.c line 139
0x4008b027: abort at .../ESP32/esp-idf-public/components/esp32/./panic.c line 148
0x40162673: __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 47
0x401626ba: std::terminate() at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_terminate.cc line 57
0x401501fb: __cxxabiv1::__cxa_throw(void*, std::type_info*, void (*)(void*)) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/eh_throw.cc line 87
0x40150422: operator new(unsigned int) at /builds/idf/crosstool-NG/.build/src/gcc-5.2.0/libstdc++-v3/libsupc++/new_op.cc line 54
0x400d68d5: std::vector   >::vector(std::vector   > const&) at ...\.platformio\packages\toolchain-xtensa32\xtensa-esp32-elf\include\c++\5.2.0\ext/new_allocator.h line 104
0x400d6a19: BLEScan::start(unsigned int) at .piolibdeps\ESP32 BLE Arduino_ID1841\src/BLEScan.h line 32
0x400d4451: BLECount() at src\macsniff.cpp line 150
0x400d4c44: sniffer_loop(void*) at src\main.cpp line 343
cyberman54 commented 6 years ago

Line 32 in BLEscan.h is class BLEScanResults. Not sure if the stack trace means, that the crash happenes in BLEScanResults, or in the main app which grabs the result of BLEScanResults?

BLEScanResults foundDevices = pBLEScan->start(cfg.blescantime); // note: this is a blocking call

class BLEScanResults {
public:
    void                dump();
    int                 getCount();
    BLEAdvertisedDevice getDevice(uint32_t i);

private:
    friend BLEScan;
    std::vector<BLEAdvertisedDevice> m_vectorAdvertisedDevices;
};
chegewara commented 6 years ago

Whats the code here: sniffer_loop(void*) at src\main.cpp line 343

and how many devices it can scan before crash? Maybe its problem with stack size or memory. You can also try to increase scan interval to 512 or 625: pScan->setInterval(625);

cyberman54 commented 6 years ago

src\main.cpp line 343 is simply a call to a function:

BLECount();

void BLECount() {
    ESP_LOGI(TAG, "BLE scan started");
    currentScanDevice = 0; // Set 0 seen device on this scan session
    u8x8.drawString(0,3,"Scanning->");
    BLEDevice::init(""); // we don't want to be seen by a name
    BLEScanResults foundDevices; // instance for getting count
    BLEScan* pBLEScan = BLEDevice::getScan(); //create new scan
    pBLEScan->setAdvertisedDeviceCallbacks(new MyAdvertisedDeviceCallbacks());
    pBLEScan->setActiveScan(false); // An active scan would mean that we will wish unneeeded scan responses
    pBLEScan->setWindow(BLESCANWINDOW);
    pBLEScan->setInterval(BLESCANINTERVAL);
    pBLEScan->start(cfg.blescantime); // note: this is a blocking call
    ESP_LOGI(TAG, "BLE scan done, seen %d device(s)", foundDevices.getCount());
}
#endif

BLEcount() crashes in the line

pBLEScan->start(cfg.blescantime); // note: this is a blocking call

after scanning 250 - 500 devices.

cyberman54 commented 6 years ago

ScanInterval currently is justified to 10 Milliseconds. I did try with 100 Milliseconds, makes no difference. But this is as expected. The interval is important if the device has to manage bluetooth connections between scans (-> duty cycle). But in my use case it does not connect to any other bluetooth devices, it is just scanning.

chegewara commented 6 years ago

In this case i can think two possible causes:

Sorry i cant help you more with that, because im not bluetooth pro and i dont have similar environment to recreate this issue.

saknarak commented 4 years ago

I faced same problem too.

when in dense environment, use callback to report one by one instead of whole at once. and also not to keep in vector to check duplicate

here is my solution

https://github.com/saknarak/arduino-esp32/commit/38461c02c57f16aed82edf8a1847b1824ce89e56

  pBLEScan = BLEDevice::getScan();
  pBLEScan->setAdvertisedDeviceCallbacks(new MyAdvertisedDeviceCallbacks(), true /* wantDuplicate*/)