yubox-node-org / AsyncTCPSock

Reimplementation of the API of me-no-dev/AsyncTCP using high-level BSD sockets
GNU Lesser General Public License v3.0
20 stars 9 forks source link

InstrFetchProhibited: AsyncClient::_close() -> _removeAllCallbacks #4

Open DOSpt opened 3 years ago

DOSpt commented 3 years ago

Hi

I've started using this lib due to the locking problems with the original asyncTCP.

Last week while doing some tests I got this error on 2 different projects. It seems that the pointer to _removeAllCallbacks was cleared. The only way I see this happening is by the destructor being called, although it seems very strange. I was trying to replicate the problem but it doesn't seem to be easy to force it to happen. Regarding heap I know that I'm ok. Just before it happened it was like this: Heap: Free:157588, Min:117616, Size:340244, Alloc:73304.

I believe we can increase the scope of the _asyncsock_mutex until the end of the close method. Other thing that might be happening is that the program counter is pointing to the next instruction after the problem. In this case it could be while doing this (if (_discard_cb) _discard_cb(_discard_cb_arg, this);). If the _error method is called right after checking if the pointer is null it will clear this pointer and generate the error. Either way increasing the scope of the _asyncsock_mutex should solve the problem.

This happened immediately after a simple post with less than 150bytes.

Guru Meditation Error: Core  0 panic'ed (InstrFetchProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x11ef03e1  PS      : 0x00060130  A0      : 0x800e970f  A1      : 0x3ffd8790  
A2      : 0x3ffdf258  A3      : 0x00000000  A4      : 0x3ffdf258  A5      : 0x00000001  
A6      : 0x00060720  A7      : 0x00000000  A8      : 0x800e8d94  A9      : 0x3ffd8770  
A10     : 0x3ffdf2b8  A11     : 0x3ffdf2b8  A12     : 0x00000003  A13     : 0x00000000  
A14     : 0x00000000  A15     : 0x00000000  SAR     : 0x0000000a  EXCCAUSE: 0x00000014  
EXCVADDR: 0x11ef03e0  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  

ELF file SHA256: 0000000000000000

Backtrace: 0x11ef03e1:0x3ffd8790 0x400e970c:0x3ffd87b0 0x400e9b73:0x3ffd87d0 0x400e95ee:0x3ffd8800 0x4008a332:0x3ffd8860
PC: 0x11ef03e1
EXCVADDR: 0x11ef03e0

Decoding stack results
0x400e970c: AsyncClient::_close() at C:\Users\Daniel\Documents\Arduino\libraries\AsyncTCP\src\AsyncTCP.cpp line 744
0x400e9b73: AsyncClient::_sockPoll() at C:\Users\Daniel\Documents\Arduino\libraries\AsyncTCP\src\AsyncTCP.cpp line 706
0x400e95ee: _asynctcpsock_task(void*) at C:\Users\Daniel\Documents\Arduino\libraries\AsyncTCP\src\AsyncTCP.cpp line 206
0x4008a332: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143
DOSpt commented 3 years ago

I've just increased the scope and added an extra validation that _socket is still -1 and got the following error after some minutes with 2 PCs updating values every 300ms.

Guru Meditation Error: Core  0 panic'ed (InstrFetchProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x00070000  PS      : 0x00060330  A0      : 0x800e8dcc  A1      : 0x3ffd8490  
A2      : 0x3ffde3e4  A3      : 0x3ffd84ac  A4      : 0x3ffd84a8  A5      : 0x00000000  
A6      : 0x00000000  A7      : 0x000001a3  A8      : 0x8017fdc4  A9      : 0x3ffd8460  
A10     : 0x3ffde3e4  A11     : 0x3ffde3e4  A12     : 0x00000003  A13     : 0x00000001  
A14     : 0x00060120  A15     : 0x00000000  SAR     : 0x0000000a  EXCCAUSE: 0x00000014  
EXCVADDR: 0x00070000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  

ELF file SHA256: 0000000000000000

Backtrace: 0x00070000:0x3ffd8490 0x400e8dc9:0x3ffd84b0 0x400e9797:0x3ffd84d0 0x400e97ae:0x3ffd84f0 0x400e3276:0x3ffd8510 0x400e353e:0x3ffd8530 0x400e3561:0x3ffd8550 0x400e9222:0x3ffd8570 0x400e9c1e:0x3ffd85a0 0x400e967e:0x3ffd85d0 0x4008a332:0x3ffd8630
PC: 0x00070000
EXCVADDR: 0x00070000

Decoding stack results
0x400e8dc9: AsyncClient::_removeAllCallbacks() at C:\Users\Daniel\Documents\Arduino\libraries\AsyncTCP\src\AsyncTCP.cpp line 717
0x400e9797: AsyncClient::_close() at C:\Users\Daniel\Documents\Arduino\libraries\AsyncTCP\src\AsyncTCP.cpp line 745
0x400e97ae: AsyncClient::close(bool) at C:\Users\Daniel\Documents\Arduino\libraries\AsyncTCP\src\AsyncTCP.cpp line 863
0x400e3276: AsyncWebServerRequest::_onAck(unsigned int, unsigned int) at C:\Users\Daniel\Documents\Arduino\libraries\ESPAsyncWebServer\src\WebRequest.cpp line 224
0x400e353e: AsyncWebServerRequest::_onPoll() at C:\Users\Daniel\Documents\Arduino\libraries\ESPAsyncWebServer\src\WebRequest.cpp line 203
0x400e3561: std::_Function_handler   >::_M_invoke(const std::_Any_data &,  ,  ) at C:\Users\Daniel\Documents\Arduino\libraries\ESPAsyncWebServer\src\WebRequest.cpp line 73
0x400e9222: std::function ::operator()(void*, AsyncClient*) const at c:\users\daniel\appdata\local\arduino15\packages\esp32\tools\xtensa-esp32-elf-gcc\1.22.0-97-gc752ad5-5.2.0\xtensa-esp32-elf\include\c++\5.2.0/functional line 2271
0x400e9c1e: AsyncClient::_sockPoll() at C:\Users\Daniel\Documents\Arduino\libraries\AsyncTCP\src\AsyncTCP.cpp line 711
0x400e967e: _asynctcpsock_task(void*) at C:\Users\Daniel\Documents\Arduino\libraries\AsyncTCP\src\AsyncTCP.cpp line 206
0x4008a332: vPortTaskWrapper at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/freertos/port.c line 143

Changes:

void AsyncClient::_close(void)
{
    //Serial.print("AsyncClient::_close: "); Serial.println(_socket);
    xSemaphoreTakeRecursive(_asyncsock_mutex, (TickType_t)portMAX_DELAY);
    if(_socket != -1)
    {
        _conn_state = 0;
        lwip_close_r(_socket);
        _socket = -1;

        _clearWriteQueue();
        if (_discard_cb) _discard_cb(_discard_cb_arg, this);
        _removeAllCallbacks();
    }
    xSemaphoreGiveRecursive(_asyncsock_mutex);
}

void AsyncClient::_error(int8_t err)
{
    xSemaphoreTakeRecursive(_asyncsock_mutex, (TickType_t)portMAX_DELAY);
    if(_socket != -1)
    {
        _conn_state = 0;
        lwip_close_r(_socket);
        _socket = -1;

        _clearWriteQueue();
        if (_error_cb) _error_cb(_error_cb_arg, this, err);
        if (_discard_cb) _discard_cb(_discard_cb_arg, this);
        _removeAllCallbacks();
    }
    xSemaphoreGiveRecursive(_asyncsock_mutex);
}

void AsyncServer::end()
{
    xSemaphoreTakeRecursive(_asyncsock_mutex, (TickType_t)portMAX_DELAY);
    if (_socket != -1)
    {
        lwip_close_r(_socket);
        _socket = -1;
    }
    xSemaphoreGiveRecursive(_asyncsock_mutex);
}

Exception:

void AsyncClient::_removeAllCallbacks(void)
{
    _connect_cb = NULL;  \\  <-- line 717
    _connect_cb_arg = NULL;
    _discard_cb = NULL;
    _discard_cb_arg = NULL;
    _sent_cb = NULL;
    _sent_cb_arg = NULL;
    _error_cb = NULL;
    _error_cb_arg = NULL;
    _recv_cb = NULL;
    _recv_cb_arg = NULL;
    _timeout_cb = NULL;
    _timeout_cb_arg = NULL;
    _poll_cb = NULL;
    _poll_cb_arg = NULL;
}
avillacis commented 3 years ago

I see you are using the ESPAsyncWebServer library. Which fork and branch are you using? I do all my tests with https://github.com/yubox-node-org/ESPAsyncWebServer/tree/yuboxfixes-0xFEEDC0DE64-cleanup because the base ESPAsyncWebServer had many locking issues that were analyzed and fixed by @0xFEEDC0DE64 , and which might be confused with issues in AsyncTCPSock.

DOSpt commented 3 years ago

I see you are using the ESPAsyncWebServer library. Which fork and branch are you using? I do all my tests with https://github.com/yubox-node-org/ESPAsyncWebServer/tree/yuboxfixes-0xFEEDC0DE64-cleanup because the base ESPAsyncWebServer had many locking issues that were analyzed and fixed by @0xFEEDC0DE64 , and which might be confused with issues in AsyncTCPSock.

I'm using that fork and for the filesystem I'm using LittleFS

avillacis commented 3 years ago

Do you have a minimal reproducing case, and a clear set of instructions to reproduce this issue? As in, a sketch that sets up just enough to trigger the bug, and the instructions, such as "repeatedly fetch URL such and such until failure"?

avillacis commented 3 years ago

BTW, as of commit fea600a56cac713f19d495528cf8a25197d8f674 installed callbacks are never removed under any condition other than straight-up destruction of the object. You should test whether this fixes your bug.