nkolban / esp32-snippets

Sample ESP32 snippets and code fragments
https://leanpub.com/kolban-ESP32
Apache License 2.0
2.35k stars 710 forks source link

WebSocket.send() and Socket.send() have no return value #344

Closed squonk11 closed 6 years ago

squonk11 commented 6 years ago

Currently I am working on my application which uses WebSocket communication. Here I am sending several hundred WebSocket telegrams to the browser in very short time. This works well when logging is enabled but the Socket connection gets reset when I switch logging off. I assume that I am pushing data too fast into the Socket so that it crashes. In order to overcome this situation I would like to check if the Socket is free for new data or to check if the previous send was successfull. For this I could check the return value of WebSocket.send() or Socket.send(). But these functions are currently defined as void.

chegewara commented 6 years ago

Can you try make menuconfig->Compiler options->Optimization Level->Release ?

squonk11 commented 6 years ago

With Optimization Level Release it seems to work. But also in this case I see on the logging console sometimes an error message:

E (73536) Socket: send: socket=4099, Connection reset by peer

Do I need to bother about that message?

Edit: after experimenting a little bit I see that the connection is not so stable. Sometimes I get the error message:

E (181231) Socket: accept(): Too many open files in system, m_sock=4096 E (181236) HttpServerTask: Caught an exception waiting for new client!

after that the Webserver does not respond any more.

nkolban commented 6 years ago

I had a look at the source ... when we send data down a socket, we call Socket::send() which appears to trap errors and (at a minimum) logs them to the trace. See:

https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/Socket.cpp#L432

For a WebSocket send, not sure what to do if a send() fails. We get a callback notification when the partner socket is closed or otherwise detects an error and I can't yet imagine a scenario where we would attempt to perform a send() and not get a partner error indication if the send failed.

The ESP32 can only maintain a small number of concurrently open TCP/IP connections. I think 4 is the default. Can we check to see if we are creating new incoming WebSocket connections and not seeing the corresponding completions of them?

squonk11 commented 6 years ago

actively (in my code) I am initiating a WebSocket connection only once. This one I am using for the communication. I am not shure if some security layer in the lower level TCP/IP stack could initiate additional WebSocket connections; e.g. if there is something wrong with the ongoing communication. Is it possible to track this using wireshark?

chegewara commented 6 years ago

E (181231) Socket: accept(): Too many open files in system, m_sock=4096

This message you can see if you are trying to connect from the same client too quick, before old request got processed and socket closed.

squonk11 commented 6 years ago

yes, before opening the WebSocket connection I initiate several (20?) requests via REST within short time (simply a for loop in the browser). Can this be the reason for the error message? How can I prevent this? Is there a possibility in Javascript to detect if the (ESP32-)server has problems with too many requests within short time?

chegewara commented 6 years ago

@squonk11 Like mr @nkolban said, by default we have set only 4 open connections allowed. You can try to change this value in menuconfig:

CONFIG_LWIP_MAX_SOCKETS:                                                                                               
  │                                                                                                                      
  │ Sockets take up a certain amount of memory, and allowing fewer                                                       
  │ sockets to be open at the same time conserves memory. Specify                                                          
  │ the maximum amount of sockets here. The valid value is from 1                                                          
  │ to 16.                                                                                                               
  │                                                                                                                      
  │ Symbol: LWIP_MAX_SOCKETS [=4]                                                                                        
  │ Type  : integer                                                                                                      
  │ Range : [1 32]                                                                                                       
  │ Prompt: Max number of open sockets                                                                                   
  │   Location:                                                                                                          
  │     -> Component config                                                                                              
  │       -> LWIP                                                                                                        
  │   Defined at /home/esp32/esp/esp-idf/components/lwip/Kconfig:19 
squonk11 commented 6 years ago

I will check this tomorrow.

squonk11 commented 6 years ago

I now increased the number of allowed connections from 4 to 8 and I reduced the number of consecutive requests. By this the error message

E (181231) Socket: accept(): Too many open files in system, m_sock=4096

seems to be dissapeared. But the error messages:

E (73536) Socket: send: socket=4099, Connection reset by peer

still persist and I am still having stability problems with the communication. Am I sending data too fast? Actually I am sending chunks of data of approx. size of 500Bytes every 40ms via WebSocket. This is a datarate of 12,5kByte/s only...

chegewara commented 6 years ago

My knowledge about sockets and wifi communication is rather basicaly, but it seems that for some reason client app is disconnecting or reseting connection. You may have issue with wifi Rx buffer. Check options in menuconfig regarding dynamic and static Rx buffer in wifi submenu. Its best lead i can tell you.

squonk11 commented 6 years ago

o.k. thanks. I will check that. I now reduced the stringsize down to 100 --> this works. If I increase to 150 --> does not work. When it does not work, Microsoft Edge directly detects a terminated WebSocket connection and triggers the websocket.onclose function and no further Websocket telegrams are recorded in wireshark. Firefox does not detect this condition, all telegrams are transmitted but wireshark marks them as TCP and not as WebSocket.

nkolban commented 6 years ago

The way I would suggest we approach this puzzle is to write a detailed story that describes what is going on ... including topology diagrams and more. We can then shop this around as needed.

Lets see if I can get a handle on what I think you have and we'll see how close that might be. You have an ESP32 that is running the HttpServer classes (thank you for that and we'll try and give you the best support we can on using those). This then makes your ESP32 an HTTP server. Your ESP32 is presumably connecting to your home access point and as such, it has an IP address. Given that your ESP32 is being an HTTP Server, we now have a unique endpoint ... the IP address of the ESP32 and the port on which the HTTP server is listening.

Now comes some unknowns. You mention a browser ... that appears to be the client of the ESP32 hosted HTTP server. However, where is the browser running? Is it on a PC or a phone. In either case, please describe a little about the environment. If a PC, is it windows, mac or Linux? What flavor of browser is being used?

I am now suspecting that your browser reaches out to the ESP32 hosted HTTP server and forms a connection and asks for a WebSocket. At this point there is now a "pipe" between the ESP32 and the browser. Can I assume that the pipe is ESP32 to browser exclusive?

I now hear that you are sending messages of about 500bytes each 40ms. Now we get into the realm of some tests. If you reduce the frequency of ESP32 transmissions ... say to 1 a second ... can you maintain stability? If yes and you then increase the rate to 2 a second ... do we still have stability? How fast can we go before we reach trouble?

When we reach trouble ... now we have a test to work against. Is the problem consistent based on different browser types?

If we run wireshark ... can see any "interesting" traffic when a connection fails?

chegewara commented 6 years ago

@squonk11 This is consistent with my previous post. In menuconfig you can read that can happen so wifi Rx buffers are too large and application buffer may be too small to procced all packets in time. Its long and hard way to make it right but at the end you will be happy when it works.

squonk11 commented 6 years ago

In menuconfig in the wifi configuration I only see parameters for the number of buffers - not for the size of them. Do I need to increase the nuber of Rx buffers? Why Rx buffers - I think I might have the problem on the Tx buffers because the connection gets lost when I am sending?

chegewara commented 6 years ago

@squonk11 Ok then, i thought its issue with incomming messages. So you need to change Tx buffer.

Set the number of WiFi dynamic TX buffers. The size of each dynamic TX buffer is not fixed,                          │  
  │ it depends on the size of each transmitted data frame.                                                               │  
  │                                                                                                                      │  
  │ For each transmitted frame from the higher layer TCP/IP stack, the WiFi driver makes a copy                          │  
  │ of it in a TX buffer. For some applications, especially UDP applications, the upper layer                            │  
  │ can deliver frames faster than WiFi layer can transmit. In these cases, we may run out of TX  buffers.

Or you can try to change Tx buffer to static since you have found that data size is causing issues. Static buffer is approx. 1.6kB size so you need to decide if its enough for one packet. Then you can fiddle with number of buffers(up to 64):

Set the number of WiFi static TX buffers. Each buffer takes approximately 1.6KB of RAM.                              │  
  │ The static RX buffers are allocated when esp_wifi_init() is called, they are not released                            │  
  │ until esp_wifi_deinit() is called.                                                                                   │  
  │                                                                                                                      │  
  │ For each transmitted data frame from the higher layer TCP/IP stack, the WiFi driver makes a                          │  
  │ copy of it in a TX buffer.  For some applications especially UDP applications, the upper                             │  
  │ layer can deliver frames faster than WiFi layer can transmit. In these cases, we may run out                         │  
  │ of TX buffers.  
squonk11 commented 6 years ago

After switching to static Tx buffers with more buffers than default the WebServer does not work at all. Maybe I have also an issue with stack size; CPU crashes in "new_op.cc:54". I will do some changes here, observe the results and post them here if necessary.

squonk11 commented 6 years ago

Do you know if there is a limit concerning the transmission of Strings via WebSocket protocol using the SendType: WebSocket::SEND_TYPE_TEXT; e.g. to max 127 char?

squonk11 commented 6 years ago

New finding: A transmission of strings with a length of 127 characters [0...126] work perfectly. A transmission of strings with a length of 128 (or more) does not work. Strange! I am transmitting a buffer defined as char data[500]; using the command ws.send(data, WebSocket::SEND_TYPE_TEXT);. The first parameter to send is of type std:string; might there be a problem in the implicit conversion from char to std::string?

squonk11 commented 6 years ago

New finding: If I transmit a std::string of length 126: it works If I transmit a std:string of length 127: it does not work So it is not related to the fact if it is char array or std::string passed to ws.send

squonk11 commented 6 years ago

New finding: I now reduced the frequency of telegram transmissions from 40ms in the beginning down to 1second. The result is the same: 126 characters work 127 characters don't work. Any suggestions what else I should test?

squonk11 commented 6 years ago

I think I found the problem: the byte-order for the length information in the WebSocket header in case of >126bytes seems to be wrong. I modified in WebSocket::send the code as follows:

if (data.length() < 126) { frame.len = data.length(); m_socket.send((uint8_t )&frame, sizeof(frame)); } else { frame.len = 126; m_socket.send((uint8_t )&frame, sizeof(frame)); uint16_t len = (uint16_t)data.length(); len = ((len & 0xff) << 8) | (len >> 8); //m_socket.send((uint16_t)data.length()); m_socket.send(len);

So, basically I swapped the bytes of the 16bit length information (maybe in an archaic way...) With this modification it works perfectly.

squonk11 commented 6 years ago

After doing some more tests it seems as if this was the problem. I think I deserve a beer!

chegewara commented 6 years ago

maybe even 2. nice work

nkolban commented 6 years ago

No kidding mr @squonk11 ... you most assuredly rock. That was fantastic detective work. What it means though is that not a soul before you has ever tried to send a packet of web socket data larger than 126 bytes using this library. That means that we must consider this library "unproven".

Because I like tinkering, this library was written from the ground up rather than attempting to leverage any other pre-existing applications. That's normally a "bad" thing because it means that we "recreate" the bugs that others may have solved.

The web socket protocol is documented in this specification:

https://tools.ietf.org/html/rfc6455

If we look at section 5.2 we find the wire level layout:

https://tools.ietf.org/html/rfc6455#section-5.2

Basically, if the length of data that can be sent is < 126 then the data is 7 bits of the length (because < 126 will fit in 7 bits). If the data length is > 126 but less than < 65536 the 7 bits of length are set to be 126 (as a flag) followed by 16bits of length followed by data. There is the ability to send data of length 64 bits .... but I haven't gotten around to implementing that as I haven't seen any example anywhere of sending a message > 64K.

Where I dismally failed was to convert the length from "host" bit order (little endian on an ESP32) to network bit order (big endian).

The code base has been changed to resolve that now just as you did in your code. However, I made one small change. I am using the "htons" function which will byte flip but ONLY if needed. If there was ever an ESP33 that used big endian format (doubt it ... but you never know) then the code would break.

Again ... let me echo what mr @chegewara said ... VERY nice work ... and beer and kudos to yourself.

squonk11 commented 6 years ago

acutally that is what I did: studied the framing of the WebSocket telegrams from the rfc6455 and compared the examples in section 5.7 (especially the example with 256 bytes binary message in a single unmasked frame) and compared to the wireshark trace.

chegewara commented 6 years ago

Thats is example of good debuging with use of tools like wireshark. That proves skills.

squonk11 commented 6 years ago

Unfortunately I am now having another problem: After sending all the above mentioned data from the ESP to the browser, I try to initiate this request again by hitting the same button in the website again. I see on wireshark that the request gets sent to the ESP32 and gets acknowledged ([ACK]-telegram) by the lwip stack. BUT: the onMessage handler of the WebSocketHandler class does not get triggered. It seems as if the telegram is being ignored inside the ESP32 SW somehow - there is absolutely no logging on the serial console. Firefox after some time gives in the console an error message that the websocket connection was interrupted during the loading of the page. Microsoft Edge just closes the WebSocket connection after several minutes. To be honest: I have no idea where I should start to search.

nkolban commented 6 years ago

Do we want to consider this a new puzzle and hence a new issue? We seem to have moved away from send() having no return value.

chegewara commented 6 years ago

I might be wrong, but i think we need to fix latest bugfix here:

The payload length is the length of the "Extension data" + the length of the "Application data". The length of the "Extension data" may be zero, in which case the payload length is the length of the "Application data".

https://tools.ietf.org/html/rfc6455#section-5.2

Since we are considering now option when Extension data is 2 bytes long, then dont we need to add 2 to data.length() in this line? https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/WebSocket.cpp#L342 and here: https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/WebSocket.cpp#L340

squonk11 commented 6 years ago

I do not think that we have extension data here:

The "Extension data" is 0 bytes unless an extension has been negotiated. Any extension MUST specify the length of the "Extension data", or how that length may be calculated, and how the extension use MUST be negotiated during the opening handshake. If present, the "Extension data" is included in the total payload length.

Or do we negotiate the presence of extension data during the opening phase? Just for a test I increased the value for the data length by 2; the result is that the transmission of the first telegram >126bytes fails.

chegewara commented 6 years ago

I might read something wrong and mixed 'Extended data lenght' with 'extension data'. Sorry.

squonk11 commented 6 years ago

I am also observing that sometimes during the transmission of my data to the browser and sometimes after the transmissin when I click the second time the button to transmit the data a second time I get a strange behaviour. The log shows:

D (20081) HttpParser: << parse D (20081) HttpRequest: Method: , URL: "", Version: D (20081) HttpRequest: Body: "" D (20081) HttpServerTask: >> processRequest: Method: , Path: D (20082) HttpServerTask: No Path handler found D (20084) HttpServer: Path /sdcard/http is a directory D (20084) HttpResponse: >> sendData D (20084) Socket: send: Binary of length: 36 D (20084) Socket: send: Raw binary of length: 36 D (20086) Socket: send: Binary of length: 12 D (20086) Socket: send: Raw binary of length: 12 D (20087) HttpResponse: << sendData D (20087) HttpResponse: >> sendData D (20087) Socket: send: Binary of length: 17 D (20088) Socket: send: Raw binary of length: 17 D (20088) HttpResponse: << sendData D (20089) HttpResponse: >> sendData D (20089) Socket: send: Binary of length: 13 D (20090) Socket: send: Raw binary of length: 13 E (20090) Socket: send: socket=4102, Connection reset by peer D (20091) HttpResponse: << sendData D (20092) HttpResponse: >> sendData D (20092) Socket: send: Binary of length: 21 D (20093) Socket: send: Raw binary of length: 21 E (20093) Socket: send: socket=4102, Connection reset by peer

It looks like the ESP is receiving an "empty" telegram and interprets this as a request to list the directory. After that the WebSocket communication is not working any more. So, the problem might be somewhere else in the protocol stack?

chegewara commented 6 years ago

@squonk11 Can you provide a larger amount of logs? It would be nice to see whole communication between esp32 and browser, or maybe wireshark data file? Because i would like to help, really, but i feel like im listening someones talking on phone. I hear just one part of conversation (i see code) but i dont hear counterpart (what browser client is talking about).

squonk11 commented 6 years ago

Unfortunately I am not able to upload files directly here so I pasted the ESP logdatat at pastebin: https://pastebin.com/Gh9m4L67. But I don't know how to paste files in order to show you the wireshark log. Can I send this by eMail?

chegewara commented 6 years ago

Pastebin is excelent choice. Lets start with this error message. Its right after sending last packet and before this empty request: E (31563) Socket: receive: No more processes This leads us to this line: https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/Socket.cpp#L366 or here: https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/Socket.cpp#L385

its my mail: github@esp32.eu.org, here you can send wireshark data, thanks for sharing data

squonk11 commented 6 years ago

I now sent the wireshark trace and the corresponding ESP32 log. If you need more info or tests, please let me know.

squonk11 commented 6 years ago

I am now trying to send you two more files but I get the error message

5.7.1 host ns1.diskstation.eu [78.10.98.102] said: github@esp32.eu.org: Relay access denied (H-BADRCPT).

from your eMail server...?

chegewara commented 6 years ago

@squonk11 You are smart guy, can you inject logging in this loop to see amountToReadvalue:

https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/Socket.cpp#L377-L394

chegewara commented 6 years ago

I am now trying to send you two more files but I get the error message 5.7.1 host ns1.diskstation.eu [78.10.98.102] said: github@esp32.eu.org: Relay access denied (H-BADRCPT). from your eMail server...?

should have work, tested with gmail and i can send and receive message

squonk11 commented 6 years ago

I now added logging in the loop as requested. The output of the log is:

I (136115) sendAllParainfo: read: F97000000000000000300000001E0400000000010.06;;; I (136120) sendAllParainfo: read: F98000000000000000300000001E0400000000010.07;;; I (136125) sendAllParainfo: read: F99000000000000000100000000E4C00000063D10.08;;; D (136130) WebSocket: >> send: Length: 363 D (136130) Socket: send: Raw binary of length: 367 D (136131) WebSocket: << send I (136131) WebSocketTask: ------ msg deleted I (136131) WebSocketTask: ------ item returned from Ringbuffer D (1137592) HttpParser: >> parseRequestLine: "" [0] D (137592) HttpParser: << parseRequestLine: method: , url: , version: E (142592) Socket: receive: No more processes D (142592) HttpParser: << parse D (142592) HttpRequest: Method: , URL: "", Version: D (142592) HttpRequest: Body: "" D (142592) HttpServerTask: >> processRequest: Method: , Path: D (142593) HttpServerTask: No Path handler found D (142595) HttpServer: Path /sdcard/http is a directory D (142595) HttpResponse: >> sendData D (142595) Socket: send: Binary of length: 36 D (142595) Socket: send: Raw binary of length: 36 D (142597) Socket: send: Binary of length: 12 D (142597) Socket: send: Raw binary of length: 12 D (142598) HttpResponse: << sendData D (142598) HttpResponse: >> sendData D (142599) Socket: send: Binary of length: 177 D (142599) Socket: send: Raw binary of length: 17 D (142600) HttpResponse: << sendData D (142600) HttpResponse: >> sendData D (142600) Socket: send: Binary of length: 13 D (142601) Socket: send: Raw binary of length: 13 E (142602) Socket: send: socket=4101, Connection reset by peer D (142602) HttpResponse: << sendData D (142603) HttpResponse: >> sendData D (142603) Socket: send: Binary of length: 21 D (142604) Socket: send: Raw binary of length: 21

I don't see any output of this loop... , strange

Edit: I put the ESP_LOGD on line 378 (directly under the while statement)

squonk11 commented 6 years ago

I now put an ESP_LOGD in line 367 of Socket.cpp. This one gets logged. So, the result of rc = ::lwip_recv_r(m_sock, data, length, 0); is -1.

chegewara commented 6 years ago

It is -1 as expected. Now we have to trace back and see why. We need to know all parameters value that are in this call:

squonk11 commented 6 years ago

In the corresponding wireshark trace I can not see any telegram coming from the browser. So I suppose the invoaction of Socket::receive() is done somewhere internally?

chegewara commented 6 years ago

This is what ive found. Only option where receive is called with exact=flase is here: https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/WebSocket.cpp#L413-L421

because exact is set to be false if is ommited here: https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/Socket.h#L80

squonk11 commented 6 years ago

Now I logged the data you wanted. Result:

I (50083) WebSocketTask: ------ msg deleted I (50083) WebSocketTask: ------ item returned from Ringbuffer D (54265) Socket: length: 1; data: ╝ E (54265) Socket: receive: No more processes D (54265) HttpParser: << parse

So, data length 1.

chegewara commented 6 years ago

Again, im in rush so im making mistakes. Here is one more place where its called with exact=false: https://github.com/nkolban/esp32-snippets/blob/59f667e5614c611ad68c8d1766b731e911de43bf/cpp_utils/Socket.cpp#L319

Next step up is here(which is more consistent with logs): https://github.com/nkolban/esp32-snippets/blob/2adbc1cb90e3dd567a3a0c80f7fed617400c065c/cpp_utils/HttpParser.cpp#L191-L196

chegewara commented 6 years ago

@squonk11 How looks like yours WebSocketTask?

chegewara commented 6 years ago

@squonk11 Have you seend this? Its another empty request:

D (26562) HttpParser: >> parseRequestLine: "" [0]
D (26563) HttpParser: << parseRequestLine: method: , url: , version:
I (26565) sendAllParainfo: read: 234000000000000000100000000E4C00000001810.05;;;
I (26570) sendAllParainfo: read: 2350000000000000000000000000600080000000300.20.32
;;V;
squonk11 commented 6 years ago

this is my WebSocket task:

class WebSocketTask : public Task {
public:
   WebSocketTask(std::string taskName="Task", uint16_t stackSize=10000, uint8_t priority=5, BaseType_t core_id=tskNO_AFFINITY): Task(taskName, stackSize, priority, core_id) {};
   void run(void *data) {
       size_t size;
       WebSocketMessage* pmsg;
//     ESP_LOGI("WebSocketTask","------ init");
       while(1) {
           pmsg = (WebSocketMessage*) xRingbufferReceiveFromISR(WebSocket_rx_queue, &size);
           //ESP_LOGI("WebSocketTask","------ Ringbuffer read");
           //ESP_LOGI("WebSocketTask","size:%d; ", size );
           if(pmsg != NULL) {
               ESP_LOGI("WebSocketTask","data:%s", (*pmsg).getData()->c_str());
               ESP_LOGI("WebSocketTask","WSPointer:%d", (int)(*pmsg).getWebSocket());
               ESP_LOGI("WebSocketTask","------");
//temp             (*pmsg).getWebSocket()->send(*((*pmsg).getData())+"_WS", WebSocket::SEND_TYPE_TEXT);
               //vTaskDelay(500);
//             WebSocketMessage* pmsg_save = pmsg;
//             delete(pmsg);
//             ESP_LOGI("WebSocketTask","------ WSmsg deleted");
               if(*((*pmsg).getData()) == "getallparainfo")
                   sendAllParainfo(*(*pmsg).getWebSocket());
               delete((*pmsg).getData());
               ESP_LOGI("WebSocketTask","------ msg deleted");
               vRingbufferReturnItemFromISR(WebSocket_rx_queue, pmsg, 0);
               ESP_LOGI("WebSocketTask","------ item returned from Ringbuffer");
           }
           vTaskDelay(1);
       }
   }
};

yes, I have seen that there is an empty request line. But I don't know where it comes from.

squonk11 commented 6 years ago

The sendAllParainfo() function is this:

void sendAllParainfo(WebSocket ws) { char data[1000]; char response[100]; int res; int num = 0; int totalLength = 0; uint16_t length = 0; bool lastPI = false;

VABus fu;
fu.SetUART(&g_uart2);
res = fu.Write('1', "990", '0', (char*)"0000");
do {
    for(int i = 0; i < 10; i++) {
        data[totalLength++] = 'P';
        data[totalLength++] = 'I';
        data[totalLength++] = ':';
        res = fu.Read('1', "991", '0', data + totalLength, &length );
        if(res == 0) {  // read o.k.
            if(strncmp(data + totalLength, "999", 3) == 0) {
                lastPI = true;
                break;
            }
            ESP_LOGI("sendAllParainfo","read: %s", data + totalLength);
            totalLength += length;
        }
    }
    ws.send(data, WebSocket::SEND_TYPE_TEXT);
    totalLength = 0;
    //vTaskDelay(1);
} while(!lastPI);

};