nkolban / esp32-snippets

Sample ESP32 snippets and code fragments
https://leanpub.com/kolban-ESP32
Apache License 2.0

WebSocket.send() and Socket.send() have no return value #344

Closed squonk11 closed 6 years ago

squonk11 commented 6 years ago

I am currently working on an application that uses WebSocket communication. It sends several hundred WebSocket telegrams to the browser in a very short time. This works well when logging is enabled, but the socket connection gets reset when I switch logging off. I assume that I am pushing data into the socket too fast, so that it crashes. To work around this, I would like to check whether the socket is ready for new data, or whether the previous send was successful. For this I could check the return value of WebSocket.send() or Socket.send(), but these functions are currently declared as void.
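To illustrate the kind of return value being asked for, here is a minimal sketch built directly on the BSD-style send() call. The name sendAll is hypothetical and is not part of the Socket or WebSocket classes in this repository; it only shows how a wrapper could report success, a partial write, or failure to its caller.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not the library's API): write the whole buffer to a
// connected socket and tell the caller how it went.
// Returns the number of bytes written, or -1 if send() reported an error.
ssize_t sendAll(int fd, const void* data, size_t length) {
    const char* p = static_cast<const char*>(data);
    size_t sent = 0;
    while (sent < length) {
        // send() may accept fewer bytes than requested, so loop on the rest.
        ssize_t n = ::send(fd, p + sent, length - sent, 0);
        if (n < 0) {
            if (errno == EINTR) {
                continue;  // interrupted by a signal, simply retry
            }
            return -1;     // real error; the caller can inspect errno
        }
        sent += static_cast<size_t>(n);
    }
    return static_cast<ssize_t>(sent);
}
```

A caller could then stop or pause its transmit loop as soon as sendAll() returns -1 instead of continuing to push data into a socket that has already failed.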

squonk11 commented 6 years ago

I am not calling Socket::receive(). I have no idea where this problem comes from.

squonk11 commented 6 years ago

Now I sent you the wireshark trace via gmx. I hope it works this time.

chegewara commented 6 years ago

@squonk11 I have question about code. With data from logging this is never true:

if(strncmp(data + totalLength, "999", 3) == 0) {
    lastPI = true;
    break;
}

Is that right?

squonk11 commented 6 years ago

Yes, it becomes true; otherwise the do-while loop would never end, but it ends correctly. I have now added the line ESP_LOGI("sendAllParainfo","ended!"); after the while. The log output is now:

I (26642) sendAllParainfo: read: F99000000000000000100000000E4C00000063D10.08;;;

D (26648) WebSocket: >> send: Length: 363
D (26648) Socket: send: Raw binary of length: 367
D (26649) WebSocket: << send
I (26649) sendAllParainfo: ended!
I (26649) WebSocketTask: ------ msg deleted
I (26650) WebSocketTask: ------ item returned from Ringbuffer
D (27875) Socket: length: 1; data: ╝
E (27875) Socket: receive: No more processes
D (27875) HttpParser: << parse
D (27875) HttpRequest: Method: , URL: "", Version:
D (27875) HttpRequest: Body: ""
D (27876) HttpServerTask: >> processRequest: Method: , Path:
D (27876) HttpServerTask: No Path handler found
D (27878) HttpServer: Path /sdcard/http is a directory
D (27878) HttpResponse: >> sendData
D (27879) Socket: send: Binary of length: 36
D (27879) Socket: send: Raw binary of length: 36

At timestamp 26649 you can see the output.

chegewara commented 6 years ago

You're right; the log just doesn't show it because this line comes after the break: ESP_LOGI("sendAllParainfo","read: %s", data + totalLength);

I'm reading the Wireshark data now.

squonk11 commented 6 years ago

I must confess: this do-while loop is not so easy to understand, but it is the style of programming I have used for the last 30 years on quite simple microcontrollers (30 MHz). There, highly optimized code (even assembly code) was very important. Unfortunately, high-level C++ is much more resource hungry. That is why I am not so happy that the HttpServer is written in C++; it makes the code easier to read and understand, but the performance suffers significantly. I checked the source code of Mongoose.c: it is highly optimized C code, and it does not use the standard lwip API but some of its low-level routines. I guess that is why Mongoose performs roughly a factor of 10 better.

chegewara commented 6 years ago

From Wireshark I see you sent 2 more "getallinfo" requests, right (frames 443 and 447)? But if I understand the code correctly, in this line: delete((*pmsg).getData()); you deleted all the data, so there is nothing to send. Am I right?

chegewara commented 6 years ago

@squonk11 I'm on IRC if you have time and are willing to join.

squonk11 commented 6 years ago

Yes, I clicked the upload button two more times in order to read the data once more. This request is transmitted correctly from the browser to the ESP and is ACKed by lwip in the following frames (444 and 448). But these request telegrams do not initiate any activity on the ESP: there is absolutely no ESP log output when I click the upload button after the data has already been uploaded once.
The delete((*pmsg).getData()); just deletes the previous upload request. Clicking the upload button again should queue another 'getallparainfo' in the ringbuffer. But this does not happen; there is not even a reaction in the lower-level HttpServer routines.

squonk11 commented 6 years ago

I am trying to debug the situation a little bit. But I do not know the deeper logic behind the HttpServer and WebSocket code. Can you explain in a few sentences?

chegewara commented 6 years ago

On this matter Mr. @nkolban can give you the best answer. I can only show where and how a connection is detected as a WebSocket: https://github.com/nkolban/esp32-snippets/blob/6057871db7af7270d2dffa4c95764c23349f1fea/cpp_utils/HttpRequest.cpp#L113

squonk11 commented 6 years ago

This I found already and I also started to study the HttpServer sources. But until now without success. I found one more strange thing in my logs:

D (9531) Socket: close: m_sock=4099, ssl: 0
D (9531) Socket: Calling lwip_close on 4099
D (9532) Socket: close: m_sock=-1, ssl: 0
D (9533) HttpServerTask: Waiting for new peer client

Here you can see that for one HttpRequest the socket seems to be closed twice. At 9531 it is closed regularly, and at 9532 there seems to be an attempt to close it once again although it is already closed (m_sock = -1). Strange...

nkolban commented 6 years ago

There are some design notes from the WebSocket component from last year found here:

https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/DesignNotes/WebSockets.md

However, let's see if I can't start addressing your specific questions. This will be more an iterative interaction as opposed to one and done. We may want to chat live via IRC if needed or else just post back here.

Imagine we have an HTTP Server ... it creates an inbound listening port and passively waits for an incoming connection request. For example, let's assume it is listening on local port 80. A remote browser will then connect to the ESP32 by its IP address at port 80. This is when the fun starts.

In TCP/IP, when we listen on a socket port we use the sockets API called accept().

For example:

int newSocket = accept(serverSocket, NULL, NULL);

This is a blocking call that will listen on port 80 and, when a connection arrives, will give us a NEW socket which is used just for that interaction. The ESP32 will now start to read the data that is incoming on the request. It expects it to be HTTP protocol. It starts to parse and interpret the data. Every HTTP request starts with an HTTP header. For a normal browser request, this will be instructions on what data the request wants and other control information. Now look at RFC 6455. This is the low-level protocol specification of "WebSockets". You don't have to study it (unless you really want to). That's what the library is for ... it implements this protocol.
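As a rough sketch of the listen-then-accept pattern described above, using plain BSD sockets: the helper names createListener and acceptClient are made up for this illustration and are not functions from the HttpServer classes.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: create a TCP socket listening on the given port
// (port 0 lets the OS choose a free one). Returns the descriptor or -1.
int createListener(uint16_t port) {
    int serverSocket = ::socket(AF_INET, SOCK_STREAM, 0);
    if (serverSocket < 0) {
        return -1;
    }
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (::bind(serverSocket, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::listen(serverSocket, 5) < 0) {
        ::close(serverSocket);
        return -1;
    }
    return serverSocket;
}

// Hypothetical helper: block until a peer connects. The return value is a
// NEW descriptor used only for that peer; serverSocket keeps listening.
int acceptClient(int serverSocket) {
    return ::accept(serverSocket, nullptr, nullptr);
}
```

The point to notice is that the descriptor returned by accept() is distinct from the listening one, which is why a server can keep accepting further peers while an earlier connection is still in use.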

At the highest level, when the HTTP request arrives at the ESP32 from the browser, it is either an "ordinary" HTTP request OR it contains the header "Upgrade: websocket". This is the indication that the browser wants to start talking websockets. Now the magic happens. The connection that was already received (the original HTTP connection) is "re-purposed" to now flow web socket protocol. No new connection is formed. The connection between the ESP32 and the browser happens over that already existing connection. The connection is maintained for as long as either side wishes to send or receive data.
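The header check described here can be sketched in a few lines. This is an illustration only: isWebSocketUpgrade and the header map are assumptions for the example, not the way HttpRequest stores or inspects its headers, and a complete RFC 6455 handshake also validates further headers that are ignored here.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Lower-case a copy of the string so comparisons ignore case.
static std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical check: does the parsed header set contain "Upgrade: websocket"?
// Header names and the "websocket" token are compared case-insensitively.
bool isWebSocketUpgrade(const std::map<std::string, std::string>& headers) {
    for (const auto& kv : headers) {
        if (toLower(kv.first) == "upgrade" && toLower(kv.second) == "websocket") {
            return true;
        }
    }
    return false;
}
```

If the check succeeds, the server answers the handshake and from then on treats the same TCP connection as a WebSocket; otherwise the request is handled as ordinary HTTP.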

If you wanted to see the magic in code, this is a good place to look:

https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/HttpServer.cpp#L84

The function (processRequest) is invoked for each new incoming HTTP request received by the ESP32. We look for a handler and then get ready to invoke it. Before we invoke the handler we ask:

"Is THIS current request for a WebSocket?"

If it is, then we spawn a new task whose job is to do nothing but watch this new socket for new incoming data:

https://github.com/nkolban/esp32-snippets/blob/master/cpp_utils/WebSocket.cpp#L364

squonk11 commented 6 years ago

Thank you for the comprehensive information. Based on this I started some analysis, and it seems as if the WebSocketTask gets started twice. The second request from the browser goes to the port of the first connection, and the int length = peerSocket.receive((uint8_t*)&frame, sizeof(frame), true); waits endlessly for this request. Currently I am not sure whether this problem is in my code or in the HttpServer classes. I will investigate and let you know. Here is the log:

D (22425) WebSocketReader: WebSocketReader Task started, socket: fd: 4101
D (22426) WebSocketReader: Waiting on socket data for socket fd: 4101
D (31716) WebSocketReader: Received data from web socket.  Length: 2
D (31716) WebSocketReader: Web socket payload, length=14:
Rb size 1024 free 1007 rptr 0 freeptr 0 wptr 16
I (31717) WebSocketTask: data:getallparainfo
I (31717) WebSocketTask: WSPointer:1073683332
I (31717) WebSocketTask: ------
D (31718) WebSocketReader: Waiting on socket data for socket fd: 4101
D (31778) WebSocketReader: Received data from web socket.  Length: 2
D (31832) WebSocketReader: Web socket payload, length=14:
Rb size 1024 free 991 rptr 16 freeptr 0 wptr 32
D (31833) WebSocketReader: Waiting on socket data for socket fd: 4101
I (35523) WebSocketTask: ------ msg deleted
I (35523) WebSocketTask: ------ item returned from Ringbuffer
I (35524) WebSocketTask: data:
I (35524) WebSocketTask: WSPointer:1073683332
I (35524) WebSocketTask: ------
I (35524) WebSocketTask: ------ msg deleted
I (35525) WebSocketTask: ------ item returned from Ringbuffer
D (114741) paraWriteHandler: Request-Path: /api/v1/pw
D (114742) paraWriteHandler: count=4;P=990; DS=0; BA=1; VAL=0000
D (114745) paraWriteHandler: Write : 0:
D (122806) WebSocketReader: WebSocketReader Task started, socket: fd: 4099
D (122807) WebSocketReader: Waiting on socket data for socket fd: 4099
D (129898) WebSocketReader: Received data from web socket.  Length: 2
D (129899) WebSocketReader: Web socket payload, length=14:
Rb size 1024 free 1007 rptr 32 freeptr 32 wptr 48
I (129899) WebSocketTask: data:getallparainfo
I (129899) WebSocketTask: WSPointer:1073621352
I (129900) WebSocketTask: ------
D (129901) WebSocketReader: Waiting on socket data for socket fd: 4099

At timestamp 22425 the WebSocketReader Task gets started the first time and has the fd:4101. At time instant 122806 the WebSocketReader Task gets started the second time; this time with fd:4099. I think this is not a good constellation...

squonk11 commented 6 years ago

Sorry the previous post was too fast. This only happened because I opened my website twice... :-( But nevertheless the log now looks like this:

D (24349) WebSocketReader: WebSocketReader Task started, socket: fd: 4101
D (24349) WebSocketReader: Waiting on socket data for socket fd: 4101
D (28421) WebSocketReader: Received data from web socket.  Length: 2
D (28422) WebSocketReader: Web socket payload, length=14:
Rb size 1024 free 1007 rptr 0 freeptr 0 wptr 16
I (28423) WebSocketTask: data:getallparainfo
I (28423) WebSocketTask: WSPointer:1073683660
I (28423) WebSocketTask: ------
D (28424) WebSocketReader: Waiting on socket data for socket fd: 4101
I (32235) WebSocketTask: ------ msg deleted
I (32235) WebSocketTask: ------ item returned from Ringbuffer

Here you can see that the WebSocketReader waits two times for data: once at timestamp 24349 and the second time at timestamp 28424. In Wireshark I can see that the browser sends correct Websocket telegrams to the ESP, lwip correctly ACKs them but they are not listed in the log. So, unfortunately the problem remains the same: after the first correctly received websocket telegram all following websocket telegrams get lost between lwip and HttpServer.
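For readers following the logs: the "Length: 2" read followed by "payload, length=14" is consistent with the fixed two-byte header that RFC 6455 places in front of every frame ("getallparainfo" is 14 characters long). The sketch below decodes that header and undoes the client-side masking. FrameHeader, parseFrameHeader and unmask are names invented for this example; how WebSocket.cpp actually lays out its frame struct has not been checked here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fields carried by the first two bytes of an RFC 6455 frame.
struct FrameHeader {
    bool    fin;      // final fragment of a message
    uint8_t opcode;   // 0x1 = text, 0x2 = binary, 0x8 = close, ...
    bool    masked;   // always set on frames sent by a browser
    uint8_t length7;  // 0..125: payload length; 126/127: extended length follows
};

// Decode the two fixed header bytes.
FrameHeader parseFrameHeader(const uint8_t bytes[2]) {
    FrameHeader h;
    h.fin     = (bytes[0] & 0x80) != 0;
    h.opcode  = bytes[0] & 0x0F;
    h.masked  = (bytes[1] & 0x80) != 0;
    h.length7 = bytes[1] & 0x7F;
    return h;
}

// Client-to-server payloads are XOR-ed with a 4-byte masking key that
// follows the header; applying the same XOR again restores the data.
void unmask(uint8_t* payload, size_t length, const uint8_t key[4]) {
    for (size_t i = 0; i < length; ++i) {
        payload[i] ^= key[i % 4];
    }
}
```

A masked text frame with a 14-byte payload therefore starts with the bytes 0x81 0x8E, followed by the 4-byte key and the masked payload.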

nkolban commented 6 years ago

I think I'd like to see all the code in context. I'm imagining you have created an HttpServer ... I then imagine you have created a handler for a path (what is the path you are using?).

I'm imagining it would be something like:

static void wsOpenHandler(HttpRequest* pRequest, HttpResponse* pResponse) {
    if (!pRequest->isWebsocket()) {
        return;
    }

    WebSocket *pWebSocket = pRequest->getWebSocket();
    pWebSocket->setHandler(new MyWebSocketHandler());
}

So what you are doing is receiving an HTTP request at a path ... asking the question ... are you a WebSocket ...? and if yes, attaching a handler to that websocket.

The handler will have methods such as:

onMessage(WebSocketInputStreambuf* pWebSocketInputStreambuf)

Now what I'd like to see is the implementation of the onMessage function that is invoked when a message is received.

squonk11 commented 6 years ago

This is my onMessage Handler:

void onMessage(WebSocketInputStreambuf* pWebSocketInputStreambuf, WebSocket *pWebSocket) {
        static char tag[] = "MyWebsocketHandler";
        std::string *pmsg = new std::string("");
        std::ostringstream ss;
        ss << pWebSocketInputStreambuf;
        *pmsg = ss.str();
        ESP_LOGD(tag, "MyWebSocketHandler: Data length: %d; %s", (*pmsg).length(), (*pmsg).c_str());
        WebSocketMessage WsMsg;
        WsMsg.setData(pmsg);
        WsMsg.setWebSocket(pWebSocket);
        ESP_LOGI(tag, "Pointer to Websocket: %d; size: %d",(int)WsMsg.getWebSocket(), sizeof(WsMsg));
        if(pdFALSE == xRingbufferSendFromISR(WebSocket_rx_queue, (void *)(&WsMsg), sizeof(WsMsg), 0))
            ESP_LOGE(tag, "Received data could not be queued");
        xRingbufferPrintInfo(WebSocket_rx_queue);
        pWebSocket->send(*pmsg, WebSocket::SEND_TYPE_TEXT);
    } // onMessage

this is the installation of the PathHandler:

pHttpServer->addPathHandler(HttpRequest::HTTP_METHOD_GET, "/api/v1/ws", wsHandler);

this is my wsHandler function:

static void wsHandler(HttpRequest* pRequest, HttpResponse* pResponse) {
    static char tag[] = "wsHandler";
    ESP_LOGD(tag, "Request-Path: %s", pRequest->getPath().c_str());
    WebSocket *pWebSocket = pRequest->getWebSocket();
    ESP_LOGD(tag, "m_pWebSocket: %x", (unsigned int)pWebSocket);
    WebSocketHandler *pMyWebSocketHandler = new MyWebSocketHandler();
    pWebSocket->setHandler(pMyWebSocketHandler);
    pWebSocket->send("Hello", WebSocket::SEND_TYPE_TEXT);
}

nkolban commented 6 years ago

What we have is indeed a good puzzle. I am not yet seeing the underlying nature of the problem but I have some ideas to make progress and while we still have ideas, we aren't stuck. Let me see if I can summarize the problem and you tell me if you agree with the story or see any flaws in my mental model of what is going on.

You are writing an ESP32 WebSocket based application. To that end you are using the HttpServer classes supplied in this repository. The HttpServer classes allow an ESP32 C++ application to start listening on a port on the ESP32 for incoming HTTP requests from a browser. This is a necessary start for being a WebSocket server.

In your logic for the HTTP Server, you are adding a path handler for "/api/v1/ws". What this means is that whenever there is a new HTTP request to the IP address and port of your ESP32 at path "/api/v1/ws", that path handler logic that you provide will be invoked.

Your browser now makes a WebSocket initiation request to "/api/v1/ws" which causes the path handler to be invoked.

In the path handler, you create an instance of the WebSocketHandler you care about and attach that to the newly received WebSocket instance and that is the end of the generic HTTP Path Handler.

At this point, the ESP32 has a "live" web socket connection between it and the browser. By the time the path handler has been called we now know that we have a good Web Socket. Once you have attached your Web Socket handler, it will respond to messages sent by the browser.

In your onMessage handler, you appear to do a few things.

  1. You read the whole of the incoming message into a data buffer
  2. You place a copy of the WebSocket that was just used onto a RingBuffer.
  3. You send a copy of the message back through the web socket as a new outbound message

This is the end of processing the original received message (from the code you have shown).

Your concern is that when your browser sends a message this seems to work the first time but when the browser sends message number 2, it doesn't appear to show up at the ESP32 as evidenced by the lack of a subsequent log message.

Is this a summary of the story?

Here is what I suggest to gain more data for the puzzle.

First, let us reduce the complexity. While I fully realize that your solution as a whole wants to do something with the received (inbound to ESP32) message from the browser, let us temporarily comment that out. Let us test that we can receive a sequence of serial messages from the browser. I think (to me) this means commenting out the ring buffer logic so that the story is JUST that we receive a message at the ESP32, we issue a log message and then end the onMessageHandler. Change the logic of your client so that it merely periodically sends messages one after the other (with variable time between the messages). I would call this the base case. If this doesn't work then we have dramatically narrowed the scope of our puzzle.

Next I would say let us increase the time between the browser sending new messages to the ESP32. For example, does 1 message a second work but 50 messages a second fail?

Again, let me stress that these are diagnostic techniques only ... keep a safe copy of your original code so that when and if needed we can revert back and not lose anything.

As always, we are more than prepared to work with you to resolve the issue and we humbly thank you for working with our libraries. These are open source / personal hobby projects for us and we only get to work on them in our free time. This means that it may take a while to get to the conclusion of the puzzle. We would love to dedicate our time to these projects ... but unfortunately, food doesn't grow on trees (well ... wait ... that's a bad analogy because I guess it does ... ).

squonk11 commented 6 years ago

thank you for your long and comprehensive answer. Using your advice I got some interesting results:

  1. Using the onMessage handler only: it works
  2. Additionally using the Ringbuffer logic: it works
  3. Using a lengthy WebSocket.send message in my task handling the Ringbuffer contents: it works
  4. Using a lengthy WebSocket.send (similar length as in 3.) which is the output of my device (read via serial interface): it does not work!

So the problem arises as soon as I send the data which I retrieve from my device via the serial interface. Now the good question is: why? I see the following possibilities:

  a) there is some interference between the UART code and the WebSocket code
  b) the UART task is running on a different core than the WebSocket code -> issue here?
  c) the data I receive from my device is not always TEXT (as it should be)

Do you see an additional possible reason? Which is the most likely problem from your point of view? a, b or c?

chegewara commented 6 years ago

Hi, sorry for interfering. You can test point a) by sending randomly generated or prepared data instead of the UART data. Point b) you can test by running the whole code on just one core (menuconfig settings or xTaskCreatePinnedToCore). Point c) can be tested the same way as a) (random or prepared data).

squonk11 commented 6 years ago

Test b) I did already. I now have all tasks on the same core. Now I am able to start the data transmission from the UART via my handler over the WebSocket to the browser a second time. But during this second transmission I get a Guru Meditation Error on core 0. Next I will try to read data via the UART but send other data via the WebSocket.

squonk11 commented 6 years ago

Case c) I can now exclude, because if I replace the data to be sent with regular characters (e.g. all 'a') it also does not work. So I think the problem is somewhere in the UART code. Some of that code was written by me; probably the error is there. I will check this now.

squonk11 commented 6 years ago

Until now I could not find an error in my UART code. It is code which I have been using for several years on different microcontrollers with only minor modifications. But I now see that I might have a totally different issue: during the startup phase the log shows a strange error message (in yellow) which I had not noticed until now:

0x40080000: _WindowOverflow4 at E:/msys32/opt/esp/esp-idf/components/freertos/xtensa_vectors.S:1685

and

0x400d0018: _flash_cache_start at ??:?

and

0x4008bfa4: bb_init at ??:?

What does that mean?

nkolban commented 6 years ago

For the last post ... let's create a new issue on this ... it doesn't feel at all related to the original issue.

squonk11 commented 6 years ago

o.k. I will do.

squonk11 commented 6 years ago

Problem solved!!! It was not related to my UART code but to another function I wrote. I am not 100% sure what exactly the problem was, but I assume the following: I passed the WebSocket itself to another function (by value, not as a pointer). Within this function I used the WebSocket for sending data via WebSocket::send(...), which worked perfectly. I now assume that when this function ends, the destructor of the WebSocket copy is invoked and thus the WebSocket is closed. I have changed my code to pass a pointer to the WebSocket to my function, and all works well.
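The suspected mechanism is a common C++ pitfall and can be reproduced without any networking. In the sketch below, Connection is a hypothetical stand-in, not the real WebSocket class: its copies share one underlying "socket" (modelled as a shared flag) and its destructor closes it. Whether the real class behaves exactly this way is only the assumption stated in the comment above.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for an object that owns a socket and closes it in
// its destructor. The shared flag models the single OS-level socket that
// every copy of the object refers to.
struct Connection {
    std::shared_ptr<bool> open;

    Connection() : open(std::make_shared<bool>(true)) {}
    ~Connection() { *open = false; }  // "close the socket"

    // Reports whether the data could still be sent.
    bool send(const std::string& /*msg*/) const { return *open; }
};

// Pass by value: the parameter is a copy, and the copy's destructor runs
// when the call is over, closing the socket the caller still relies on.
bool sendByValue(Connection c, const std::string& msg) {
    return c.send(msg);
}

// Pass by pointer: no copy is made, so nothing is destroyed on return.
bool sendByPointer(Connection* c, const std::string& msg) {
    return c->send(msg);
}
```

With a Connection named conn, sendByPointer(&conn, ...) leaves conn usable, whereas after one call to sendByValue(conn, ...) every later conn.send(...) fails even though the call itself succeeded. Declaring the copy constructor as deleted in such a class turns this mistake into a compile error.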

Shame on me for not finding this problem on my own and that I bothered you for this. I think now I gambled away the beer I deserved last time...

nkolban commented 6 years ago

I'm just happy the problem is resolved. When a puzzle is complete, I like to look back at it and ask to see if there was a methodology that could have been applied to get to the solution quicker. I think this one sounds like a "minimization" story. If we had reduced function over and over again until it worked we would have found (at the point we eliminated the UART code) that it magically started working. This "divide and conquer" approach is certainly one way to resolve an issue.

When things get tricky, I commonly ask for "the minimum re-creatable problem" ... which is the smallest code set a user can create that demonstrates the problem. Some users (NOT YOU) push back with comments like "Why are YOU making me work ... just solve MY problem?". 50% of the time when a user tries to recreate in the small, they can't ... what makes me sad is the 50% of the time when they can and then I've made them work to help solve the problems I created in the first place.

Anyway ... I'm just delighted all is working.