me-no-dev / ESPAsyncTCP

Async TCP Library for ESP8266
GNU Lesser General Public License v3.0
754 stars 392 forks source link

File loads only partially #20

Closed marwijn closed 7 years ago

marwijn commented 8 years ago

I'm using the async webbrowser to load a file in the webbrowser. This works well when using chrome version 52 on a desktop PC. However using chrome (v52) on an android phone will only partly load the file and show a timeout in the logging. The problem seems to occur with any file over about 2k, but only on my telephone.

The logging that I've captured can be found below:

del if1 usl mode : sta(5c:cf:7f:17:5d:98) add if0 sta config unchangedf r0, scandone no *** found, reconnect after 1s wifi evt: 1 STA disconnect: 201 reconnect f 0, scandone no *** found, reconnect after 1s wifi evt: 1 STA disconnect: 201 reconnect f -180, scandone no *** found, reconnect after 1s wifi evt: 1 STA disconnect: 201 STA: Failed! reconnect f r0, scandone del if0 usl mode : softAP(5e:cf:7f:17:5d:98) add if1 dhcp server start:(ip:192.168.4.1,mask:255.255.255.0,gw:192.168.4.1) bcn 100 [APConfig] local_ip: 192.168.4.1 gateway: 192.168.4.1 subnet: 255.255.255.0 [APConfig] DHCP IP start: 192.168.4.100 [APConfig] DHCP IP end: 192.168.4.200 [AP] softap config unchanged OTA server at: esp8266-175d98.local:8266 SPIFFSImpl: allocating 512+180+1400=2092 bytes SPIFFSImpl: mounting fs @100000, size=2fb000, block=2000, page=100 SPIFFSImpl: mount rc=0

wifi evt: 7 wifi evt: 7 wifi evt: 7 wifi evt: 7 wifi evt: 7 wifi evt: 7 add 1 aid 1 station: 24:df:6a:53:34:ba join, AID = 1 wifi evt: 5 wifi evt: 7 wifi evt: 7 wifi evt: 7 wifi evt: 7 wifi evt: 7 TIMEOUT: 5142, state: Established SPIFFS_close: fd=1 SPIFFS_close: fd=1 wifi evt: 7 wifi evt: 7 wifi evt: 7

me-no-dev commented 8 years ago

you can do two things: change this define to larger number or call this method on the client to change it.

marwijn commented 8 years ago

The only thing that changes is the TIMEOUT above in the log. Changed from 5142 to 60241. Below the log when the same site is loaded from chrome/PC. The log file above seems to be missing urd, urch, urn log lines what are these ?

del if1 usl mode : sta(5c:cf:7f:17:5d:98) add if0 sta config unchangedf r0, scandone no *** found, reconnect after 1s wifi evt: 1 STA disconnect: 201 reconnect f 0, scandone no *** found, reconnect after 1s wifi evt: 1 STA disconnect: 201 reconnect f -180, scandone no *** found, reconnect after 1s wifi evt: 1 STA disconnect: 201 STA: Failed! del if0 usl mode : softAP(5e:cf:7f:17:5d:98) add if1 dhcp server start:(ip:192.168.4.1,mask:255.255.255.0,gw:192.168.4.1) bcn 100 [APConfig] local_ip: 192.168.4.1 gateway: 192.168.4.1 subnet: 255.255.255.0 [APConfig] DHCP IP start: 192.168.4.100 [APConfig] DHCP IP end: 192.168.4.200 [AP] softap config unchanged OTA server at: esp8266-175d98.local:8266 SPIFFSImpl: allocating 512+180+1400=2092 bytes SPIFFSImpl: mounting fs @100000, size=2fb000, block=2000, page=100 SPIFFSImpl: mount rc=0 wifi evt: 7 wifi evt: 7 wifi evt: 7 wifi evt: 7 add 1 aid 1 station: 80:00:0b:bf:42:80 join, AID = 1 wifi evt: 5 :urn 40 :urd 11, 40, 13 :urd 4, 40, 25 :urd 5, 40, 30 :urch 40, 40 :urd 11, 40, 13 :urd 4, 40, 25 :urd 5, 40, 30 :urch 40, 40 :urd 11, 40, 13 :urd 4, 40, 25 :urd 5, 40, 30 :urch 40, 40 :urd 11, 40, 13 :urd 4, 40, 25 :urd 5, 40, 30 :urch 40, 78 :urd 15, 78, 13 :urch 78, 49 :urch 49, 78 :urd 15, 78, 13 :urch 78, 49 :urch 49, 78 :urd 15, 78, 13 :urch 78, 49 :urch 49, 145 :urch 145, 145 :urch 145, 40 :urd 11, 40, 13 :urd 4, 40, 25 :urd 5, 40, 30 :urch 40, 40 :urd 11, 40, 13 :urd 4, 40, 25 :urd 5, 40, 30 :urch 40, 145 :urch 145, 40 :urd 11, 40, 13 :urd 4, 40, 25 :urd 5, 40, 30 :urch 40, 40 :urd 11, 40, 13 :urd 4, 40, 25 :urd 5, 40, 30 :urch 40, 40 :urd 11, 40, 13 :urd 4, 40, 25 :urd 5, 40, 30 :urch 40, 40 :urd 11, 40, 13 :urd 4, 40, 25 :urd 5, 40, 30 :urch 40, 145 :urch 145, 40 :urd 11, 40, 13 :urd 4, 40, 25 :urd 5, 40, 30 :urch 40, 40 :urd 11, 40, 13 :urd 4, 40, 25 :urd 5, 40, 30 SPIFFS_close: fd=1 SPIFFS_close: fd=1 SPIFFS_close: fd=2 SPIFFS_close: fd=2 SPIFFS_close: fd=1 SPIFFS_close: fd=1 SPIFFSImpl::open: invalid path=/fonts/fontawesome-webfont.woff2 SPIFFSImpl::open: invalid path=/fonts/fontawesome-webfont.woff2.gz SPIFFSImpl::open: invalid path=/fonts/fontawesome-webfont.woff2/index.htm SPIFFSImpl::open: invalid path=/fonts/fontawesome-webfont.woff2/index.htm.gz NOT_FOUND: GET http://192.168.4.1/fonts/fontawesome-webfont.woff2 _HEADER[Connection]: keep-alive _HEADER[Origin]: http://192.168.4.1 _HEADER[User-Agent]: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 HEADER[Accept]: /_ _HEADER[Referer]: http://192.168.4.1/font-awesome.min.css _HEADER[Accept-Encoding]: gzip, deflate, sdch _HEADER[Accept-Language]: nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4 _GET[v]: 4.6.1 SPIFFS_close: fd=4 SPIFFS_close: fd=4 SPIFFS_close: fd=3 SPIFFS_close: fd=3 :urch 40, 145 SPIFFS_close: fd=1 SPIFFS_close: fd=1 wifi evt: 7 SPIFFS_close: fd=2 SPIFFS_close: fd=2 SPIFFS_close: fd=1 SPIFFS_close: fd=1 SPIFFSImpl::open: fd=-1 path=/debug openMode=0 accessMode=1 err=-10002 SPIFFSImpl::open: fd=-1 path=/debug.gz openMode=0 accessMode=1 err=-10002 SPIFFSImpl::open: fd=-1 path=/debug/index.htm openMode=0 accessMode=1 err=-10002 SPIFFSImpl::open: fd=-1 path=/debug/index.htm.gz openMode=0 accessMode=1 err=-10002 NOT_FOUND: GET http://192.168.4.1/debug _HEADER[Connection]: Upgrade _HEADER[Pragma]: no-cache _HEADER[Cache-Control]: no-cache _HEADER[Upgrade]: websocket _HEADER[Origin]: http://192.168.4.1 _HEADER[Sec-WebSocket-Version]: 13 _HEADER[User-Agent]: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36 _HEADER[Accept-Encoding]: gzip, deflate, sdch _HEADER[Accept-Language]: nl-NL,nl;q=0.8,en-US;q=0.6,en;q=0.4 _HEADER[Sec-WebSocket-Key]: FiQPDjTfWtbZMdjlmoUlcA== _HEADER[Sec-WebSocket-Extensions]: permessage-deflate; client_max_window_bits ws[/ws][1] connect ws[/ws][1] pong[0]: chg_A4:0 wifi evt: 7 wifi evt: 7 wifi evt: 7 wifi evt: 7 wifi evt: 7 wifi evt: 7 :urch 145, 145

me-no-dev commented 8 years ago

that TIMEOUT means that you did not get an ACK for the last packet that was sent.

marwijn commented 8 years ago

Yes that is as far as I understand. As the phone seems to work with any other website, I wonder what is different with the ESP ASyncTcp/AsyncWebserver. The files do get served with the non async FsBrowser example. I'm not very familiar with TCP protocol, is an ack necessary for each package ? If no ack is giving shouldn't the esp retry sending the package(s) ?

me-no-dev commented 8 years ago

an ACK means that the other end got the packet. I will look through the code and see if I can do anything about it, but in general the server is waiting for that ack to know how much data was sent and what is the progress of the connection.

marwijn commented 8 years ago

I suspect that the phone is unable to keep up with the data sent from the server. Then drops the package. Therefore does not send an ack. And waits for retransmission from the ESP. (by TCP spec)

me-no-dev commented 8 years ago

I guess the ESP also gives up, but that does not get propagated up to my library. By the way I just checked the code and the server sends as much as it can, as long as lwip reports that there is space for data. I guess lwip is holding to the data also and does not let me send anything

me-no-dev commented 8 years ago

interesting why you say it works with the other server... we use the same stack.

marwijn commented 8 years ago

I did add a bit of extra logging in Webresponses.cpp around line 250 and further:

size_t AsyncAbstractResponse::_ack(AsyncWebServerRequest *request, size_t len, uint32_t time){ os_printf("AsyncAbstractResponse::_ack %d %d\n", len, time); if(!_sourceValid()){

and

if(outLen) os_printf("AsyncAbstractResponse::_write2 : "); _writtenLength += request->client()->write((const char*)buf, outLen); os_printf("%d\n", _writtenLength);

When using my PC everything works correct and I get the following logging :

AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_write2 : 2920 AsyncAbstractResponse::_ack 2920 11 AsyncAbstractResponse::_write2 : 5840 AsyncAbstractResponse::_ack 2920 13 AsyncAbstractResponse::_write2 : 8760 AsyncAbstractResponse::_ack 2920 10 AsyncAbstractResponse::_write2 : 11680 AsyncAbstractResponse::_ack 2920 8 AsyncAbstractResponse::_write2 : 14600
AsyncAbstractResponse::_ack 2920 8
AsyncAbstractResponse::_write2 : 17323 AsyncAbstractResponse::_ack 1460 55 AsyncAbstractResponse::_ack 1263 60

However when using my phone the following logging shows:

AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_write2 : 2920 AsyncAbstractResponse::_ack 1360 9 AsyncAbstractResponse::_write2 : 4280 AsyncAbstractResponse::_ack 1360 1 AsyncAbstractResponse::_write2 : 5640

Instead of an ack of the 2920 sent bytes an ack is sent for only 1360 bytes. Can it be that the package size on my phone is smaller and at most 1360 bytes can be sent per package ?

After this I hardcoded

space = 1360;

And now the file loads.

me-no-dev commented 8 years ago

it's totally possible that your phone's stack has a 1360 bytes limit per packet, but that still should not matter and all data should be transmitted, given that the phone's stack is OK

marwijn commented 8 years ago

In function space in ESPAsyncTcp.cpp. Shouldn't tcp_sndbuf be tcp_mss ? (Max segment size)

me-no-dev commented 8 years ago

it's 1 or 2 times MSS. Whatever the stack can take in memory for that connection. If it returns 0, then the stack is full and waiting for ack or retries to finish so it can take more data. ESP8266 has 1460 MSS

crashsoft commented 8 years ago

I have the same problem as marwijn. Thanks to this post, I finally fixed it by hardcoding this 'space' in Webresponses.cpp. But i had to change space to slightly different number. space = 1400;

I have found, that pages loads normally in my local network, but not from internet :( And the heap size decreases dramatically if i don't have space set to 1400 and try to load larger (>3kB) files outside my home network...

Is this an issue with specific routers? My router is "Thomson TG789vn"

Hope there will be official fix.

me-no-dev commented 8 years ago

can you set that number to 1452 and try again? I might have an idea :)

crashsoft commented 8 years ago

With space set to 1452 i can't load webpages. Inside my home networks works fine.

AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_write2 : 1452 AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_write2 : 1452 AsyncAbstractResponse::_ack 1400 29 AsyncAbstractResponse::_write2 : 2816 TIMEOUT: 5072, state: Established SPIFFS_close: fd=1 SPIFFS_close: fd=1

me-no-dev commented 8 years ago

@crashsoft in your case 1400 is the magic number. But why? What makes the MTU different? (You can see you get ack with size 1400)

crashsoft commented 8 years ago

So we can't fix the problem :/ It occurs only withsome routers.

Here is another example. Now with size_t space = request->client()->space();

This is repsonse from client within LAN. ack is 2920

AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_write2 : 2920 AsyncAbstractResponse::_ack 2920 28 AsyncAbstractResponse::_write2 : 4765 AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_write2 : 876 AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_write2 : 2165 AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_write2 : 1098 AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_ack 0 0

This is repsonse from client outside LAN. ack is 1400

AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_write2 : 2920 AsyncAbstractResponse::_ack 1400 32 AsyncAbstractResponse::_write2 : 2800 AsyncAbstractResponse::_ack 1400 2 AsyncAbstractResponse::_write2 : 3245 AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_write2 : 876 AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_write2 : 2165 AsyncAbstractResponse::_ack 1400 33 AsyncAbstractResponse::_ack 0 0 AsyncAbstractResponse::_write2 : 1098 AsyncAbstractResponse::_ack 1098 47 TIMEOUT: 5103, state: Established

me-no-dev commented 8 years ago

it will be great if we know what is the connection minimum MTU, but we do not. I read some about it and the conclusion is that there are many cases where somewhere across the path of the connection there can be a device that has lower MTU than what you have on your interface. Safe size was 576 bytes or a number close to that. Surely imposing such constrain on the network stack will have really bad effect on the throughput :) In your case the magic number is 1400, but in other cases it's 1360 or something else :) Sometimes the limit is imposed by pppoe or other protocol that usually have a known size, but 1400 or 1360 is nothing even close to the known protocols... If you guys have an idea how the limit can be sampled safely? maybe watch the ack size, but that does not guarantee successful transmission because it requires sending larger buffer. Can you test that theory?

pcarver commented 8 years ago

The "standard" approach is PMTU https://en.wikipedia.org/wiki/Path_MTU_Discovery but that can fail if there are overly enthusiastic security devices in the path and I don't know if doing PMTU on the ESP8266 is asking too much.

me-no-dev commented 8 years ago

I'm not positive that the esp will get the ICMP responses at all or if it even integrated into lwip. Will see what I can find :)

me-no-dev commented 7 years ago

ok i looked through lwip and such messages are not handled at all. I added this to a local branch here which will echo if fragmentation icmp is received. Can you add it on your side and use "Open Source (gcc)" option for "Core Development Module" to build lwip on the fly and see if you get it?

me-no-dev commented 7 years ago

also this to fix the compiler

me-no-dev commented 7 years ago

on another go... lwip does not set the DF flag so I guess fragmentation requests will not be received

marwijn commented 7 years ago

I don't know if this helps, but I've changed in webresponses in the following:

size_t AsyncAbstractResponse::_ack(AsyncWebServerRequest request, size_t len, uint32_t time){ if(!_sourceValid()){ _state = RESPONSE_FAILED; request->client()->close(); return 0; } _ackedLength += len; size_t space = request->client()->space(); space = (space / 1360) \ 1360;

I think at my system 1360 is equal to tcp_mss. Can't this value be used ?

me-no-dev commented 7 years ago

we can not impose such limitation in the library. There must be another way to do things ;) someone might come with mss smaller than that. The "fix" should be universal and should not limit the packet size when the connection can support higher MSS. I can add a setter in the lib to allow limiting the outgoing size but how will that help in reality? Any ESP can end up in that situation at any time. Maybe the problem is actually on the router side? because it should be it's job to fragment and ensure transmission (since we do not set the DoNotFragment flag). I'll do some more thinking :) I hope to figure out a way to test it though... but no such problems in my network

marwijn commented 7 years ago

Have you tried to reproduce by configuring the ESP as access point and connect directly with an android phone (if you have one available).

crashsoft commented 7 years ago

marwijn, yes, i havee tried ESP as access point -- works fine. But i need my ESP to be connected to internet.

tbnobody commented 7 years ago

I think this might be the Same problem? https://github.com/me-no-dev/ESPAsyncWebServer/issues/63

tbnobody commented 7 years ago

Hm what I don't understand is the fact, that it's working pretty nice with the webserver library which is included in the ESP arduino package.

The MTU in my network is 1220 because of an cisco wireless lan controller which tunnels each package from the AP to the controller.

If I enable the debug mode and perform an request using the included webserver library I get the following output:

Request: /css/bootstrap.min.css
 Arguments: 
StaticRequestHandler::handle: request=/css/bootstrap.min.css _uri=/
StaticRequestHandler::handle: path=/css/bootstrap.min.css, isFile=0
:wr
:sent 148
:ww
:wr
:sent 1220
:sent 240
:ww
:wr
:sent 1220
:sent 240
:ww
:wr
:sent 1220
:sent 240
:ww
:wr
:sent 1220
:sent 240

If I perform the same request using the ESPAsyncWebserver library I get the following:

AsyncAbstractResponse::_ack 0 0
AsyncAbstractResponse::_write2 : 2920
AsyncAbstractResponse::_ack 2440 4
AsyncAbstractResponse::_write2 : 5360

I am not very familiar with the deep functionality of TCP/IP but how does the first code immediatly detect the right package size?

me-no-dev commented 7 years ago

To be honest I have no clue! Both TCP libs driving the WebServers are based pretty much on the same exact code. There isn't much to it, It asks LwIP how much it can send and sends that much, counts what has been acked back from the other end and sends more. Nowhere does it limit the size to anything, just whatever the stack on the back allows to do. I have been seeing this issue times and times and have tried what I can remotely, but I have not been able to reproduce it here so I can debug it better :(

tbnobody commented 7 years ago

Hm if I understand the debug output correctly, in the first code I get two ACK packages

:sent 1220
:sent 240

I think that's because the package is been fragmented and the client receives two packages. In the second output I only see one ACK response (with the same size as the first ACK response in the first code).

Had a look at the ClientContext.h file which is included in the ESP for Arduino libraries. This code is executed in the write method:

_size_sent = will_send;
DEBUGV(":wr\r\n");
tcp_output( _pcb );
_send_waiting = true;
delay(5000); // max send timeout
_send_waiting = false;
DEBUGV(":ww\r\n");
return will_send - _size_sent;

And I think there is a callback function which will be executed on every received ACK:

        err_t _sent(tcp_pcb* pcb, uint16_t len) {
            DEBUGV(":sent %d\r\n", len);
            _size_sent -= len;
            if(_size_sent == 0 && _send_waiting) esp_schedule();
            return ERR_OK;
        }

If I interpret the code correct, the data will be written to an output buffer (tcp_output) and than an delay of 5 seconds will be performed to wait for all outstanding ACKs. During this 5 seconds the _sent method will be executed for every received ACK and the _size_sent will be counted down to zero. If it reaches zero, the delay will be canceld (esp_schedule).

In your function:

int8_t AsyncClient::_sent(tcp_pcb* pcb, uint16_t len) {
  _rx_last_packet = millis();
  ASYNC_TCP_DEBUG("_sent: %u\n", len);
  _pcb_busy = false;
  if(_sent_cb)
    _sent_cb(_sent_cb_arg, this, len, (millis() - _pcb_sent_at));
  return ERR_OK;
}

You call the callback on every received ACK package. I am not sure if I see a location where you wait for the second ACK. If I understand the code corretly it should be somewhere in the method size_t AsyncAbstractResponse::_ack. But I am not sure if it's necessary to wait for all ACKs before sending out the next bunch of data.

tbnobody commented 7 years ago

It's working now. But I don't know if this change has any negative side effects (also variable names are just prefixed that I find them easier during debugging):

diff --git a/src/ESPAsyncTCP.cpp b/src/ESPAsyncTCP.cpp
index da55816..31752ef 100644
--- a/src/ESPAsyncTCP.cpp
+++ b/src/ESPAsyncTCP.cpp
@@ -219,10 +219,10 @@ size_t AsyncClient::write(const char* data) {
 }

 size_t AsyncClient::write(const char* data, size_t size, uint8_t apiflags) {
-  size_t will_send = add(data, size, apiflags);
-  if(!will_send || !send())
+  tba_will_send = add(data, size, apiflags);
+  if(!tba_will_send || !send())
     return 0;
-  return will_send;
+  return tba_will_send;
 }

 size_t AsyncClient::add(const char* data, size_t size, uint8_t apiflags) {
@@ -355,7 +355,8 @@ int8_t AsyncClient::_sent(tcp_pcb* pcb, uint16_t len) {
   _rx_last_packet = millis();
   ASYNC_TCP_DEBUG("_sent: %u\n", len);
   _pcb_busy = false;
-  if(_sent_cb)
+  tba_will_send -= len;
+  if(_sent_cb && tba_will_send == 0)
     _sent_cb(_sent_cb_arg, this, len, (millis() - _pcb_sent_at));
   return ERR_OK;
 }
diff --git a/src/ESPAsyncTCP.h b/src/ESPAsyncTCP.h
index 9efdc49..75bcd84 100644
--- a/src/ESPAsyncTCP.h
+++ b/src/ESPAsyncTCP.h
@@ -49,6 +49,10 @@ typedef struct SSL_CTX_ SSL_CTX;
 #endif

 class AsyncClient {
+
+  private:
+    size_t tba_will_send = 0;
+
   protected:
     friend class AsyncTCPbuffer;
     tcp_pcb* _pcb;
tbnobody commented 7 years ago

Hm maybe it's better to place _pcb_busy = false; inside the if clause:

diff --git a/src/ESPAsyncTCP.cpp b/src/ESPAsyncTCP.cpp
index da55816..0a2a7d3 100644
--- a/src/ESPAsyncTCP.cpp
+++ b/src/ESPAsyncTCP.cpp
@@ -219,10 +219,10 @@ size_t AsyncClient::write(const char* data) {
 }

 size_t AsyncClient::write(const char* data, size_t size, uint8_t apiflags) {
-  size_t will_send = add(data, size, apiflags);
-  if(!will_send || !send())
+  tba_will_send = add(data, size, apiflags);
+  if(!tba_will_send || !send())
     return 0;
-  return will_send;
+  return tba_will_send;
 }

 size_t AsyncClient::add(const char* data, size_t size, uint8_t apiflags) {
@@ -354,9 +354,12 @@ void AsyncClient::_ssl_error(int8_t err){
 int8_t AsyncClient::_sent(tcp_pcb* pcb, uint16_t len) {
   _rx_last_packet = millis();
   ASYNC_TCP_DEBUG("_sent: %u\n", len);
-  _pcb_busy = false;
-  if(_sent_cb)
-    _sent_cb(_sent_cb_arg, this, len, (millis() - _pcb_sent_at));
+  tba_will_send -= len;
+  if (tba_will_send == 0) {
+   _pcb_busy = false;
+   if (_sent_cb)
+     _sent_cb(_sent_cb_arg, this, len, (millis() - _pcb_sent_at));
+  }
   return ERR_OK;
 }

diff --git a/src/ESPAsyncTCP.h b/src/ESPAsyncTCP.h
index 9efdc49..75bcd84 100644
--- a/src/ESPAsyncTCP.h
+++ b/src/ESPAsyncTCP.h
@@ -49,6 +49,10 @@ typedef struct SSL_CTX_ SSL_CTX;
 #endif

 class AsyncClient {
+
+  private:
+    size_t tba_will_send = 0;
+
   protected:
     friend class AsyncTCPbuffer;
     tcp_pcb* _pcb;
me-no-dev commented 7 years ago

so you propose to mark the client as busy till all acks come and to track the sent/acked status in the client? What if I call write twice? tba_will_send will be overwritten, then on ack will be deducted the previous send and so on. Still I do not want to impose such limitation on a client level. I have been removing the pcb_busy dependent code through time. Also this in no way will calculate the actual maximum packet size or mimic what is happening in the other log from the regular Server. The real resolution for this problem should be universal for any given max-packet and should not limit functionality and throughput. For example, LwIP should not tell me that I can send 1460 bytes when the other end can not receive them, or should handle sending those bytes in proper packets on it's own. I have read some on the matter and issue seems to be in LwIP itself not being able to properly receive and manage fragmentation errors and also to take into account whet the other end advertises as MSS when they are connecting

tbnobody commented 7 years ago

Hm you are right. It works nice for one request but not for simultaneous requests. I am not that deep in the whole matter. But is it possible to call write twice without receiving the ack package for the same connection? As far as I saw, the write function was called in the webresponse ack method. But that's just a guess, I don't know any possible side effects. You are right LwIP should tell the MTU for the connection (if this is possible). In any other case there will always be a package fragmentation.

me-no-dev commented 7 years ago

packet fragmentation is not the issue as neither side can really know what the MSS is through the connection in between. The limiting router can be somewhere externally for both and both to have full 1460 available locally. Issue is that lwip does not even have the code for handling "packet needed fragmentation" ICMP messages. The only code for ICMP is for ping and I am still not that familiar with all the goes inside. I do know that newer versions of LwIP have that taken care of and I know that we will update ESP8266 to those new versions, but timeframe is sadly not something that I can give. I do notice some commits made to the ESP8266 tree lately though, so it might not be long :)

marwijn commented 7 years ago

I agree with me-no-dev that this is an issue in the LwIP stack. As I wanted to use this with the AsyncWebbrowser I've made the following change in WebResponses.cpp :

size_t AsyncAbstractResponse::_ack(AsyncWebServerRequest *request, size_t len, uint32_t time){
  if(!_sourceValid()){
    _state = RESPONSE_FAILED;
    request->client()->close();
    return 0;
  }
  _ackedLength += len;
 if (_ackedLength < _writtenLength) return 0; // Do not add a new packet till the previous one is fully processed.

This seems to work nice. It is still a workaround, but I guess it's better then nothing.

tbnobody commented 7 years ago

I fully agree with you. It's not nice to reduce the performance of the lib, but slower is better than non working :) Does it maybe make sense to add such a workaround as a define/configuration option? I personally have no problem to patch the library. But the problem seems to occour for more people.

But first of all I am happy that it's working now :)

me-no-dev commented 7 years ago

@marwijn I really like you workaround! That is more of how it should be implemented :) in the upper level. I'm going to shamesly steal it!

me-no-dev commented 7 years ago

@tbnobody could you give that a go and see how it does on your side?

marwijn commented 7 years ago

I've been looking again at tbnobody fix once more since a fix in the asynctcp library would fix it for all clients. Would the following also work: In the write change

tba_will_send = add(data, size, apiflags);

to

tba_will_send += add(data, size, apiflags);

And check at the start of the function whether tba_will_send == 0, and if so return 0;

I don't see the need for the busy flag ? This will have some impact on performance, but at least it would work for all clients.

By the way if you're online now we might use a chatroom to discuss.

me-no-dev commented 7 years ago

yes I am :) you'll come to Gitter or?

me-no-dev commented 7 years ago

YEAH!!! Closing this finally! fixed in latest git

crashsoft commented 7 years ago

Wow, great job! You've fixed it. Now i can switch back to AsyncServer :)