marcoskirsch / nodemcu-httpserver

A (very) simple web server written in Lua for the ESP8266 firmware NodeMCU.
GNU General Public License v2.0

bufferedconnection:send() should have fragmentation logic for large payloads, and pre-concats should be avoided #51

Closed: devyte closed this issue 8 years ago

devyte commented 8 years ago

Hi,

I've commented on this in the relevant commits, but I thought I'd better open an issue.

The ESP currently has an issue with sending payloads over a socket that are larger than one MTU, minus some overhead bytes. If this is done, there is a chance that the ESP will become unstable and could reboot. There is a guideline (I forget where I read it) to keep payloads below about 1400 bytes in size.

The buffered connection implementation refactored in this commit currently has 2 thresholds: 800 and 1000. I think these were meant as conservative margins, and they do work as long as you keep buffered sends small. The current implementation accumulates payload in a table, then once a threshold is reached, it concatenates and sends. The use of a table and a late concat is an efficient way to handle this.

The implicit assumption is that the total payload size will still be less than one MTU. However, there is no provision for the case where a single payload is very large, or where the payload plus what is already accumulated in the table is greater than one MTU (i.e. 1400). There needs to be some fragmentation logic implemented to make sure that the socket:send() calls are always done with payloads smaller than 1400. With this, buffered:send() can be called in any way the user wants, as long as memory holds out.

In addition, this commit made changes to prematurely concatenate payloads and then send the large results to buffered:send(). The premature concatenations are done in the "bad" way, using the .. operator, which is known to have performance issues because it trashes the GC, especially when used in a tight loop. These concats are not needed, because the buffered connection already handles concatenation, and in a more efficient way. I suspect that there was confusion between socket:send() and buffered:send() (at one point I made the same mistake).
It is true that the new firmware can only handle one socket:send() in each callback, so if the socket were used as the connection, these concats would be needed. However, the -static, -error, -header, etc. files receive a buffered connection as their argument, so the concats should not be done: they make the previous point worse.
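The fragmentation logic could look roughly like this (a hypothetical sketch, not code from the repo; `fragment` and `MTU_SAFE` are illustrative names, and 1400 is the conservative per-send limit mentioned above):

```lua
-- Hypothetical sketch of fragmentation for a buffered send.
-- MTU_SAFE is the conservative per-send limit (~1400 bytes) discussed above.
local MTU_SAFE = 1400

-- Split an arbitrarily large payload into chunks no larger than MTU_SAFE,
-- so that each underlying socket:send() stays below one MTU.
local function fragment(payload)
   local chunks = {}
   for i = 1, #payload, MTU_SAFE do
      table.insert(chunks, payload:sub(i, i + MTU_SAFE - 1))
   end
   return chunks
end
```

The buffered connection would then send each chunk (and yield) in turn instead of handing the whole payload to socket:send() at once.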

marcoskirsch commented 8 years ago

I refactored the BufferedConnection and moved it to its own file last night: https://github.com/marcoskirsch/nodemcu-httpserver/blob/master/httpserver-connection.lua

That should simplify further modifications. Now, to your points:

> The implicit assumption is that the total payload size will still be less than one MTU. However, there is no provision for the case where a single payload is very large, or where the payload plus what is already accumulated in the table is greater than one MTU (i.e. 1400).

I agree, we could modify the class so that payloads bigger than ~1400 bytes are broken up and sent. This is not a difficult improvement. Let's call this issue 1.

> In addition, this commit made changes to prematurely concatenate payloads, and then send the large results to buffered:send(). The premature concatenations are done in the "bad" way, by using the ".." operator, which is known to have performance issues because it trashes the GC, especially when used in a tight loop. These concats are not needed, because the buffered connection already handles it, and in a more efficient way.

Fair enough. Let's call this issue 2.
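For reference, the two concatenation styles in question can be sketched like this (illustrative code, not from the repo):

```lua
-- "Bad" way: each use of the .. operator allocates a fresh string and
-- turns the previous one into garbage, trashing the GC in a tight loop.
local function joinWithConcatOp(parts)
   local s = ""
   for _, p in ipairs(parts) do
      s = s .. p
   end
   return s
end

-- Efficient way (what the buffered connection already does internally):
-- accumulate the pieces in a table and concatenate exactly once.
local function joinWithTable(parts)
   local buf = {}
   for _, p in ipairs(parts) do
      table.insert(buf, p)
   end
   return table.concat(buf)
end
```

Both produce the same string; the table version just defers all the string allocation to a single table.concat at the end.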

> However, the -static, -error, -header, etc files receive as argument a buffered connection, and so the concats should not be done, because they make the previous point worse.

I am not sure what you mean. Can you clarify?

Ok, let me tell you my thoughts:

When I started working on this project, back in nodemcu-firmware 0.9.4 or something like that, memory was really constrained. I had to make things work with 3-4 KB of free heap. Things have improved but it still feels funny to have this object keep around ~1KB of data.

Perhaps we should swing the pendulum all the way back, and simplify the BufferedConnection class with a much much simpler class that just sends the payload immediately and calls coroutine.yield() so that clients don't have to. We can handle breaking up big payloads as that doesn't use memory. But we no longer keep data around, concatenate, and defer sends until chunks are big enough.

This would reduce memory usage and simplify the code at the expense of performance, since we would now have many more small connection:send() calls and coroutine.yield() / onSent callback / coroutine.resume() round trips.
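Roughly, such a send-and-yield connection could look like this (a hypothetical sketch, not code from the repo; `socket` stands for a NodeMCU net socket whose onSent callback resumes the serving coroutine):

```lua
-- Hypothetical sketch of the proposed unbuffered connection: every send()
-- goes straight to the socket and yields; the socket's onSent callback is
-- expected to resume the coroutine once the data is really out.
local UnbufferedConnection = {}
UnbufferedConnection.__index = UnbufferedConnection

function UnbufferedConnection.new(socket)
   return setmetatable({socket = socket}, UnbufferedConnection)
end

function UnbufferedConnection:send(payload)
   local MTU_SAFE = 1400
   -- Break large payloads up; this costs no extra memory beyond one chunk.
   for i = 1, #payload, MTU_SAFE do
      self.socket:send(payload:sub(i, i + MTU_SAFE - 1))
      coroutine.yield()  -- wait for onSent before the next send
   end
end
```

Clients would call conn:send() freely; the yield after every chunk is what makes each callback contain exactly one socket:send().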

What do you think?

Less desirable would be to provide both: BufferedConnection for people who care about speed more than memory, and a non-buffered connection as I describe for people who care about memory more than speed, selectable via configuration. But given that we have no automated tests, I'd rather reduce code paths and keep things simple.

Philip Gladstone @pjsg originally came up with this code so I'd love to hear his thoughts.

devyte commented 8 years ago

I understand the memory limitations; I have an embedded systems background myself, and I've worked with microcontrollers even smaller than this one.

> However, the -static, -error, -header, etc files receive as argument a buffered connection, and so the concats should not be done, because they make the previous point worse.

> I am not sure what you mean. Can you clarify?

The code sequence is more or less like this example:

```lua
fileServeFunction = dofile("httpserver-error.lc")
startServing(fileServeFunction, connection, ...)
```

Within startServing:

```lua
coroutine.create(function(fileServeFunction, bufferedconn, ...)
   fileServeFunction(bufferedconn, ...)
   ...
end)
```

So the dofile("script") returns a function, which is eventually passed to the coroutine, which in turn calls it passing a buffered connection as argument, and not the socket.

Therefore, httpserver-static/error/header and the example scripts don't need to worry about the single send() limitation of the newer firmware, because when they call conn:send(), conn is not a socket, but a buffered connection.

So concatenating before calling conn:send() is not needed. In fact, it makes things a lot worse, for two reasons:

- GC trashing: every time the .. operator is used, a new string gets created and the old one becomes garbage (issue 2).
- It sends potentially huge payloads to the buffered connection, and if the payload size is greater than one MTU, the ESP could crash in spite of the buffered connection (issue 1).

About your idea: I like the current buffered connection approach, and I like the fact that you're refactoring it into its own file. I think it needs a bit more work, because its use depends on the socket's onSent callback code and on the coroutine handling code, but I was thinking along similar lines myself.

The simplification you describe makes my developer instinct twitch in a bad way. I used a somewhat similar approach in a Lua server before the current "telnet" example was implemented, and performance was horrible: printing a simple list of 10 lines would take several seconds, which is crazy. However, that doesn't mean the idea is bad or wrong. I would suggest implementing exactly what you said is less desirable: have both buffered and yielded connections, and let's compare and test them. Let's see what the memory footprint and the performance look like in each case. Then we can decide which one to keep.

On a side note, I have as a side development a derivative of the httpserver, which decouples a bunch of things (compiling, wifi setup and configuration, tcp server from webserver, etc.). Please take a look at nodemcu-platform. My goal is a platform that runs on top of the firmware, is not limited to just a webserver, and allows a user to focus on just his/her application. I even made a web GUI for the wifi setup, and I intend to add websockets as another module of the tcp server. If you or @pjsg think it makes sense, perhaps we should unify the code.

marcoskirsch commented 8 years ago

> I think it needs a bit more work, because its use depends on the onSent callback code of the socket, and on the coroutine handling code, but I was thinking along similar lines myself.

100% agreed. Also the startServing function needs to know to flush the connection. It's odd.

It seems you like buffering, as the performance is better. Let's improve httpserver-connection.lua code to address some of the concerns.

I'll check out your project later. Thanks for the feedback!

pjsg commented 8 years ago

It turns out that there is a bug in my original code: in the 'l > 768' case, you need to flush the pending data first and then send the payload.

It isn't a general solution to the problem, as it requires the specific coroutine infrastructure. There is another approach -- see this thread: https://github.com/nodemcu/nodemcu-firmware/pull/993 -- in particular a comment by TerryE, where he does something somewhat different that doesn't need the coroutine infrastructure. The cost is that all the data to be sent gets buffered up.
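That queue-everything approach could be sketched roughly like this (hypothetical code, not taken from the linked thread; `SendFifo` and its fields are illustrative names):

```lua
-- Rough sketch of the FIFO alternative: all sends are queued, and the
-- socket's onSent callback drains the queue one entry at a time. No
-- coroutine is needed, but everything queued stays in memory until the
-- SDK has actually sent it.
local SendFifo = {}
SendFifo.__index = SendFifo

function SendFifo.new(socket)
   return setmetatable({socket = socket, queue = {}, sending = false}, SendFifo)
end

function SendFifo:send(payload)
   table.insert(self.queue, payload)
   if not self.sending then self:_drain() end
end

-- Called once to start, then from the socket's onSent callback.
function SendFifo:_drain()
   local nextPayload = table.remove(self.queue, 1)
   if nextPayload then
      self.sending = true
      self.socket:send(nextPayload)
   else
      self.sending = false
   end
end
```

Each onSent callback still contains exactly one socket:send(), but unlike the coroutine approach, nothing limits how much data can pile up in the queue before control returns to the SDK.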

It does appear that the combining of data blocks has significant performance advantages, so it is important to keep that aspect.

devyte commented 8 years ago

@pjsg The approach proposed there is a straight-up FIFO implemented in Lua. Like you said, all data to send gets buffered up until control is returned to the SDK, which will at some point trip users up in a similar way as now. Said differently, it only mitigates the issue but doesn't fix it. In contrast, the buffered connection with the coroutine doesn't suffer from that problem, because data is sent (really sent) on threshold, so issuing many send()s is not a problem.

The only issue I find with the buffered connection is that it's a pain to integrate into any project. @marcoskirsch has taken a step in the right direction by factoring the buffered connection out into its own file, but it still needs more work.

Let's keep things separate: keep this issue open to roll back the concats in -static/-error/-header and to implement the fragmentation logic. I'll open a new issue to improve the usability of the buffered connection, and maybe improve the code factoring.

pjsg commented 8 years ago

For the purposes of nodemcu-httpserver, I think that the bufferedconnection approach is the best way as it allows simple code to work correctly and not blow up. It isn't a general solution as it needs the specific coroutine interactions.

devyte commented 8 years ago

@pjsg I completely agree that it is the best approach, and that is exactly my point: it needs more work. If it can be generalized and refactored, it would be the best available solution for the foreseeable future. I have opened #52 to cover that.

devyte commented 8 years ago

@marcoskirsch I have an implementation of the fragmentation logic. It seems to be working in my local testing, but I haven't been exhaustive with it. I'll open a PR as soon as I refork the repo and put it in there.

However, there's a problem. The httpserver can handle one request, but the ESP crashes with an out-of-memory panic when a 2nd request comes in before the 1st is finished. This seems to be because the ESP accepts multiple connections on the server (up to 5, it seems), and browsers create additional connections for html elements; in this case, the index.html and underconstruction.gif files are being requested together. The additional connection doesn't always come in before the 1st connection is done; it's a race condition. Therefore, the httpserver is currently not very stable, and to make it stable the memory footprint would have to be reduced to something like a third of what it is today...

marcoskirsch commented 8 years ago

I haven't seen the problem you mention in the current code base. Do you know why? I can try to make it happen: I used to have an example page that served a bunch of images. I can try that again.

marcoskirsch commented 8 years ago

Opened #53 which promptly crashes.

marcoskirsch commented 8 years ago

Closing. I consider this fixed by commits 3dd98667dd502ac80d0d5c221a2ea6e36bab6b34 and 1b4a2c9d9ab308f89c75b9b8304e3511f8cb1680.

Thanks!