synchronoss / nio-multipart

A multipart library that uses nio and asynchronous processing
Apache License 2.0

Thoughts about an alternative fully non-blocking API. #8

Open defnull opened 7 years ago

defnull commented 7 years ago

Hi there,

Disclaimer: This is NOT a proposal to change the nio-multipart API. I just want to share some insights and perhaps start a discussion or inspire some new ideas.

So, in a project of mine I had to do a lot of work per form field. Some fields were large uploads, some triggered expensive computations on those uploads, sometimes involving async IO on the server side (e.g. fetching stuff from external URLs). One of the requirements was that form uploads should be aborted as soon as an error is detected, to prevent wasted bandwidth and work on the client side. Order also mattered: multiple form fields in a single multipart POST request had to be processed in sequence. And because requests were long-running and large, I was expecting a LOT of them to be in flight concurrently.

nio-multipart is great in combination with Servlet 3.1 async features (on top of which I built a small helper layer to make the API less error-prone), but I found it unfortunate that nio-multipart is callback-based and these callbacks are called (including StreamStorage.write()) in a blocking way. In other words: each call to NioMultipartParser.write() may trigger multiple calls to various callbacks, and it blocks if any of them does any blocking work.

This was a problem. If I want to use non-blocking APIs from within these callbacks (e.g. Netty or async-http-client or anything that returns a CompletableFuture), there is no built-in way to pause the parser and prevent it from emitting more events (and thus causing out-of-order bugs or race conditions). I can stop the byte input to the parser, but there might still be multiple form fields within a single chunk of bytes, and the parser wants to emit all of these in bulk. So I either have to block in a callback, or do no real work in the callbacks and push jobs onto a queue instead. The queue approach worked, but was kind of awkward given the large number of methods and wrapper classes needed for it; the API surface of nio-multipart is non-trivial.
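For illustration, here is a minimal sketch of that queue-based workaround. The listener and event types are hypothetical and deliberately simplified (this is not the actual nio-multipart listener interface); they just show the shape of the idea: the callbacks do nothing except enqueue, and a separate, fully asynchronous consumer drains the queue at its own pace.

    import java.util.List;
    import java.util.Map;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Hypothetical event type: everything a callback would normally act on
    // is captured here, so the callback itself never has to block.
    record PartEvent(Map<String, List<String>> headers, byte[] body) {}

    class QueueingListener /* would implement the parser's listener interface */ {
        private final Queue<PartEvent> events = new ConcurrentLinkedQueue<>();

        // Called by the parser, possibly several times per write(): only enqueue.
        public void onPartFinished(Map<String, List<String>> headers, byte[] body) {
            events.add(new PartEvent(headers, body));
        }

        // Drained later by an async consumer, outside the parser's call stack.
        public Queue<PartEvent> pendingEvents() {
            return events;
        }
    }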

Thinking about this, I built a small wrapper around nio-multipart that lets me use it in a different way, more like an event stream.

In summary: MyParser.write(bytes...) returns an array of zero or more Part instances. These hold the headers and binary content of one multipart part, all in memory. The last Part per call may be incomplete if its internal buffer grows larger than a configurable threshold. Incomplete parts are returned again on each subsequent call to MyParser.write(bytes...) until they are complete. The event consumer should drain as much as possible from incomplete parts (e.g. into temporary files or directly into the target location) to free up memory.
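To make that shape concrete, here is roughly what such a surface could look like. All names and signatures here are illustrative, not the actual code of my wrapper:

    import java.nio.ByteBuffer;
    import java.util.List;
    import java.util.Map;

    // Illustrative sketch of the wrapper surface described above.
    interface Part {
        Map<String, List<String>> headers();

        // Read and remove up to maxBytes of buffered body content, so the
        // consumer can drain incomplete parts (e.g. into a temporary file).
        ByteBuffer drain(int maxBytes);

        // False until the parser has seen this part's closing boundary; such
        // a part is returned again by subsequent write() calls.
        boolean isComplete();
    }

    interface EventStreamParser {
        // Never blocks; returns zero or more parts. The last one may be
        // incomplete if its buffer exceeded the configured threshold.
        List<Part> write(ByteBuffer bytes);
    }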

This pattern has some benefits: calls to MyParser.write() never block, and the returned parts can be processed in order, out of order, in a thread pool, in an event loop, whatever you like.

In pseudo-code (no error handling), my consumer looks like this:

    void onRead(AsyncContext ctx, ByteBuffer bytes, Throwable err) {
        List<Part> parts = this.parser.write(bytes);        // never blocks
        CompletableFuture<Void> work = processPartsAsynchronously(parts);
        work.thenRun(() -> {
            ctx.readAsync(this::onRead); // schedule the next async read from the client socket
        });
    }

There is not a single blocking call in any of this, and back-pressure is no longer an issue: the next read is only scheduled once the work for the previous chunk has completed.
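For completeness, this is one way processPartsAsynchronously could be written so that the parts of a chunk are handled strictly in order but still without blocking. It is in the same pseudo-code register as the consumer above; handleOnePart is a hypothetical per-part handler, and Part is the wrapper type sketched earlier.

    // Chains the async work for each part: part N+1 only starts after part N
    // has completed, and the returned future completes when all parts are done.
    CompletableFuture<Void> processPartsAsynchronously(List<Part> parts) {
        CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
        for (Part part : parts) {
            chain = chain.thenCompose(ignored -> handleOnePart(part));
        }
        return chain;
    }

    // Hypothetical per-part handler: e.g. drain the part into storage or start
    // some async IO, returning a future that completes when the part is done.
    CompletableFuture<Void> handleOnePart(Part part) {
        return CompletableFuture.completedFuture(null); // placeholder
    }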

It is very easy to simulate the original nio-multipart API with this event-based approach, so I'd consider it to be 'lower-level'. But in fact, for real non-blocking processing, it might be better suited after all. Netty uses a very similar approach.
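As a rough illustration of that claim: an adapter only has to walk the returned parts and fire callbacks for the completed ones. The listener interface below is simplified and hypothetical (it is not nio-multipart's actual listener), and Part/EventStreamParser are the illustrative types from the sketch further up.

    import java.nio.ByteBuffer;

    // Hypothetical, simplified callback interface and adapter, just to show
    // that a callback-style API can be layered on top of the event-stream form.
    interface SimpleListener {
        void onPartFinished(Part part);
    }

    class CallbackAdapter {
        private final EventStreamParser parser;
        private final SimpleListener listener;

        CallbackAdapter(EventStreamParser parser, SimpleListener listener) {
            this.parser = parser;
            this.listener = listener;
        }

        void write(ByteBuffer bytes) {
            for (Part part : parser.write(bytes)) {
                if (part.isComplete()) {
                    listener.onPartFinished(part); // fire the callback per finished part
                }
            }
        }
    }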

So, is there any chance nio-multipart could move to something like this internally, and some day expose it as a usable alternative to the callback-based API? Is nio-multipart built in a way that would allow this?

tl;dr: read the paragraph starting with "In summary:".