Question about how to read part contents in a non-blocking way

sdeleuze commented 8 years ago

Hi,

I am working on the Reactive support of the upcoming Spring Framework 5 release, and have created a prototype of Reactive Multipart support based on your NIO Multipart library, see this pull request for more details.

I have some questions related to how to perform end to end non-blocking processing of the multipart requests. Thanks to your documentation, I have been able to implement non-blocking parsing of the request (see NioMultipartResolver) but the content of each part seems to be only available as String for non file parts and via an InputStream for file parts, so I am not sure how to read this content in a non-blocking way. Is it possible with the current API?

Melozzola commented 8 years ago

Hi Sébastien, The library supports end to end non-blocking processing and the key extension point is the PartBodyStreamStorageFactory. The PartBodyStreamStorageFactory is responsible to provide the parser a StreamStorage which is where the parser writes the bytes of a part body as they come in. By default the parser uses the DefaultPartBodyStreamStorageFactory which is storing the data in a temp file, but you can provide your own implementation. To give you an example a PartBodyStreamStorageFactory could stream the data directly into Amazon S3 (or any object storage) or stream directly into a database. I had a look at your NioPart and I think it could extend the StreamStorage and you could have a custom PartBodyStreamStorageFactory that serves NioParts. At that point the NIO Multipart Parser will write directly into the NioPart in a non blocking mode and you don't need to read back the data from the InputStream. There is an example of how to extend the NIO Multipart Parser with a custom PartBodyStreamStorageFactory in the documentation (See section "A powerful extension point").

The NIO Multipart Parser is not using the PartBodyStreamStorageFactory and the StreamStorage for form parameters. Form parameters are small pieces of data stored as a parts and the parser is returning the parameter value directly as a String via the onFormFieldPartFinished callback. Having a non-blocking end to end processing doesn't make too much sense for form parameters.

I hope my explanation is clear and do not hesitate to contact us again if you need more help.

sdeleuze commented 8 years ago

Hi @Melozzola, and thanks for your detailed answer.

So let's say I implement a custom PartBodyStreamStorageFactory, I think my main concern is related to the fact that StreamStorage is based on plain InputStream and OutputStream, which allow indeed some streaming use cases but are by design blocking.

As shown is your examples, Servlet 3.1 provides additional methods like setReadListener(), isReady() or isFinished() methods to make ServletInputStream usable in a non-blocking way, but on PartBodyStreamStorageFactory side, StreamStorage is just using raw InputStream and OutputStream, so I don't understand how to provide a non-blocking behaviour using it.

Any thoughts?

sdeleuze commented 8 years ago

Could we maybe imagine a ReactiveStreamStorage interface based on Reactive Streams and provinding a Reactor based implementation, or make StreamStorage using some NioInputStream and NioOutputStream that would provide additional NIO oriented methods like ServletInputStream and ServletOutputStream do? Should I create an issue for that in the nio-stream-storage issue tracker?

sdeleuze commented 8 years ago

From our experience Servlet 3.1 NIO approach is very difficult to use and implement, so I would really favor a Reactive Streams based alternative, if you are open to discuss it. Maybe we could collaborate on that.

Melozzola commented 8 years ago

Hi Sébastien, First of all, apologises for the delay. I've been thinking at your request, I just didn't have time to answer. Starting from your second comment (on 18th of October) I got your point, being the NIOMultipartParser and the StreamStorage OutputStreams, the write operation is not NIO. The worker thread that the app server assigned for the processing of the data will block until the parser wrote that data into the StreamStorage. This is probably OK for most of the cases, where the bottleneck is the upload connection. Moreover, the NIO Multipart parser has internal (configurable) buffers and the flush to the StreamStorage is not happening every time, but just when the buffers are full.

Anyway, making it write NIO would require some thought and some refactoring. In particular there should be a mechanism to apply backpressure if the parser cannot parse and write into the StreamStorage fast enough. For now that mechanism is implicitly achieved by blocking the worker thread (waiting for the data to be written to the StreamStorage). Even if it might not be ideal, this simplified the design for the first version.

I think there could be various solutions for making the write phase NIO or asynchronous and something along the line of Reactive Streams is one possibility. We don't necessarily have make the StreamStorage a ReactiveStream, maybe we can just write an adapter that executes the writes into the NIOMultipartParser asynchronously. The adapter will then notify a listener when the write succeeded or failed. Do you know if the ReactiveStreams would have advantages over an Adapter?

I'm open to suggestion and in the next days I will try to learn a bit more about the ReactiveStreams

sdeleuze commented 8 years ago

Reactive Stream is very well suited for that kind of IO use cases, it is a small set of 4 interfaces designed to manage backpressure and it is a kind of corner stone for non-blocking interoperability since such type adapt (not need for conversion, this is really just about adapting from one Reactive type to another with very little overhead as elements are streamed) from/to RxJava, Reactor, Akka, Java 9 Flow.Publisher ...

That said, I think there is 3 important points to keep in mind.

The first point, and the more important, is that you can create a blocking implementation based on a non-blocking one but not the other way around, so from my POV the main challenge is using NIO (eventually Reactive) types internally, and optionally adapt to InputStream / OutputStream for users that are fine with that, not just adapting current implementation to another interface. That means that most of the internals of your lib may have to evolve to be really end to end non blocking (using Publisher instead of InputStream / OutputStream).

The second one is that while RS interfaces are very simple, every Publisher implementation has to comply with the RS TCK, and that's VERY difficult if you implement it yourself. That's why almost all Reactive librairies has chosen a RS implementation (that can easily be adapted to another, Reactor 3 is well suited for that use case ^^). Since Publisher carries the semantic but is not really useful for an end-user, we usually return Reactor Flux (that implements Publisher) and use raw Publisher as input parameter.

The third one is that you have to choose a byte buffer type. Maybe Publisher<ByteBuffer> in your case ?

As an example, you can have a look to our blocking HttpMessageConverter based on InputStream / OutputStream that have been transformed in Spring Web Reactive to Reactive Streams based APIs using Publisher as an input parameter or a return value, see HttpMessageWriter and HttpMessageReader.

You can also have a look to RequestBodyPublisher to have an example about how we translate Servlet 3.1 semantics to Reactive Streams Publisher.

Servlet 3.1 tried to adapt InputStream / OutputStream based API to something non-blocking. It is doable, but quite difficult implement, do not natively support backpressure, and imply a lot of constraints when you use it while RS easily adapt to a wide range of other types.

Feel free also to have a look to this blog post I wrote recently : Understanding Reactive Types.

Any thoughts about what that could means for NIO Multipart?

Melozzola commented 8 years ago

Hi Sébastien, First of all thank you for the explanations above, I did a bit of reading and I will do more in the next period. One thing I was afraid of is the integration of the NIOMultipartParser with the Servlet 3.1 ReadListener. Currently it is straightforward, but if the parser becomes a Subscriber (or a Processor) is not trivial anymore. Anyway, the RequestBodyPublisher you mentioned above could be the missing piece that could make the integration easier. I started a PoC in a separate branch called reactive-stream (will push it soon) to see if it is possible to adapt the NioMultipartParser and the other components to work in a reactive way (compliant with the Publisher and Subscriber APIs). Based on the outcome of the PoC, there will be a decision if making the NioMultipartParser reactive by default.

The high level view of what I'm trying to achieve is the following:

Create an adapter NioMultipartParserProcessor that wraps the NioMultipartParser and it implements the Processor interface. The Subscriber part of the Processor will consume the raw request data (probably published by the RequestBodyPublisher) and write it asynchronously to the parser. Every time the write is completed, it will signal the RequestBodyPublisher to send more data. The Publisher part of the NioMultipartParserProcessor will produce parsed data for the StreamStorage (or for it's adapters).
Create adapters for the PartBodyStreamStorageFactory and StreamStorage that implements the Subscriber api. The adapters will pull the parsed data from the NioMultipartParserProcessor and write asynchronously to the StreamStorages.

In doing this the library should become completely NIO Write as well. Feel free to monitor the branch, give inputs or even contribute.

sdeleuze commented 8 years ago

Awesome, I will have a look shortly and send you some feedbacks. Thanks!

alowrey commented 6 years ago

@sdeleuze @Melozzola Is there any update on this issue? This is a problem for an app I'm working on and I'm looking for a solution/workaround.

Assuming this issue won't be finished soon, is there an example workaround that could be used that while blocking, at least allows DataBuffers to be pushed periodically instead of waiting for the entire part to finish?

Thanks!

defnull commented 6 years ago

I solved this issue as described in #8 and it works great for me, parsing multiple GB of uploads a day. I would be willing to clean up my code a bit and publish it as MIT, but its only a workaround for nio-multiparts blocking APIs. I'd love to see the event-stream based parser architecture (instead of callbacks) implemented natively in this library, as the current API can be implemented easily on top of an event stream, but it's hard to do the other way around.

alowrey commented 6 years ago

Ok great, I’ll take a look. If you could publish something as MIT that’d be extremely helpful! Appreciate the offer. A workaround will do just fine until there’s native support in the library.

defnull commented 6 years ago

Would you be willing to contribute? For example, tests? I have tests for my entire rest framework stack that also cover the multipart parser, but these tests have a ton of dependencies on internal utilities and are hard to extract into a separate project.

I also never published anything to maven central. No idea how much work that is. I think an event-stream based (non-blocking) multipart parser is useful for a lot of people, and it might be worth it.

synchronoss / nio-multipart

Question about how to read part contents in a non-blocking way #4