juxt / yada

A powerful Clojure web library, full HTTP, full async - see https://juxt.pro/yada/index.html
MIT License
735 stars 98 forks source link

Custom PartConsumer impls not given enough context to stream multipart data #144

Open aengelberg opened 7 years ago

aengelberg commented 7 years ago

My use case is roughly as follows: I have a resource that needs to take a multipart form data with some file metadata as the first few parts, and a large file in the final part. It takes the metadata and the file data and creates an object in S3.

curl -X POST -F "some-info=foobar" -F "file=@mylargefile.txt" myapi.com/upload

When I start consuming the data of the file part, I want to be streaming it to S3 in real time, so I don't have to hold onto any of the data.

However, in the reduce-piece code, the start-partial method of the PartConsumer is being called without the state.

This abstraction works for building up parts in memory (the default), or buffering the parts to disk, because those can be done without any context about the request. In my streaming scenario, when I'm done processing the small parts, I want to open a data stream to S3, hold onto it, and send all future pieces along that stream as they come in. The "holding onto it" part is what's difficult.

I can imagine a few potential tweaks to the multipart implementation, any of which I believe would make this work for me:

malcolmsparks commented 7 years ago

Hi Alex. Thanks for creating this issue. I can see that PartConsumer is not well designed for your use-case.

Having given it some thought, I think one strategy might be to copy multipart.clj's process-request-body' function to a new namespace and work on a new/alternative implementation based on your requirements. Most of the 'hard' work is inparse-multipartwhich you can require in -- the currentprocess-request-bodyimplementation mostly applies a transducer to the parts and reduces them, which is where the protocol comes into play. It shouldn't be too hard to experiment with different designs and since a multimethod is used forprocess-request-body`, you should be able to switch easily to your own 'fork' of it and customise it as you wish. (I'd even be happy to accept this improved alternative as a contribution and encourage people to move over to it. The API to the existing multipart feature isn't ideal and could use a redesign anyway.)

The rationale behind this suggestion is that it's not easy for me to know who is dependent on this code and whether we might be breaking binary compatibility by changing either the signature or semantics of PartConsumer. As a rule-of-thumb in Clojure, I believe it's best to leave working code as is and write new code, rather than tinker/tamper with old code. For that reason I'm moving to the concept of multiple APIs - preserving compatibility for existing users while providing improved designs via new namespaces - so yada.yada is the 'old' API, yada.yada.lean is on the way and they'll be a brand new one for a new yada API based on clojure.spec.

Happy to leave this issue open for discussion.