aengelberg opened this issue 7 years ago
Hi Alex. Thanks for creating this issue. I can see that PartConsumer is not well designed for your use-case.
Having given it some thought, I think one strategy might be to copy multipart.clj's `process-request-body` function to a new namespace and work on a new/alternative implementation based on your requirements. Most of the 'hard' work is in `parse-multipart`, which you can require in -- the current `process-request-body` implementation mostly applies a transducer to the parts and reduces them, which is where the protocol comes into play. It shouldn't be too hard to experiment with different designs, and since a multimethod is used for `process-request-body`, you should be able to switch easily to your own 'fork' of it and customise it as you wish. (I'd even be happy to accept this improved alternative as a contribution and encourage people to move over to it. The API to the existing multipart feature isn't ideal and could use a redesign anyway.)
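As a very rough sketch of what such a fork could look like — note that the multimethod arglist, the dispatch value, and the `parse-multipart` call below are assumptions for illustration, not signatures verified against yada's source:

```clojure
(ns my-app.multipart
  "Experimental fork of yada's multipart handling.
   All yada names and signatures here are assumed, not verified."
  (:require [yada.multipart :as multipart]))

;; Hypothetical: because process-request-body is a multimethod, an
;; alternative handler can be registered from your own namespace
;; without touching yada's code.
(defmethod multipart/process-request-body "multipart/form-data"
  [ctx body-stream media-type & args] ; assumed arglist
  ;; Reuse the existing low-level parser for the 'hard' work ...
  ;; ... then apply your own transducer/reduction over the parts here,
  ;; bypassing the PartConsumer protocol entirely.
  )
```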
The rationale behind this suggestion is that it's not easy for me to know who is dependent on this code and whether we might be breaking binary compatibility by changing either the signature or semantics of PartConsumer. As a rule of thumb in Clojure, I believe it's best to leave working code as is and write new code, rather than tinker/tamper with old code. For that reason I'm moving to the concept of multiple APIs - preserving compatibility for existing users while providing improved designs via new namespaces - so `yada.yada` is the 'old' API, `yada.yada.lean` is on the way, and there'll be a brand new one for a new yada API based on clojure.spec.
Happy to leave this issue open for discussion.
My use case is roughly as follows: I have a resource that needs to accept multipart form data with some file metadata as the first few parts, and a large file in the final part. It takes the metadata and the file data and creates an object in S3.
When I start consuming the data of the `file` part, I want to be streaming it to S3 in real time, so I don't have to hold onto any of the data. However, in the `reduce-piece` code, the `start-partial` method of the PartConsumer is being called without the `state`. This abstraction works for building up parts in memory (the default), or buffering the parts to disk, because those can be done without any context about the request. In my streaming scenario, when I'm done processing the small parts, I want to open a data stream to S3, hold onto it, and send all future pieces along that stream as they come in. The "holding onto it" part is what's difficult.
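One way to make that "holding onto it" concrete in plain Clojure/Java interop — a sketch, with `upload-to-s3!` standing in for a real S3 client call — is to run the blocking upload on a separate thread reading from a piped stream, and keep the writable end around between pieces:

```clojure
(import '[java.io PipedInputStream PipedOutputStream])

(defn start-upload!
  "Kick off the (blocking) upload on another thread; it reads from the
   piped InputStream until the output side is closed. `upload-to-s3!`
   is a placeholder for a real S3 client call that consumes an
   InputStream; it is not part of any actual library."
  [upload-to-s3!]
  (let [out (PipedOutputStream.)
        in  (PipedInputStream. out)]
    {:out    out
     :result (future (upload-to-s3! in))}))

(defn write-piece!
  "Send one piece of the file part down the open upload."
  [{:keys [out]} piece]
  (.write ^PipedOutputStream out ^bytes piece))

(defn finish-upload!
  "Close the stream so the uploader sees EOF, then wait for its result."
  [{:keys [out result]}]
  (.close ^PipedOutputStream out)
  @result)
```

The map returned by `start-upload!` is exactly the kind of per-request state the current PartConsumer protocol gives you nowhere to put — which is what the tweaks below are about.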
I can imagine a few potential tweaks to the multipart implementation, any of which I believe would make this work for me:

- Make `start-partial` take the `state`.
- Change `:part-consumer` such that a new `PartConsumer` object is created for each request (e.g. `:part-consumer` is now a function that returns a `PartConsumer`). That way I can give the record some mutable state upon creation.
- Change `PartConsumer`'s methods, so that each of its methods also returns itself with potentially updated local state.
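The last tweak — each method returning the consumer itself — can be sketched in plain Clojure. The protocol and method names below are illustrative, not yada's actual `PartConsumer`:

```clojure
;; A variant where every method returns the consumer (possibly with new
;; state), so the reduction can thread it through piece by piece.
(defprotocol StatefulPartConsumer
  (consume-part     [this part]  "Handle a complete small part; returns the consumer.")
  (start-partial    [this piece] "Begin a large part; returns the consumer.")
  (continue-partial [this piece] "Feed another piece; returns the consumer.")
  (end-partial      [this piece] "Finish the large part; returns the consumer."))

;; A simple in-memory implementation that accumulates pieces. A
;; streaming implementation could instead hold an open S3 stream in a
;; field and write each piece to it as it arrives.
(defrecord CollectingConsumer [parts current]
  StatefulPartConsumer
  (consume-part     [this part]  (update this :parts conj part))
  (start-partial    [this piece] (assoc this :current [piece]))
  (continue-partial [this piece] (update this :current conj piece))
  (end-partial      [this piece]
    (-> this
        (update :parts conj (conj current piece))
        (assoc :current nil))))

;; Threading the consumer through a reduce keeps all state local:
(reduce (fn [consumer [op piece]]
          (case op
            :part  (consume-part consumer piece)
            :start (start-partial consumer piece)
            :cont  (continue-partial consumer piece)
            :end   (end-partial consumer piece)))
        (->CollectingConsumer [] nil)
        [[:part "a"] [:start "b1"] [:cont "b2"] [:end "b3"]])
;; => a consumer whose :parts is ["a" ["b1" "b2" "b3"]]
```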