twitter / finatra

Fast, testable, Scala services built on TwitterServer and Finagle
https://twitter.github.io/finatra/
Apache License 2.0
2.27k stars 406 forks source link

File Upload as a Stream #538

Open baughmann opened 4 years ago

baughmann commented 4 years ago

Finatra currently does not support streaming files uploaded via a multipart HTTP request.

Expected behavior

The ability to stream the raw bytes of a file being uploaded using com.twitter.finatra.http.fileupload.FinagleRequestFileUpload

Actual behavior

As per the TODO inside of com.twitter.finatra.http.fileupload.MultipartItem: // TODO don't store file in memory; write to temp file and store file reference.

Currently, the entire file is read into memory inside of parseMultipartItems in FinagleRequestFileUpload

Steps to reproduce the behavior

Self-explanatory.

Attempted resolution(s)

To say I'm a beginner with Scala and Finatra/Finagle would be giving me too much credit... I tried to extend FinagleRequestFileUpload with my own class and create my own version of MultipartItem to get the behaviors that I want, but there must be some magic going on that I'm not aware of.

When I try:

  post("/upload") { request: MyRequestFileUpload =>
   // where `fromRequest()` is my version of `FinagleRequestFileUpload.parseMultipartItems()`
    request.fromRequest(request)
  }

I get the following compile time exception:

Error:(23, 25) type mismatch;
 found   : com.twitter.finatra.http.fileupload.MyRequestFileUpload
 required: com.twitter.finagle.http.Request
    request.fromRequest(request)

If there is any way at all that I can be of assistance, please let me know. I would love to use Finatra, as it's much higher-level than Finagle, but I am required to upload files that could be in excess of several gigabytes and don't really want to do anything too weird.

If you know of a work-around that I could do, please let me know.

Thank you all for the awesome library!

EDIT:

After some more digging, I found out that my implementation should be:

  post("/upload") { request: Request =>
   // where `fromRequest()` is my version of `FinagleRequestFileUpload.parseMultipartItems()`
    val map: Map[String, MyMultipartItem] = new MyRequestFileUpload().fromRequest(request)
  }

Now, the hard part... how to properly turn this into a stream 🙃

EDIT 2: Maybe best to use this Netty 4 example for inspiration

cacoco commented 4 years ago

@baughmann thanks for the issue. Have you taken a look at the Streaming documentation? https://twitter.github.io/finatra/user-guide/http/streaming.html

We can probably do some more work to document and tie things together, but I believe you just want a Reader[Byte] or use a StreamingRequest.

baughmann commented 4 years ago

@cacoco Thanks for your reply.

Yes I did, and in fact I had first attempted that after seeing the comment I mentioned in the source. I was unable to successfully implement it as expected, probably due the lack of familiarity I highlighted in my OP.

Yes, more documentation, or even just a brief and incomplete example, would certainly be appreciated. Do you have anything that might be able to serve as such an example?

EDIT:

By "as expected" I mean:

  post("/upload") {reader: Reader[Buf] => {
    reader.map(toString)
  }}

...yields the following for a POST containing two text files:

["----------------------------712513722326590875981925\r\nContent-Disposition: form-data; name=\"file\"; filename=\"testdata.txt\"\r\nContent-Type: text/plain\r\n\r\nThe is my data!\r\n----------------------------712513722326590875981925\r\nContent-Disposition: form-data; name=\"file\"; filename=\"testdata2.txt\"\r\nContent-Type: text/plain\r\n\r\nAnother data!\r\n----------------------------712513722326590875981925--\r\n"]

While it is trivial to write something to parse these two entries from the reader, I had assumed that a framework like Finatra might not intend me to process these files like this?

yufangong commented 4 years ago

Hi @baughmann , it seems finatra currently doesn't have the multipart requests supporting streaming. Right now the parseMultipartItems only parse the fully buffered requests. I think we might be able to let parseMultipartItems parse streaming requests as well, when request.isChunked, deal with request.reader. Let us know if you want to look into this support.

baughmann commented 4 years ago

@yufangong I sure can! I had actually been playing with that idea of taking some features from this Netty 4 example.

I cannot promise that it will be quick, but it seems as though demand for this is limited for the time being. I'll post back here when I have implemented something.

baughmann commented 4 years ago

It looks like it's gonna be a while longer. I still have the original need outlined in this post, but it's kinda gotten pushed down the priority list.