yahoo / elide

Elide is a Java library that lets you stand up a GraphQL/JSON-API web service with minimal effort.
https://elide.io

Draft: File Upload Design #203

Closed. xiaoyao1991 closed this 3 years ago.

xiaoyao1991 commented 8 years ago

I'm drafting a design for supporting file uploads with metadata in Elide.

Motivation

A common use case (1, 2) is building a REST JSON API that allows files to be uploaded together with JSON documents describing their metadata.

For example, on an e-commerce site, sellers may want to upload images for their products and specify which products the images belong to.

Approaches

Base64 Way

The client encodes the files to be uploaded with Base64 and embeds the Base64 string as a field in the JSON document.

The server receives the JSON document with Base64 data embedded, and applies additional processing logic (decoding from Base64, image resizing, saving to local filesystem, etc.)
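A minimal sketch of the Base64 approach in Java, using only the JDK's java.util.Base64. The `image` type and `isThumbnail` attribute come from the example request in this issue; the `content` attribute name and the `Base64Upload` helper class are hypothetical. The JSON is assembled with a plain string template to stay dependency-free; a real client would use a JSON library.

```java
import java.util.Base64;

// Hypothetical helper illustrating the Base64 approach: the file bytes travel
// inside the JSON-API document as an ordinary string attribute.
final class Base64Upload {

    // Client side: encode the raw file bytes and embed them in the document.
    // "content" is an assumed attribute name, not part of Elide.
    static String buildDocument(byte[] fileBytes, boolean isThumbnail) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        // The standard Base64 alphabet (A-Z, a-z, 0-9, +, /, =) needs no JSON escaping.
        return "{\"data\":{\"type\":\"image\",\"attributes\":{"
                + "\"isThumbnail\":" + isThumbnail + ","
                + "\"content\":\"" + encoded + "\"}}}";
    }

    // Server side: after the JSON has been parsed, decode the attribute value
    // back into bytes before resizing, storing, etc.
    static byte[] decodeContent(String encodedAttribute) {
        return Base64.getDecoder().decode(encodedAttribute);
    }
}
```

One consequence of this design worth keeping in mind: Base64 encodes every 3 input bytes as 4 characters, so the JSON payload is roughly a third larger than the file itself.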

Multipart Way

The client sends a multipart/form-data or multipart/mixed HTTP request instead of an application/vnd.api+json one. One part of the payload carries the metadata, with Content-Type: application/vnd.api+json; the remaining parts hold the file data.

The server must have endpoints configured to recognize the multipart/form-data or multipart/mixed request type and parse the payload. This could be troublesome for a sub-resource (e.g. POST /product/1/images).

An example multipart request looks like:

POST /upload HTTP/1.1
Accept: application/vnd.api+json
Content-Type: multipart/form-data;boundary=Boundary_1_208866101_1460195430845
User-Agent: Jersey/2.22.1 (HttpUrlConnection 1.8.0_60)
MIME-Version: 1.0
Host: localhost:8080
Connection: keep-alive
Content-Length: 287451

--Boundary_1_208866101_1460195430845
Content-Type: application/vnd.api+json
Content-Disposition: form-data; name="data"

{
  "data": {
    "type": "image",
    "attributes": {
      "isThumbnail": true
    }, 
    "relationships": {
      "product": {
        "data": {
          "type": "product",
          "id": "1"
        }
      }
    }
  }
}
--Boundary_1_208866101_1460195430845
Content-Type: application/octet-stream
Content-Disposition: form-data; name="files"

...some binary data...
--Boundary_1_208866101_1460195430845--
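As a sketch of how a client could assemble a body shaped like the request above with nothing but the JDK (Java 11+): the `MultipartBody` class and its method are hypothetical, while the part names `data` and `files` and the content types are taken from the example. It follows the delimiter rules of RFC 2046: every part is introduced by `--` plus the boundary, lines end in CRLF, and only the final delimiter carries the trailing `--`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper: builds a multipart/form-data body with one
// JSON-API metadata part and one binary file part.
final class MultipartBody {
    private static final String CRLF = "\r\n";

    static byte[] build(String boundary, String jsonApiDocument, byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Metadata part, sent as application/vnd.api+json.
        write(out, "--" + boundary + CRLF
                + "Content-Type: application/vnd.api+json" + CRLF
                + "Content-Disposition: form-data; name=\"data\"" + CRLF
                + CRLF
                + jsonApiDocument + CRLF);
        // File part headers; the raw bytes follow the blank line.
        write(out, "--" + boundary + CRLF
                + "Content-Type: application/octet-stream" + CRLF
                + "Content-Disposition: form-data; name=\"files\"" + CRLF
                + CRLF);
        out.writeBytes(fileBytes);
        // Only the closing delimiter ends with "--".
        write(out, CRLF + "--" + boundary + "--" + CRLF);
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        out.writeBytes(s.getBytes(StandardCharsets.UTF_8));
    }
}
```

The request itself would then carry `Content-Type: multipart/form-data; boundary=...` with the same boundary string, as in the example's headers.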

Multi-phase Way

Inspired by the Twitter API, the YouTube API, and this post. The idea is to make the file upload process transaction-like across multiple HTTP calls. The procedure is as follows:

  1. The client sends an initial JSON-API HTTP request that provides the metadata of the files to be uploaded, along with an X-Upload-Content-Length header indicating the total size of the content.
  2. The server receives the initial call, processes the metadata (saves it to the database, adds it to a job queue, etc.), marks the metadata record as unfinished, assigns a transaction ID, sets a timeout, and responds to the client with a Location header containing a tailored URL for the file upload endpoint.
  3. The client receives the response and sends the file to the file upload endpoint.
  4. The server receives the file upload request and processes it. The response is the same as the response to the initial call.
  5. The client sends a finalize request to mark the file upload as complete.
  6. The server receives the finalize request and closes the network transaction.
  7. If the upload process still hasn't finished after the timeout, the server closes the network transaction, marks the metadata record as aborted, and rejects any further upload requests.
    • Advantages:
        • Can decouple the pure JSON-API endpoints from the file upload endpoints. Some applications may have dedicated servers with optimized file systems, or use third-party file upload services.
        • Can be extended to support resumable uploads: the client and server agree on the maximum size of a single HTTP request, and the client splits the file and sends the pieces separately.
        • The server can reject or close the file upload process in cases of oversize or timeout.
    • Disadvantages:
        • Requires multiple HTTP calls.
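The seven steps above amount to a small state machine on the server. The sketch below models one upload transaction in Java; the class name, the three states, and tracking received bytes against the declared X-Upload-Content-Length are assumptions for illustration, not an Elide API. Accepting bytes in several calls also covers the resumable-upload idea, and time is passed in explicitly so the timeout logic is easy to test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

// Hypothetical server-side record for one multi-phase upload.
final class UploadTransaction {
    enum State { UNFINISHED, COMPLETE, ABORTED }

    final String id = UUID.randomUUID().toString();  // transaction ID (step 2)
    private final long declaredLength;               // from X-Upload-Content-Length (step 1)
    private final Instant deadline;                  // server-chosen timeout (step 2)
    private long received;
    private State state = State.UNFINISHED;

    UploadTransaction(long declaredLength, Instant now, Duration timeout) {
        this.declaredLength = declaredLength;
        this.deadline = now.plus(timeout);
    }

    // Step 7: once the deadline passes, an unfinished transaction is aborted.
    private void expireIfNeeded(Instant now) {
        if (state == State.UNFINISHED && now.isAfter(deadline)) {
            state = State.ABORTED;
        }
    }

    // Steps 3-4: accept a piece of file data; returns false if rejected.
    boolean acceptBytes(long count, Instant now) {
        expireIfNeeded(now);
        if (state != State.UNFINISHED || received + count > declaredLength) {
            return false;  // aborted, already finalized, or oversized
        }
        received += count;
        return true;
    }

    // Steps 5-6: finalize only if every declared byte arrived before the deadline.
    boolean finish(Instant now) {
        expireIfNeeded(now);
        if (state != State.UNFINISHED || received != declaredLength) {
            return false;
        }
        state = State.COMPLETE;
        return true;
    }

    State state(Instant now) {
        expireIfNeeded(now);
        return state;
    }
}
```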

Handling Large Files (Chunking Mode)

In all the cases described above, the client can send the request with Transfer-Encoding: chunked to enable streaming mode.
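For reference, this is what Transfer-Encoding: chunked does on the wire in HTTP/1.1: each chunk is sent as its size in hexadecimal, CRLF, the chunk bytes, CRLF, and the body ends with a zero-size chunk. The sketch below encodes a byte array that way; the class name and fixed chunk size are illustrative, trailers are omitted, and in practice the HTTP client or server library does this framing for you. HTTP/2 has its own framing and does not use this header.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative encoder for the HTTP/1.1 chunked transfer coding (no trailers).
final class ChunkedEncoder {
    static byte[] encode(byte[] body, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int offset = 0; offset < body.length; offset += chunkSize) {
            int len = Math.min(chunkSize, body.length - offset);
            // Chunk header: size in hex followed by CRLF.
            out.writeBytes((Integer.toHexString(len) + "\r\n").getBytes(StandardCharsets.US_ASCII));
            out.write(body, offset, len);
            out.writeBytes("\r\n".getBytes(StandardCharsets.US_ASCII));
        }
        // Last chunk of size zero, then the CRLF that ends the message body.
        out.writeBytes("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        return out.toByteArray();
    }
}
```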

TODO:

Come up with an Elide change proposal for implementing some or all of the scenarios above.

DennisMcWherter commented 8 years ago

Good overview. Base64 is already implemented by virtue of being able to store text in the database. The setField() method can support any additional processing. Both multipart and multi-phase seem useful. I suggest we look (long-term) to support all methods of file uploading so the developer can choose the particular use-case that suits their needs.

xiaoyao1991 commented 8 years ago

@DeathByTape I agree. This week I'm working on a proposal for the changes Elide needs in order to support multipart. I will think more about multi-phase and maybe start with a couple of example queries first.

beauby commented 7 years ago

Just dropping my 2 cents here: multiple HTTP requests do not seem like that big a disadvantage when uploading files, as most of the time will be spent on the actual upload. The only exception would be a high-bandwidth but high-latency network, which is not that common.

clayreimann commented 7 years ago

I think that file storage is a problem Elide is not intended to solve, so fwiw I think the only upload method Elide should support is the multi-phase strategy. In that scenario, a separate service would be responsible for creating and updating the metadata stored in Elide, so the flow would look something like:

  1. The client initiates an upload to the file storage service.
  2. The storage service calls into Elide to create a metadata object.
  3. The upload proceeds and finishes.
  4. The upload service updates the metadata in Elide when the upload is complete
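A sketch of steps 2 and 4 from the storage service's side, using the JDK's java.net.http request builders (Java 11+). The Elide base URI (which should end in `/`), the `image` type, and the `status` attribute used to mark completion are hypothetical; the requests are only built here, not sent. JSON:API creates a resource with POST and updates it with PATCH, both using the application/vnd.api+json media type.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical requests the storage service would send to Elide.
final class MetadataRequests {
    private static final String JSON_API = "application/vnd.api+json";

    // Step 2: create the metadata record before the bytes arrive.
    static HttpRequest createImage(URI elideBase, String productId) {
        String body = "{\"data\":{\"type\":\"image\","
                + "\"attributes\":{\"status\":\"uploading\"},"
                + "\"relationships\":{\"product\":{\"data\":"
                + "{\"type\":\"product\",\"id\":\"" + productId + "\"}}}}}";
        return HttpRequest.newBuilder(elideBase.resolve("image"))
                .header("Content-Type", JSON_API)
                .header("Accept", JSON_API)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Step 4: mark the record complete once the storage service has the file.
    static HttpRequest markComplete(URI elideBase, String imageId) {
        String body = "{\"data\":{\"type\":\"image\",\"id\":\"" + imageId + "\","
                + "\"attributes\":{\"status\":\"complete\"}}}";
        return HttpRequest.newBuilder(elideBase.resolve("image/" + imageId))
                .header("Content-Type", JSON_API)
                .header("Accept", JSON_API)
                .method("PATCH", HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

In this arrangement Elide only ever sees ordinary JSON-API calls; the file bytes never pass through it.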

DennisMcWherter commented 7 years ago

I think the real downside of multiple HTTP calls is not performance but consistency. Specifically, with multiple calls you don't get any transactional guarantees (as you would from the JSON Patch extension, for instance). The caller would be responsible for ensuring that both the metadata and the file data are stored properly. If one or the other fails, the client would need to handle that failure itself.

I suppose such a situation could let malicious clients do horrible things to your data model, but perhaps your permission model should gate this sort of abuse.

thaingo commented 4 years ago

May I know if file upload support is still on the Elide roadmap?

aklish commented 3 years ago

It is not on the roadmap.