rs / rest-layer

REST Layer, Go (golang) REST API framework
http://rest-layer.io
MIT License

Provide: multipart/form-data POSTs #87

Open omani opened 7 years ago

omani commented 7 years ago

hi,

I am writing a MongoDB layer for GridFS and I wonder if it is possible to have multipart/form-data requests for writing to a file (with a POST /resourcename/item).

Can we have an option to accept content types other than application/json?

are there any plans to implement this?

omani commented 7 years ago

I think the whole "check for application/json header" logic is in util.go:

// decodePayload decodes the payload from the provided request
func decodePayload(r *http.Request, payload *map[string]interface{}) *Error {
    // Check content-type, if not specified, assume it's JSON and fail later
    if ct := r.Header.Get("Content-Type"); ct != "" && strings.TrimSpace(strings.SplitN(ct, ";", 2)[0]) != "application/json" {
        return &Error{501, fmt.Sprintf("Invalid Content-Type header: `%s' not supported", ct), nil}
    }
    decoder := json.NewDecoder(r.Body)
    defer r.Body.Close()
    if err := decoder.Decode(payload); err != nil {
        return &Error{400, fmt.Sprintf("Malformed body: %v", err), nil}
    }
    return nil
}

If we could have a configurable check here, we could pass in different kinds of accepted content types. If the content type is not JSON, don't try to decode the payload but rather pass it along to the handler. Something like that.

smyrman commented 7 years ago

Specifically for your request, I believe @rs has stated earlier that he believes REST APIs are most useful for metadata. @rs, sorry if I am paraphrasing this.

Please note that my knowledge is limited here, so don't take my word for the following. But for storing actual data (i.e. a GridFS file), you might want to do so with custom HTTP routes, although you could link to the unique "file" ID in your rest-layer resource with an appropriate field validator.

Not for your use case in particular, but on a general basis I actually think it would be an interesting change to support registration of encoders/decoders for the API payload that would be "auto-selected" based on the (negotiated) Content-Type. That would really only be interesting though when the payload is to be validated by a schema. That said, it's not something I have the time to contribute at the moment.

omani commented 7 years ago

@smyrman I've already implemented a complete GridFS setup with custom routes, but now I thought: why not give rest-layer a try and implement the routes as a rest-layer-mongo-gridfs Storer.

And after thinking this over, I've come to the conclusion that everything would work except POSTing (writing) to gridfiles, because of the way you are supposed to upload files (from e.g. curl, Postman, blobs in the frontend, or even HTML5 file uploads): it all ends up in a request with the content type multipart/form-data.

omani commented 7 years ago

Any other interaction with GridFS would just be fetching or updating metadata (rest-layer resources) anyway, so the issue is really only with writing to gridfiles.

omani commented 7 years ago

And let's remember that Python Eve does this as well. Maybe it's a more specific implementation in Python Eve because it is strictly based on MongoDB (other DB handlers are more like extensions for Python Eve), but it's still there and useful. And we could make rest-layer-mongo work with GridFS (either by implementing it directly in rest-layer-mongo or as a rest-layer-mongo-gridfs project) if we could have more freedom in accepting content types.

While I think it makes sense to work with the application/json type for RESTful JSON APIs, and rest-layer is exactly that, I also think rest-layer doesn't have to restrict itself to being only that. I don't know if I'm explaining myself well.

smyrman commented 7 years ago

@omani, thanks for the info.

Just to understand this better: how would a GridFS resource integrate with the schema package? Or would you register a resource with a nil schema?

omani commented 7 years ago

@smyrman good question:

basically, when it comes to GridFS files, the "metadata" in "$bucketname.files" always looks like this:

{
    "_id" : ObjectId("5890fec3b1c2f9774f0f6b71"),
    "chunkSize" : 261120,
    "uploadDate" : ISODate("2017-01-31T21:16:51.834Z"),
    "length" : 0,
    "md5" : "d41d8cd98f00b204e9800998ecf8427e",
    "filename" : "somefilename"
}

You could define these fields in your schema (field definitions) and would be able to provide at least a common schema to work with gridfiles. The GridFS handler you pass into your index would implement Find, Insert, Update, etc. and take the above schema as input. The only difference would be doing GridFS operations on the "$bucketname.chunks" namespace (via session.DB(whatever).GridFS()) rather than on the session.DB(whatever).C(colname) namespace.

And the result would again be the schema above.

omani commented 7 years ago

But since POSTing to a gridfile generally means writing to that gridfile, the POST handling in your Storer implementation would have to have a way to accept multipart/form-data requests.

rs commented 7 years ago

I never touched GridFS nor looked at the GridFS implementation in Eve. I would tend to say that the REST API should manage the file metadata and give a pointer to a "data" URL handled by a different HTTP handler, but I may be missing something.

I will try to look into it ASAP so I can give you more educated feedback.

omani commented 7 years ago

@rs that would be awesome. if you find me a way to make this work I could provide a gridfs handler for rest-layer already.

omani commented 7 years ago

@rs any news?

rs commented 7 years ago

Not yet, it's on my list, don't worry.

rs commented 7 years ago

I started to think about file support. I want to come up with a design that could support GridFS but also filesystem and memory based storage (or any other kind of blob storage).

I don't want to mix JSON API with blobs. Supporting multipart/form-data to POST fields together with the file data does not make sense to me. I'm not a big fan of Eve's implementation in this regard (no offense @nicolaiarocci).

What I propose is to use the REST JSON API to handle metadata, and to handle the blob using a blob sub-resource, i.e. /files/<id> for metadata and /files/<id>/blob for file data.

To upload a file, you first create the metadata using either POST or PUT as usual, and then you can post the file data on the blob sub-resource. For the upload itself, I would like to implement the resumable upload API described here: http://www.grid.net.ru/nginx/resumable_uploads.en.html. The metadata endpoint could then be polled in order to track progress.
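A minimal sketch of the routing rule implied by this proposal; the function name and shape are hypothetical, not rest-layer API:

```go
package main

import (
	"fmt"
	"strings"
)

// splitFileRoute sketches the proposed URL scheme: /files/<id> serves the
// JSON metadata, /files/<id>/blob serves the raw file data. Purely
// illustrative of the routing split, not an actual rest-layer router.
func splitFileRoute(path string) (id string, blob bool, ok bool) {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	switch {
	case len(parts) == 2 && parts[0] == "files":
		return parts[1], false, true
	case len(parts) == 3 && parts[0] == "files" && parts[2] == "blob":
		return parts[1], true, true
	}
	return "", false, false
}

func main() {
	id, blob, _ := splitFileRoute("/files/5890fec3/blob")
	fmt.Println(id, blob) // 5890fec3 true
}
```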

I'm still not sure how we could implement that at the storer level. We could let a storer implement an extra interface to store/retrieve files, so the resource would "unlock" this blob endpoint when implemented. With this approach, we might have to force the schema of the metadata to something static that all file handlers must be able to provide, without the capability to define custom fields on that resource.

Another approach could be to create a new kind of storer that could be bound to an existing resource. We could then define different storers for metadata and blob, but then it would be hard to link the file storage handler with the metadata storage handler.

The file storage interface could be something like:

type ByteRange struct {
    First int64
    Last  int64
    Total int64
}

type Blober interface {
    WriteBlob(id interface{}, br ByteRange, r io.Reader) error
    ReadBlob(id interface{}, br ByteRange) (io.Reader, error)
}

Thoughts?

rs commented 7 years ago

We may also consider https://tus.io/protocols/resumable-upload.html as resumable upload protocol.

omani commented 7 years ago

I haven't read up on resumable uploads yet, but don't forget that gridfiles can be written only once: after you close them, you have to recreate them if you want to continue writing to them. I don't know if this is relevant here.

rs commented 7 years ago

My understanding is that GridFS is just a convention. I quickly looked into it, and I don't see why we could not implement appending or even sparse file support. It would require some changes to mgo or something on top of it, but nothing impossible here at first glance.

smyrman commented 7 years ago

We could let a storer implement an extra interface to store/retrieve files, so the resource would "unlock" this blob endpoint when implemented.

This does make some sense for GridFS / MongoDB. However, let's consider the simple case of local file storage: if I chose that, I really would not want to be forced to use any particular storage engine for the metadata...

Another approach could be to create a new kind of storer that could be bound to an existing resource. We could then define different storers for metadata and blob, but then it would be hard to link the file storage handler with the metadata storage handler.

To me, this sounds like a better trade-off.

smyrman commented 7 years ago

What about a new type of resource? resource.FileResource / resource.MediaResource?

smyrman commented 7 years ago

When initializing such a resource, maybe you could even pass it two storage backends: one for metadata and one for file/media storage that could optionally support resumable upload. Depending on how one wants this to work, you might not need to pass it a schema. Maybe the schema is provided by the FileStorage / MediaStorage backend?
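A hedged sketch of that two-backend idea, with all names (MetaStorer, BlobStorer, FileResource) invented for illustration rather than taken from rest-layer:

```go
package main

import "fmt"

// MetaStorer and BlobStorer stand in for the two independent backends
// suggested above; both interfaces are hypothetical.
type MetaStorer interface {
	Insert(id string, meta map[string]interface{}) error
}

type BlobStorer interface {
	WriteBlob(id string, data []byte) error
}

// FileResource wires a metadata backend and a blob backend together.
type FileResource struct {
	meta MetaStorer
	blob BlobStorer
}

// Create stores the metadata first, then the blob, mirroring the proposed
// two-step upload (POST metadata, then POST to the blob sub-resource).
func (f *FileResource) Create(id string, meta map[string]interface{}, data []byte) error {
	if err := f.meta.Insert(id, meta); err != nil {
		return err
	}
	return f.blob.WriteBlob(id, data)
}

// Toy in-memory implementations so the sketch runs.
type memMeta map[string]map[string]interface{}

func (m memMeta) Insert(id string, meta map[string]interface{}) error { m[id] = meta; return nil }

type memBlob map[string][]byte

func (m memBlob) WriteBlob(id string, data []byte) error { m[id] = data; return nil }

func main() {
	fr := &FileResource{meta: memMeta{}, blob: memBlob{}}
	err := fr.Create("f1", map[string]interface{}{"filename": "somefile"}, []byte("data"))
	fmt.Println(err) // <nil>
}
```

The point of the design is that either backend can be swapped (e.g. MongoDB for metadata, local files for blobs) without the other knowing.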

rs commented 7 years ago

So I could see a separate storer for extra metadata, but we would still need some file-specific metadata that would not be freeform and would have to be maintained by the blob handler. I am not sure it would make sense to require the blob handler to use a storer to store metadata like file length, md5, etc. It might be hard to keep in sync.

What is the difference between FileStorage and MediaStorage?

smyrman commented 7 years ago

File... or Media... are just alternative names. Sorry, I should have made that clearer. BlobStorer is just as good. I will use that from now on...

I am not sure it would make sense to require the blob handler to use storer to store metadata like file length, md5 etc.

Hmm... probably not good to force it. I guess if there was a new BlobStorer interface, implementations could potentially require/allow initialization with a normal Storer for metadata when relevant?

E.g. for the case of a local file or memory BlobStorer, the benefit of using a normal Storer (let's say MongoDB or maybe SQL) is that the info can be searched/indexed, and the dev could choose which one to use.

However, I guess if certain blob storage backends were to support indexing or searching anyway, then it's probably not right to force it.

smyrman commented 7 years ago

I guess my fear when you posted your proposal was ending up in a situation where e.g. a local-file-based storage engine forced me to use e.g. MongoDB for metadata instead of letting me choose.

But I see now that this might just be a question of how you implement the storage interface.

omani commented 7 years ago

Is someone working on this?

smyrman commented 7 years ago

I am not. @rs?

rs commented 7 years ago

Neither. I have no time right now; I will look at it as soon as I can.