Allow to patch documents

clochix commented 10 years ago

I would like to be able to update only some properties of the document, to lower bandwidth (in an application, I have potentially huge documents and I often need to only update a state flag. PUTing the whole document is not efficient on a mobile data connection).

So what about adding the PATCH verb and using RFC 6902 (http://tools.ietf.org/html/rfc6902) that describe how to patch documents serialized in JSON ?

michielbdejong commented 10 years ago

The recommended way to do this would be to use smaller documents. Just split up your data in a way that makes it easy to edit. For instance: instead of having a document that is your todo list, have one folder called "tasks/", and then one document per task in your todo list. That also minimizes versioning conflicts.

clochix commented 10 years ago

My documents are web articles, so I can't split them. Even if the document weight only tens ko, uploading it on a slow connection every time I update a flag seems a waste of bandwidth (I'm thinking of mobile users in countries where mobile data plans are expensive).

raucao commented 10 years ago

Also not feasible when single documents would be too many, like e.g. chat message archives etc. With PATCH we could add single chat lines to an archive file.

waldheinz commented 10 years ago

I think there are very valid use cases for this, but adding it to the standard opens a door which maybe sould rather remain closed. How about stitching together images? Or replacing samples in a sound file? Or concurrent editing of text files? Or...? This is just to much to reasonably expect anyone to implement.

Maybe it's time to work out a way for clients to query supported server-side extensions, and having RFC 6902 one of them? Could a rule of thumb be that a basic remotestorage server is never required to parse documents for it's duty?

raucao commented 10 years ago

Good points.

michielbdejong commented 10 years ago

servers already MAY announce support for byte ranges although none do. I just looked it up https://github.com/remotestorage/spec/blob/master/draft-dejong-remotestorage-03.txt#L261-L263 and remembered that in the end we only added it for GET requests, not for PUT requests. https://github.com/remotestorage/spec/blob/master/draft-dejong-remotestorage-03.txt#L491-L493

If someone is interested in adding this to a server implementation, and adding support for it in remotestorage.js, then we can extend options for Content-Ranges support to GET, PUT, both, or none.

waldheinz commented 10 years ago

While byte ranges do not require parsing (and thus do not collide with my proposed "rule of tumb"), it's maybe worth having an eye on what the blob stores on GAE and AWS (and whatever else is out there) support. Ranges for GET requests are very useful for media playback and always easy enough to emulate even if the underlying data store does not support them. But for PUT requests it might be difficult if the blob stores do not support partial updates (and the use cases are not as common, too). I'm not familiar with such cloud providers, so I can't say what's possible there and what to avoid, I just wanted to raise awareness.

raucao commented 10 years ago

I think it's enough of a problem for server implementers to justify not requiring it in the spec. Which means, as stated earlier, that we'd need a way for clients to detect features. The Webfinger entry would be my first choice, as it provides information about the storage already.

michielbdejong commented 10 years ago

I know that Google Drive supports byte ranges on PUT, they call that "resumable uploads".

We are already using webfinger to detect features, here: https://github.com/remotestorage/spec/blob/master/draft-dejong-remotestorage-03.txt#L465-L466

michielbdejong commented 10 years ago

I agree with @waldheinz's rule of thumb that servers should never have to interpret the contents of the octet-streams they store. Another rule of thumb is that we shouldn't add things to the spec that nobody needs. In fact, nobody has used query parameters or byte ranges in the six months since they were introduced, and at some point if nobody ever uses them then that might be a reason to remove them again, so that the spec doesn't bloat unnecessarily.

raucao commented 10 years ago

nobody has used query parameters or byte ranges in the six months since they were introduced

How could you possibly know that? Apart from there being no way of knowing that, byte ranges are absolutely needed for media playback. There's no way anyone wants to wait until a whole media file downloaded before being able to jump to any position.

Token query params are used in Francois media player app, and byte ranges are used by all browsers and across the Web.

ghost commented 10 years ago

The token parameters are required for linking to private files, i.e. for use in the img, video or audio tag. Without those you can only play 'public' files.

clochix commented 10 years ago

Maybe could we create other issues for points not related to the initial subject ?

I just would like to be able to send a new value for one of the properties of the document. Server has just to update the value, without performing any operation on it, so it seems pretty easy to implement.

An alternative to server feature detection could be to just use existing HTTP verbs, maybe 405 (Method Not Allowed) or 501 (Not Implemented) : the client try to send a PATCH request, and if it doesn't work (may also be a proxy error for example) fall back to PUT. This only add a request on document update, and could lower bandwidth.

Also, I haven't tested them myself, but there are some JS libraries in the wild that implement RFC 6902, both client and server side.

michielbdejong commented 10 years ago

@clochix sorry for our group enthusiasm :)

For my answer to your original question, see https://github.com/remotestorage/spec/issues/65#issuecomment-44664506 - basically:

1) PUT with Content-Range header would be better than PATCH because it's more low-level, which is what we want. 2) I don't want to add things that nobody uses, but if you volunteer to add this on both sides (https://github.com/remotestorage/remotestorage.js and either https://github.com/remotestorage/remotestorage-server or some other server implementation), then that changes the situation, and we can add PUT with Content-Range header as an optional announcable feature.

raucao commented 10 years ago

PUT with Content-Range header would be better than PATCH because it's more low-level, which is what we want.

Why should it be either or? Both make sense in their own right and for different use cases.

michielbdejong commented 10 years ago

KISS

waldheinz commented 10 years ago

My understanding of the situation is that

GET ranges have obvious use cases for media playback and are easy to implement, so adding them to the standard makes sense (whether as SHOULD or MUST is to be discussed)
PUT ranges have fewer use cases and implementing them can be inefficient (although not necessarily difficult), so I'm a bit uneasy adding them to the standard (mentioned them with MAY is IMHO of the same effect as not mentioning them at all, right?)
PATCH support has it's uses as well, but it's implementation is very unrelated to byte ranges, and difficult as well

So I'd say the latter two should be detectable by the client (e.g. using webfinger), but if the server does not support that features the client is on it's own.

michielbdejong commented 10 years ago

To summarize again:

Content ranges for GET are a MAY in the spec.
We can add content ranges for PUT if somebody uses them.
That means only if somebody uses them.
We don't need PATCH if we have content ranges.

Spec bloat is not a goal, it is the enemy. If you don't agree then just look at how webdav died. :)

waldheinz commented 10 years ago

We don't need PATCH if we have content ranges.

It's not obvious to me how to implement PATCH atop of PUT + ranges, but I agree it should stay out of the spec. Having a central repository where "common" extensions are listed / specified (probably just as part of the Wiki, because of KISS) would be a good thing, though. And PATCH could be listed there.

just look at how webdav died

I just wanted to propose why we absolutely need XML in HTTP request / response headers, but if you don't like that... ok :-)

raucao commented 10 years ago

We don't need PATCH if we have content ranges

As I said, that doesn't make a lot of sense. These are 2 different things, and PATCH makes more sense for updating JSON objects, while content ranges make more sense for uploading resuming and updating generic/binary data.

(In case you haven't noticed, I'm not arguing for putting it in the spec asap, I'm just saying it makes sense for some use cases that I have already, and that it's different than content ranges.)

clochix commented 10 years ago

I quickly hacked a PoC to see if it could work. See https://github.com/clochix/remotestorage.js/commits/patch and https://github.com/clochix/remotestorage-server . This is quick and dirty work, I admit being a bit lost in the layers of the client library, and encountering some weird behaviors. But I managed to see it work one time and patch a remote object by only sending the diff :) So, if you want to play with it…

michielbdejong commented 10 years ago

As I already said in previous comments (maybe not clearly enough), I do not want to add PATCH to the spec, as it would destroy the "simple GET/PUT/DELETE interface" which we spent all these years building, and change it into something else.

@clochix Thanks for building the PoC, but your feature request is really too high-level for our spec. You may be more interested in taking it to the data.fm project which is much more focused on the more semantic level than our spec.

raucao commented 10 years ago

@clochix I really like the idea; thanks for the PoC! I'll play around with it with our implementation as well, in order to better learn how useful this would be in production and for what use cases.

almet commented 10 years ago

@michielbdejong hey! I don't really understand what's the problem with PATCH here and how that would make the interface more complicated? e.g. wouldn't it stay the same for people who don't want to use it, and only adds an option for people who want to reduce the bandwidth they use (I think that's specifically a good idea because of mobile)?

waldheinz commented 10 years ago

@ametaireau The problem is mainly that people writing servers have additional trouble. Also, there's nothing stopping anyone from implementing it and announce support in the webfinger data set. It's just not required that everyone does so.

remotestorage / spec

Allow to patch documents #65