separate out the storage controller into http://npmjs.org/package/remotestorage-server

michielbdejong commented 10 years ago

as also discussed in https://groups.google.com/d/msg/unhosted/s-jimKXlUXE/vrtxim7kTukJ - since https://github.com/remotestorage/starter-kit and https://github.com/remotestorage/remotestorage.js/tree/integration-tests also both implement the same business logic of the storage controller, but have different requirements for the actual store and web interface, i put this common part into https://github.com/remotestorage/remotestorage-server

i didn't clean up the parts of the code yet that are now unused (mainly /lib/controllers/storage), just wanted to ask you for some feedback first!

cheers, Michiel

michielbdejong commented 10 years ago

Since my fork is really quite different from the real/original reStore, and in practice is actually more closely related to remotestorage-server on which it's based, I'll just close this PR and list it separately at http://remotestorage.io/get/ as a separate remoteStorage server implementation in itself.

jcoglan commented 9 years ago

@michielbdejong, I'm so sorry I didn't get to this sooner. Shortly after that email thread where I vouched that I was maintaining reStore, I had some bad personal news and also had to change my work circumstances, leaving me little time for OSS. Since I still mostly get mail about github.com/faye, that's where most of my time has been going this year.

Still, even though you closed this PR I thought it deserved some feedback on the design, particularly relating to some work I did over the summer. I found the code here problematic when you first submitted it but couldn't articulate why, but hopefully I can do that now. I hope this is still useful.

My first question (which is not relevant now) was around the purpose of the changes. reStore already functions as a protocol encapsulation, since it supports pluggable backends. The protocol and UI is taken care of, and you can use one of the provided backends or make your own. I suppose some people would want to customise the UI, but that must be done within the parameters of the protocol and with security considerations taken into account. So, I was wondering what the motivation was for pulling out a decent chunk of reStore's core functionality -- the protocol abstraction -- and replacing it with a library?

My other questions concerns the changes to the protocol between the server and the storage engine.

I didn't like that in the permissions query API, structures like "/locog/": ["r","w"] were replaced with "/locog/:rw". I thought the previous format was already too close to the wire format, and I would have preferred something like "/locog/": {read: true, write: true}, but the new format moves the API even closer to the wire format, which has changed over time (reStore still parses 0.6-era scope strings). In particular, the caller is obliged to parse the string "/locog/:rw" into something it can make a decision with, which requires knowledge of the wire format in an inappropriate place. It also obliges the backend to implement that wire format, and possibly change it over time. The original format was designed to be wire-format-independent and thus stable.

There is further use of strings where you've replaced Date objects with strings. I can see some merit in that since the spec removed the requirement that document versions be dates, replacing last-modified with etag, but this is still a breaking change in the server/backend protocol, which is problematic.

Finally, there is the issue of replacing a single server->backend call like

store.put("zebcoe", "/locog/seats", "text/plain", "a value", "version")

with multiple calls:

store.putItem("zebcoe@local.dev", "content:/locog/seats", "a value")
store.putItem("zebcoe@local.dev", "revision:/locog/seats", "version")
store.putItem("zebcoe@local.dev", "contentType:/locog/seats", "text/plain")

The volume of stubs needed in some of the tests (remember that stubs do not specify order) should be an indicator of the problem. Breaking this single call into multiple calls has a couple of design problems.

Firstly, arguments like "content:/locog/seats" require parsing by the caller. This structure can be expressed far more cleanly by avoiding string-formatting and using a different method name e.g. getContent(user, path) or an option object e.g. putItem(user, {content: path}).

Second, breaking the call up means each GET/PUT/DELETE no longer maps to a single call to the backend. This is important because it means the backend can no longer implement these operations atomically, and operations from multiple requests can be interleaved. I recently did some work to make the Redis backend perform all these operations both atomically, and in a way that's safe if the server crashes (d668f33063122f7a181563e34b35261882014f5f, 7e19bf38afad5dae74c8a0967e0b21e6439336d8, cf0f06dd6fcde68aae2593a511ed383de2f21302, 34e1cf1298f41b6def01ecd0d26091cf8ab1add8). The API you're proposing makes that design impossible, since the things that need to be done atomically are now spread across multiple method calls with no transactional container to prevent their interleaving.

It also breaks an abstraction boundary by requiring the server to perform all the directory updates e.g. store.putItem("zebcoe", "content:/", ...) rather than having the backend implicitly perform directory updates as part of the put() call. These directory updates must be performed atomically if the tree is to remain consistent, and making the caller do the updates as multiple calls makes it impossible for the backend to ensure atomicity.

I hope this makes sense and is of some use to however you are running your server now. This is a bunch of stuff that gave me a gut feel about not merging the PR at the time that I've only recently been able to articulate properly. Sorry again for the delay.

jcoglan commented 9 years ago

I made some changes to my previous comment since posting, in case you're reading via email. I would add to the point about directory changes -- making the server make many small calls rather than a single put() also prevents the backend from performing the operation efficiently (it may be able to batch some of these calls together) and also exposes too much information about the backend. For example, a filesystem backend has to delete all documents from a directory before deleting the directory itself, whereas a Redis backend can make these changes in any order and can in fact perform all of them at once in a transaction. The caller should not have to know about these differences but this API design means that it has to.

michielbdejong commented 9 years ago

Good points, thanks for the elaborate feedback! I agree with you, it's better to keep reStore the way it is.

This PR was not an attempt to improve reStore's architecture. You are right that the current architecture is better.

Intstead, this library came out of the starter-kit code, and we thought it might make sense if reStore would use it, too.

Given all the points you mention (and with which I agree!) I think now it was the right decision to close this PR.

Furthermore, I will think about how your arguments port to the design of the library, in particular regarding whether these multiple calls can be made into one atomic call to the store.

I will keep maintaining reSite separately from reStore (maybe I'll rename it). I think it's good if there are at least two open source remoteStorage servers.

Thanks a lot!

jcoglan / restore

separate out the storage controller into http://npmjs.org/package/remotestorage-server #13