remotestorage / remotestorage.js

⬡ JavaScript client library for integrating remoteStorage in apps
https://remotestoragejs.readthedocs.io
MIT License

instruct app devs to add in-memory caching #408

Closed michielbdejong closed 10 years ago

michielbdejong commented 11 years ago

as we saw in https://github.com/remotestorage/modules/issues/23, IndexedDB is quite slow, so you need an in-memory cache in front of it.

rather than adding such a cache at the lib level or at the module level, let's instruct app devs to do it at the app level. many app devs will already have a data representation layer, tied to the DOM, or to their favorite framework. in my experience, adding in-memory caching to a backend layer will not stop the frontend developer from adding another one in front of it, usually because they need to store presentation-related aspects in it, or simply because they didn't know that an in-memory cache was already present.

representing your data in memory is trivial: just use a map (var items = {};) or an array (var items = [];). then implement the onChange function coming from the module, and write through changes when changing something (you may want to ignore change events with origin == 'window' for this).
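
for illustration, a minimal sketch of what this looks like at the app level (the contacts module, its save function, renderContacts, and the exact change-event fields are assumptions here, not exact API):

```js
// in-memory representation of the data, owned by the app
var items = {};

// write-through: update memory first, then persist via the module
function saveContact(id, contact) {
  items[id] = contact;
  return remoteStorage.contacts.save(id, contact); // hypothetical module function
}

// keep the cache in sync with changes coming from elsewhere
remoteStorage.contacts.onChange(function (event) {
  if (event.origin === 'window') { return; } // ignore echoes of our own writes
  if (event.newValue) {
    items[event.relativePath] = event.newValue; // created or updated
  } else {
    delete items[event.relativePath]; // deleted
  }
  renderContacts(items); // hypothetical function that re-renders the view
});
```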

silverbucket commented 11 years ago

I think it makes more sense to keep the quirks/limitations of the remotestorage internals away from the app developer whenever possible. It also makes re-factoring apps easier, as we won't have to re-write (or instruct app developers to re-write) their apps to be efficient.

Why isn't this something remotestorage, or at least the modules (but preferably the lib), can do?

michielbdejong commented 11 years ago

we could build an extra caching layer if we wanted to, but your framework probably already has one, so then you would end up with double memory usage. in the specific case of for instance Angular, the thing to find out would be if your controller actions result in a DOM update before they get written to remoteStorage, or after.

silverbucket commented 11 years ago

it doesn't write to the DOM unless you are doing visual updates, and it doesn't store vars in the DOM afaik


shybyte commented 11 years ago

If IndexedDB is slow, but localStorage is 40 times faster (according to https://github.com/remotestorage/modules/issues/23), then remoteStorage.js should implement a localStorage cache in front of the IndexedDB cache. That way everything would look fast to the app developer and the user, while the data is still saved at least locally.
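
a rough sketch of that idea (none of this is existing remoteStorage.js API; the buffer key and writeAllToIndexedDB are made up for illustration):

```js
// write-ahead buffer: acknowledge the write once it is in localStorage,
// then flush to IndexedDB in the background
var BUFFER_KEY = 'rs-write-buffer';

function bufferedPut(path, value) {
  var buffer = JSON.parse(localStorage.getItem(BUFFER_KEY) || '{}');
  buffer[path] = value;
  localStorage.setItem(BUFFER_KEY, JSON.stringify(buffer)); // synchronous, fast
  scheduleFlush();
}

function scheduleFlush() {
  setTimeout(function () {
    var buffer = JSON.parse(localStorage.getItem(BUFFER_KEY) || '{}');
    writeAllToIndexedDB(buffer, function () { // hypothetical batched write
      localStorage.removeItem(BUFFER_KEY);    // drop the buffer once durable
    });
  }, 0);
}
```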

michielbdejong commented 11 years ago

hm, that would be a feature i guess, although it would only help to carry outgoing writes across a page refresh. if someone imports 1000 contacts and then refreshes the page before everything has been written to disk, i think that's quite a rare edge case

shybyte commented 11 years ago

Hmm, I don't know where the speed differences between IndexedDB and localStorage that michiel measured in https://github.com/remotestorage/modules/issues/23 come from. What I found out is that for my use case (importing many contacts), remoteStorage 0.8.1 (IndexedDB) is a lot faster than remoteStorage 0.7.3 (localStorage).

But it's still slow. One reason for this slowness is that every save request gets its own IndexedDB transaction. I played around a bit with IndexedDB and got a speedup of factor 15-40 (depending on the browser) when putting all save actions in one transaction instead of several. This means that for actions like mass import of data, remoteStorage.js should provide a special function which puts everything in one transaction. I guess this would allow even more optimizations besides a common IndexedDB transaction, and could be useful to prevent loss of data or inconsistent data. Just imagine, for example, moving 200 emails from one folder to another. I would like that action to be fast and transactional. Asking the app developer to solve the speed problems in such cases by implementing their own memory cache is calling for potential problems, inconsistent data and data loss.
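
for reference, the batching that gives this speedup looks roughly like this with the raw IndexedDB API (the database handle and the 'nodes' store name are placeholders):

```js
// one transaction for all puts, instead of one transaction per put
function storeAll(db, records, done) {
  var tx = db.transaction('nodes', 'readwrite');
  var store = tx.objectStore('nodes');
  records.forEach(function (record) {
    store.put(record); // every put shares the same transaction
  });
  tx.oncomplete = function () { done(null); };  // fires once the batch is durable
  tx.onerror = function () { done(tx.error); }; // the whole batch fails together
}
```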

shybyte commented 11 years ago

Another reason why a memory cache implemented by the app developer does not solve the speed problem: imagine an app that uses several windows. This is a situation where a memory cache, in combination with long save times, can easily get complicated to implement and cause problems, because the app developer also has to take care of synchronization between the windows, which remoteStorage.js could otherwise handle.

silverbucket commented 11 years ago

+1

raucao commented 11 years ago

This means that for actions like mass import of data, remoteStorage.js should provide a special function which puts everything in one transaction.

Great idea!

michielbdejong commented 11 years ago

ok, let's add a function storeObjects() to the base client

raucao commented 10 years ago

Did anyone work on this since then?

michielbdejong commented 10 years ago

i didn't include it in 0.8.2 because i wasn't sure whether we need it. this would be superseded if we always enable in-memory storage: then the promise could fulfill as soon as the in-memory cache is updated, and the write-through to IndexedDB could be queued. so let's leave it open until we have a resolution on that

raucao commented 10 years ago

This has nothing to do with the in-memory cache at all. The memory backend is meant to be only a fallback, when all other persistence methods aren't available. It won't be used at all by default, let alone as an additional layer between other layers.

storeObjects() is meant to be a way of storing multiple objects at once, regardless of the storage backend, in order to make some of them much faster than they are now.
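
as a strawman for the naming question below, a call could look something like this (purely hypothetical signature, not existing API):

```js
// hypothetical batch method on the BaseClient: one call, one transaction
client.storeObjects('contact', [
  { path: 'contacts/1', object: { name: 'Alice' } },
  { path: 'contacts/2', object: { name: 'Bob' } }
]).then(function () {
  // all objects stored (locally) in a single transaction
});
```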

ggrin commented 10 years ago

how should we call this method? with an array of paths, objects and mime types? or by building an array of objects, each with path, mimeType and object?

raucao commented 10 years ago

What happened to the method that created objects for validation, but didn't store them? I'd use these kinds of objects for passing to the new method.

raucao commented 10 years ago

@ggrin Any news on this? Or an answer to my question?

ggrin commented 10 years ago

@skdc

in src/baseclient/types.js L68 there is a method called validate. it is attached to BaseClient.prototype; from inside a module you can use it via publicClient.validate(item), where item must possess an @context property.

attaching the @context property happens in BaseClient.prototype._attachType(object, alias); this method changes the passed object.

this should be in the docs.
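
put together, usage from inside a module would look roughly like this (a sketch based on the description above; the contact type, its fields, and the shape of the validation result are assumptions):

```js
// inside a module: attach the type, validate, then store
var contact = { name: 'Alice', email: 'alice@example.com' };

publicClient._attachType(contact, 'contact'); // adds @context (mutates contact)
var result = publicClient.validate(contact);  // checks against the declared schema

if (result.error) { // assumed result shape
  console.log('validation failed:', result.error);
} else {
  publicClient.storeObject('contact', 'contacts/1', contact);
}
```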

raucao commented 10 years ago

I believe I meant saveObject or something. But that's irrelevant then, as it seems to no longer exist. I was trying to help with your question about what to pass to the method. I think an array of objects would be the way to go, and whatever saveObject created would probably be good.

It's actually a good point. What about validation when storing multiple objects with a single call? Do we validate all of them and not store anything if one is invalid, or do we store some and return a list of validation errors for the rest?

michielbdejong commented 10 years ago

sorry, just saw i forgot to answer https://github.com/remotestorage/remotestorage.js/issues/408#issuecomment-28382320 a month ago. my proposal with 'in memory' was to create a queue: the code can happily call storeObject with its current API, and these calls get queued, which is superfast because at first it only stores in memory, not yet in IndexedDB. then a function writes through the current request, plus all other queued requests, in a batched way to make it efficient. so you get a timeline like this:

code:     queue:    indexeddb:
store(1)  [1]       start store([1])
store(2)  [2]          | (storing [1])
store(3)  [2,3]        | (storing [1])
store(4)  [2,3,4]   finish store([1])
-         []        start store([2,3,4])
store(5)  [5]          | (storing [2,3,4])
store(6)  [5,6]        | (storing [2,3,4])
etc...

this keeps the api simple and superfast by default (you even benefit from it when storing only one object)
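
a condensed sketch of that queue (names are illustrative; writeBatchToIndexedDB stands in for whatever batched write the lib would do internally):

```js
var queue = [];
var flushing = false;

function storeObject(type, path, object) {
  // validation would happen here, before enqueueing
  queue.push({ type: type, path: path, object: object });
  flush();
  return Promise.resolve(); // fulfilled on enqueue, not on disk write
}

function flush() {
  if (flushing || queue.length === 0) { return; }
  flushing = true;
  var batch = queue;
  queue = [];
  writeBatchToIndexedDB(batch, function () { // hypothetical batched write
    flushing = false;
    flush(); // drain whatever queued up in the meantime
  });
}
```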

michielbdejong commented 10 years ago

validation would then happen before enqueueing, of course

raucao commented 10 years ago

That would mean that you fulfill the promise when the data is not yet written to the cache, right? Isn't that a bit of a fake success status then?

Also, how do you handle syncs in between your queuing and further editing of the data? It seems like this could get complicated quickly.

michielbdejong commented 10 years ago

Isn't that a bit of a fake success status then

yes; should anything go wrong during write-through (which should only happen if there's a bug in our code or in the browser, or the device hits a disk-full or out-of-memory error or whatever), then we would just throw a general error event, either to the app or to the widget. this is what happens now anyway when pushing changes to remote: the storeObject function returns success immediately; after that, if the push fails, the widget just displays the offline state.

how do you handle syncs

app -> module -> queue -> indexedDB -> sync cycle (pushChanges) -> wireclient -> remote

for outgoing changes: just add the -> queue -> in there.

For incoming changes it stays the same, namely:

app <- module <- change event <- sync cycle (synchronize) <- wireclient <- remote

michielbdejong commented 10 years ago

i'm now thinking the module should probably be the one caching "hot" data, and composing the batch requests could then also happen at the module level.

i'll work on a proposal for this for the contacts and money modules, which are two modules that really need this.

silverbucket commented 10 years ago

It's probably something most modules will need; should it be part of a "module utils" module?


raucao commented 10 years ago

This is needed for importing data from other sources, so it needs to be in core.

michielbdejong commented 10 years ago

yeah, see http://community.remotestorage.io/t/module-utils-prefixtree-syncedvar-and-syncedmap/133

for now i'm using one storeObject call per outgoing change, but that's where i would batch them if we had a storeObjects function on the baseClient.

raucao commented 10 years ago

Ah, right.

michielbdejong commented 10 years ago

closing in favour of #556