urql-graphql / urql-exchange-graphcache

A normalized and configurable cache exchange for urql

Offline and persistent cache support #61

Open · JoviDeCroock opened this issue 5 years ago

JoviDeCroock commented 5 years ago

Offline

We all think about this in the modern PWA era, but there's a lot to it. We'll have to keep track of which requests the user still needs to send when the connection is restored, and after those requests are sent there will most likely be several optimistic entries to clear.

Operations

For knowing which operations to cache, it should be sufficient to only cache mutation operations. These would be kept in a Map<key, operation> and persisted to some indexedDB/localStorage when we kill the application while they haven't been dispatched yet.
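A minimal sketch of that idea; the `Operation` shape and the storage key are made up for illustration:

```ts
// Sketch: track undispatched mutations and persist them when the app
// is killed. The Operation shape and storage key are illustrative.
type Operation = { key: number; query: string; variables?: object };

const pendingMutations = new Map<number, Operation>();

const trackMutation = (op: Operation) => pendingMutations.set(op.key, op);
const markDispatched = (key: number) => pendingMutations.delete(key);

// Persist whatever hasn't been dispatched yet when the page goes away.
window.addEventListener('pagehide', () => {
  localStorage.setItem(
    'pending-mutations',
    JSON.stringify([...pendingMutations.values()])
  );
});
```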

The hard part about this is that we would have to restore the optimisticKeys in the exchange, which makes me think about moving these to our instance of the store instead, since the serialisation of entities, links and optimisticKeys could then happen in one place. This brings the additional advantage that it can be done with a single restore method.

One concern would be the read/write speed of killing/rebooting the cache in this state. The HAMT structure is quite hard to serialise, taking into account that it will contain optimistic values mixed with normal ones.

Connection checking

This should be easily doable by means of navigator.onLine: we could buffer all requests until we come online and then send them in the correct order, one by one, to avoid concurrency problems. The difficult part here would be that we buffer up until all operations are dispatched, meaning that if the user performs another action while we are emptying the queue, it could take a while to get a response (though this is mitigated if we are using optimisticResponses).
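A rough sketch of that buffering, with `sendToServer` standing in as a hypothetical placeholder for the actual dispatch:

```ts
// Sketch: queue operations while offline and flush them in order on the
// browser's 'online' event. sendToServer is a hypothetical placeholder.
type Operation = { key: number; query: string; variables?: object };
declare function sendToServer(op: Operation): Promise<void>;

const offlineBuffer: Operation[] = [];

function dispatch(op: Operation): void {
  if (navigator.onLine) {
    void sendToServer(op);
  } else {
    offlineBuffer.push(op);
  }
}

window.addEventListener('online', async () => {
  // One at a time, in order, to avoid concurrency problems.
  while (offlineBuffer.length > 0) {
    await sendToServer(offlineBuffer.shift()!);
  }
});
```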

Ideally, when we see we are offline, we'd filter all queries and just keep them incomplete. When we see we are going offline, all subscriptions should receive an active teardown.

Exchange

When reasoning about this, my thoughts always wander to a separate exchange to manage the operation buffering, while incorporating the restoring/serialising inside graphcache. There's a bit of overlap, but I think that's sufficient reason to keep them separate.

Persistence

Here I'm having trouble seeing how we could solve this effectively. We have the schema now, so we could potentially just iterate over the whole schema and write it that way, but that won't cover the case where people want a persisted cache without the whole schema effort.

What scares me the most about this is that localStorage isn't the ideal candidate for a persisted cache, but by using indexedDB we exclude about 5% of the browser population. IndexedDB seems to ask for permission on Firefox if a blob is >50MB; beyond that there are no explicit size limitations, even for a single data field.

The max size for localStorage is 10MB, so I don't really think this is sufficient for big applications, since the initial cost of the data structure is also there. We could strip everything down, but how do we rebuild it then, maybe by bucket size?

This is a brain dump of what I've been thinking about and is by no means a final solution but I think this could serve as an entry to finding the solution to what feels like a really awesome feature.

Other relevant solution: https://github.com/redux-offline/redux-offline/tree/v1.1.0#persistence-is-key

This uses redux-persist under the hood, which in turn relies on indexedDB. Since this is a reliable and widespread solution, I think it's safe to resort to indexedDB and fall back to localStorage when needed.
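As a sketch of that fallback, using idb-keyval (which also comes up later in this thread); the adapter shape is an assumption:

```ts
// Sketch: prefer IndexedDB (via idb-keyval) and fall back to
// localStorage where it's unavailable. The adapter shape is made up.
import { get, set } from 'idb-keyval';

interface KVStorage {
  read(key: string): Promise<string | undefined>;
  write(key: string, value: string): Promise<void>;
}

const idb: KVStorage = {
  read: key => get(key),
  write: (key, value) => set(key, value),
};

const local: KVStorage = {
  read: async key => localStorage.getItem(key) ?? undefined,
  write: async (key, value) => localStorage.setItem(key, value),
};

// Pick IndexedDB when the API exists, localStorage otherwise.
const storage: KVStorage = typeof indexedDB !== 'undefined' ? idb : local;
```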

For react-native we can easily resort to the AsyncStorage module. It seems that AsyncStorage isn't 100% safe either, since on Android it errors out when you exceed a 6MB write.

Introducing some way of leaving certain fields/queries out seems mandatory to me, since in the test described below we see that we hit the limits of localStorage pretty quickly.

Test

I did a small test with our current benchmarking setup where I serialised 50k entities and wrote them to a JSON file to look at the size:

```
ENTITIES 14260659B 14.260659MB
Links      664618B  0.664618MB
```

This already exceeds the limits of localStorage and would cause a prompt in indexedDB asking for permission to save this amount of data.

Code used:

```js
const fs = require('fs');
// Store, write, the queries, and the 10k-entity fixtures come from our
// benchmarking setup; the import paths here are placeholders.
const { Store } = require('./store');
const { write } = require('./operations/write');

const urqlStore = new Store();

// Write 10k entities per type into the normalised store.
write(urqlStore, { query: BooksQuery }, { books: tenThousandBooks });
write(
  urqlStore,
  { query: EmployeesQuery },
  { employees: tenThousandEmployees }
);
write(urqlStore, { query: StoresQuery }, { stores: tenThousandStores });
write(urqlStore, { query: WritersQuery }, { writers: tenThousandWriters });
write(urqlStore, { query: TodosQuery }, { todos: tenThousandEntries });

// Serialise the two internal maps and measure their size on disk.
const entities = JSON.stringify(urqlStore.records);
const links = JSON.stringify(urqlStore.links);

fs.writeFileSync('./entities.json', entities);
fs.writeFileSync('./links.json', links);

const { size: entityFileSize } = fs.statSync('./entities.json');
const { size: linkFileSize } = fs.statSync('./links.json');
console.log('ENTITIES', entityFileSize, entityFileSize / 1000000.0);
console.log('Links', linkFileSize, linkFileSize / 1000000.0);
```

Wild thoughts

I've been thinking about maybe making a distinction between a storage.native file and a storage file. This way we could leverage web workers and the application cache to write our results at runtime instead of only when we close the application.

Requirements

To implement persistent data we would have to implement an adapter with an API surface for getting, setting, and deleting. People can in turn pass in any storage they'd like; this way people who use something like PouchDB can write an adapter and just use that.
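In TypeScript terms, that surface could be as small as the following sketch (method names are hypothetical, not a final API):

```ts
// Sketch of the adapter surface: any storage that can get, set, and
// delete string values by key qualifies. Method names are hypothetical.
interface PersistenceAdapter {
  getItem(key: string): Promise<string | null>;
  setItem(key: string, value: string): Promise<void>;
  removeItem(key: string): Promise<void>;
}
```

React Native's AsyncStorage happens to match these three methods almost exactly, and a PouchDB adapter would just map them onto its own API.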

We should decide on an approach for when to write. After every query? That would make us write after every optimistic write as well, which makes everything a tad harder, certainly since it's going to be hard to incrementally write changes from our HAMT structure. I think it's better to work with a hydrate-and-exit approach. This could make writes take up more time, but in the end it would require a whole lot less logic.

We would need an approach that can evict certain portions of the state from being cached, for example an exclude/include pattern: when we include something, that will be the only thing being cached; when we exclude something, all but the excluded parts will be cached. These should be mutually exclusive.
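A sketch of that filtering at serialisation time (the option names are made up):

```ts
// Sketch: decide per typename whether it should be persisted.
// `include` and `exclude` are mutually exclusive by design.
interface PersistFilter {
  include?: string[];
  exclude?: string[];
}

function shouldPersist(typename: string, filter: PersistFilter): boolean {
  if (filter.include && filter.exclude) {
    throw new Error('include and exclude are mutually exclusive');
  }
  if (filter.include) return filter.include.includes(typename);
  if (filter.exclude) return !filter.exclude.includes(typename);
  return true; // no filter: persist everything
}
```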

When not supplied with a schema, how would we arrange for excluding data?

I drew up a diagram of how I expect this to happen; the code for the offline part was easy to write and is done.

[Screenshot: diagram of the proposed offline flow]
zsolt-dev commented 5 years ago

Thank you for working on this.

I think it would be good to allow everyone to use whatever persistent storage they want. For example, https://github.com/apollographql/apollo-cache-persist lets you select from a number of storage providers, or any custom storage; for instance, I use this to connect to indexedDB:

```js
import { get, set, keys, del, clear } from './idb-keyval';

export default {
  clear() {
    return clear();
  },
  getItem(key) {
    return get(key);
  },
  setItem(key, value) {
    return set(key, value);
  },
  keys() {
    return keys();
  },
  remove(key) {
    return del(key);
  },
  removeItem(key) {
    return del(key);
  },
};
```
wtrocki commented 4 years ago

Hi

I'm the maintainer of apollo-cache-persist and various offline libraries for GraphQL. I have been playing with urql-exchange-graphcache for a while and I absolutely love it. I think that on a very simple level, persistence can be done today by utilizing a mechanism similar to apollo-cache-persist/redux-persist. However, this will mean storing the entire cache as a single key: the typical cache-snapshot approach, which is very inefficient and jitters rendering. The alternative would be to deliver a persistence mechanism that spreads the cache across keys and types, something that redux-persist was doing. Would you allow the community to deliver something simple for the moment and then drive better support later?

I think from the community side, apollo-cache-persist is the main reason why so many people use Apollo Client at the moment in their React Native apps, which tend to kill views when transitioning.

wtrocki commented 4 years ago

I will also add some extra info for context, after 2 years of working with GraphQL caches:

I think for a quick win we could adjust the apollo-cache-persist implementation to work with urql, or create a separate package that hooks into cache write operations and knows how to restore them. I haven't really tried it yet, so I can't say how hard it will be. What's needed is just a wrapper around the cache write method, like here:

https://github.com/apollographql/apollo-cache-persist/blob/master/src/onCacheWrite.ts
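A rough TypeScript sketch of that kind of wrapper; the minimal WritableCache shape is an assumption, not graphcache's actual API:

```ts
// Sketch: monkey-patch a cache's write method so every write also
// schedules a persist, in the spirit of the linked onCacheWrite.ts.
interface WritableCache {
  write: (...args: unknown[]) => unknown;
}

function hookCacheWrite(cache: WritableCache, persist: () => void): () => void {
  const originalWrite = cache.write;
  cache.write = (...args) => {
    const result = originalWrite.apply(cache, args);
    persist(); // ideally debounced so bursts of writes persist once
    return result;
  };
  // Returns an "unhook" function that restores the original write.
  return () => {
    cache.write = originalWrite;
  };
}
```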

JoviDeCroock commented 4 years ago

Hey @wtrocki

I'm super happy that people are interested in this issue; we encourage community exchanges and are happy to help out where possible.

I can look into that wrapper this weekend.

wtrocki commented 4 years ago

> Our exchanges allow.

The way it will work is that there will be a separate persistor available globally that will need to be awaited and will then set up the initial cache. If graphcache has the ability to seed initial data, then this is very trivial to implement.
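A tiny sketch of what that could look like; `persistor` and `createCache` are hypothetical names:

```ts
// Sketch: await a globally available persistor before creating the
// cache, so restored data can seed it. All names are hypothetical.
declare const persistor: { restore(): Promise<unknown> };
declare function createCache(initialData?: unknown): unknown;

async function setupCache() {
  const initialData = await persistor.restore();
  return createCache(initialData);
}
```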

> Optimistic responses will be a hard thing to tackle, I think.

Absolutely not. They should be ignored completely. Most frameworks, like offix or luna.js, recreate them anyway. The trick is to have a cache that does not apply optimistic responses to the data.

See: https://github.com/wtrocki/apollo-client/blob/07a4f2c4b7cfe4c31ed41a393e5e0da317780661/packages/apollo-cache-inmemory/src/inMemoryCache.ts

It has two separate fields, one for the data and one for the optimistic data.

Restarting is kinda tricky, as there is no way to restore the promise chain that usually removes optimistic responses, so it's best not to store them. Do you have such a counterpart in graphcache?

> This should be possible but not entirely sure on the implementation

This should be possible by hooking into the client.destroy() method. Since the cache is a separate exchange from the client, it is best to hook into the client lifecycle (but I'm not sure about that).

> That's going to be a harder one since we mostly save what is received from the backend and don't add extras to it.

IMHO this will be trivial.

Let's collaborate on this. If there is a sample app for urql that has the cache, and if we can simply add a console.log every time a backend payload gets saved, then integration should be trivial and we can donate tons of code from cache-persist that will work here.

My main question is whether a save should persist the entire cache every time, or be connected to individual server responses (which comes with a tricky normalization challenge).

wtrocki commented 4 years ago

EDIT: I meant https://github.com/wtrocki/apollo-client/blob/07a4f2c4b7cfe4c31ed41a393e5e0da317780661/packages/apollo-cache-inmemory/src/inMemoryCache.ts#L83-L84

JoviDeCroock commented 4 years ago

> The trick is to have a cache that does not apply optimistic responses to the data.

Optimistic responses are layered on top of data, so that in essence is no issue. My reasoning behind keeping optimism around is that we want to restore the data AND be able to dispatch the request when the user gets online. Maybe I was putting too many eggs in one basket, though.

> Let's collaborate on this.

Definitely, https://github.com/JoviDeCroock/threed-web is an app we can use to test it; similarly, we have an API for that: https://github.com/kitten/threed-example-api

We can use that to test on; it has all of graphcache's features implemented (optimism, ...).

wtrocki commented 4 years ago

> and be able to dispatch the request when the user gets online

See https://offix.dev. This is the exact use case of that library. However, it is way too much responsibility for a cache persistence layer.

> Definitely, https://github.com/JoviDeCroock/threed-web is an app we can use to test it; similarly, we have an API for that: https://github.com/kitten/threed-example-api

Perfect. Going to check this and provide an update here.

wtrocki commented 4 years ago

So, coming back with the plan: I think that having some extra interface passed to the store could hook a persist call into methods like the store's writes.

I have simply hardcoded the store for testing purposes at the moment.

But then I'm struggling to see which fields should be saved to the cache:

[Screenshot: the store's internal fields as candidates for persistence]

There are a couple of things here that the cache utilizes, but I'm not sure about some of them. It looks like records are not enough; links should be saved as well, so we can hook into each save and store them as individual keys. This can start with JSON.stringify-ing the data into a single key and can be extended later. IndexedDB and other storages can store native JS objects, so there will be little performance overhead here. This is a very naive approach, but it really ticks the box for basic persistence. This is what I deduced:

[Screenshot: the deduced fields to persist (records and links)]

Now we can simply restore this stuff on restart, but I don't see a simple option to do so. Kudos for the amazing sample apps, which helped a lot in writing a prototype.
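The naive single-key variant described above could look roughly like this, again with idb-keyval and a made-up key name:

```ts
// Sketch: persist records and links together as one JSON snapshot under
// a single key; a refinement would spread them over individual keys.
import { get, set } from 'idb-keyval';

interface Snapshot {
  records: Record<string, unknown>;
  links: Record<string, unknown>;
}

async function saveSnapshot(snapshot: Snapshot): Promise<void> {
  await set('graphcache-snapshot', JSON.stringify(snapshot));
}

async function loadSnapshot(): Promise<Snapshot | null> {
  const raw = await get<string>('graphcache-snapshot');
  return raw ? (JSON.parse(raw) as Snapshot) : null;
}
```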

kitten commented 4 years ago

@wtrocki That looks awesome! I’m thinking of how we could approach this at “scale” 😂

So pessimism ensures stable perf and immutability (which we don’t need right now) but is otherwise really simple. I’ve been thinking that it’d be nice if we could have a store wrapper (or modify pessimism) that provides a synchronous KV layer (like what pessimism does right now) but flushes writes to any async storage. On start we’d then only have to restore from that async storage and queue up operations while we wait for it 🤔
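A sketch of such a wrapper: reads and writes stay synchronous on an in-memory map, while dirty keys are flushed to async storage on a debounce (the class name, flush signature, and interval are all assumptions):

```ts
// Sketch: synchronous KV reads/writes with batched async flushes.
class FlushingStore {
  private data = new Map<string, unknown>();
  private dirty = new Set<string>();
  private timer: ReturnType<typeof setTimeout> | undefined;

  constructor(private flush: (entries: [string, unknown][]) => Promise<void>) {}

  get(key: string): unknown {
    return this.data.get(key);
  }

  set(key: string, value: unknown): void {
    this.data.set(key, value);
    this.dirty.add(key);
    this.scheduleFlush();
  }

  private scheduleFlush(): void {
    if (this.timer !== undefined) return;
    this.timer = setTimeout(() => {
      const entries = [...this.dirty].map(
        key => [key, this.data.get(key)] as [string, unknown]
      );
      this.dirty.clear();
      this.timer = undefined;
      void this.flush(entries); // hand the batch to IndexedDB et al.
    }, 100);
  }
}
```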

I think, like you said, we wouldn't even have to preserve optimistic writes, since on a restart we'd just reexecute offline operations, which restore the optimistic writes anyway.

Regarding what needs to be saved: it's only records, connections, and links that are relevant for persisting data.

Edit: so my thinking is, we could allow for a persistence layer that lets you pass in any store that adheres to an interface with sync/async getLinks (and friends) so we can restore, and queueOfflineOperations/flushOfflineOperations so that we can run them when we go back online.

Does that sound about right?
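Rendered as a TypeScript interface, that might look like the following; only getLinks, queueOfflineOperations, and flushOfflineOperations are named in this thread, the rest is guesswork:

```ts
// Guesswork rendering of the proposed persistence interface; getRecords
// and all value shapes are assumptions.
interface PersistedStore {
  getLinks(): Promise<Record<string, unknown>>;
  getRecords(): Promise<Record<string, unknown>>;
  queueOfflineOperations(ops: unknown[]): Promise<void>;
  flushOfflineOperations(): Promise<unknown[]>;
}
```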

wtrocki commented 4 years ago

I’ve been thinking that it’d be nice if we could have a store wrapper (or modify pessimism) that provides a synchronous KV layer (like what pessimism does right now) but flushes writes to any async storage.

Yes, I had exactly that in mind, and it should be trivial.

> async/sync getLinks and others so we can restore

Would it work like a singleton, where the first call tries to restore from persistence? It would be cool to be able to seed those 3 fields somehow at the time of cache creation.

> queueOfflineOperations and flushOfflineOperations so that we can run them when we go back online

Really nice idea. I need to think about how this would work; optimistic responses and update methods will be global, right?

kitten commented 4 years ago

I think we’re in a much better position now to tackle this 🥳

The pessimism KV layer is gone and has been replaced with a much simpler backing store. It’s still storing and treating optimistic entries separately, which is perfect since we don’t want to persist them.

The next step would hence be to allow a persisted store to be slotted in that we can flush writes to regularly. Then we'd want to introduce operation buffering to delay operations on startup while the store is being seeded. And lastly we'll want to persist optimistic operations (and flush them after seeding and when the user goes back online).

Beyond that, we may want to enable full cache invalidation, which may need to be automatic. We could look at the schema information that is persisted and invalidate parts of the offline store if it doesn't match the schema anymore (and allow full clearing, on logout for instance).

One unanswered question is how we can achieve this without increasing the footprint of graphcache massively.

JoviDeCroock commented 4 years ago

I’d say that we’d only need a certain amount of things in graphCache:

This way offix handles all the complex offline/online logic while graphcache remains focused on being a normalised cache. I do agree that we should have some low-priority work that involves taking our schema and removing fields, etc.; this can be considered a nice-to-have at the start though.

I think I still have a working implementation of the buffered operations. This does imply that we expect an async function to be passed in that retrieves the offline store data -> runs our adapter/transformer -> injects it into our store.
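As a sketch, that flow could look like the following, where restoreData, adapt, injectIntoStore, and forward are all hypothetical stand-ins:

```ts
// Sketch of the hydration flow: buffer incoming operations until the
// async source has been read, transformed, and injected into the store.
declare function restoreData(): Promise<unknown>;
declare function adapt(raw: unknown): { records: object; links: object };
declare function injectIntoStore(data: { records: object; links: object }): void;
declare function forward(op: unknown): void;

let hydrated = false;
const bufferedOps: unknown[] = [];

function onOperation(op: unknown): void {
  if (hydrated) forward(op);
  else bufferedOps.push(op); // hold back until hydration completes
}

async function hydrate(): Promise<void> {
  const raw = await restoreData();   // retrieve the offline store data
  injectIntoStore(adapt(raw));       // run the adapter/transformer, inject
  hydrated = true;
  bufferedOps.splice(0).forEach(forward); // release the buffer in order
}
```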

I think if we limit ourselves to a serializer, a transformer, and a "hydrator", the footprint impact would be small, since the transformer/serializer part can be tree-shaken out and the added logic will be minimal.

The first thing, imo, would be to see how existing offline-storage solutions persist their data at this time, and to see how we can deserialize it on our end to inject it.

wtrocki commented 4 years ago

I’d say that we’d only need a certain amount of things in graphCache: a way to inject the offline store

I think that is the key to everything. Once that is possible, I can work on connecting offix, or even just a storage like localForage, etc. I will try to apply the changes as suggested above and see if that works.

> The first thing, imo, would be to see how existing offline-storage solutions persist their data at this time, and to see how we can deserialize it on our end to inject it.

There is actually a nice thread on the Apollo Client repo (as it will get a cache storage feature for 3.0). I don't want to link it here, but TL;DR: generally, because of storage limitations on the web, the cache is persisted once in a while as the entire cache object. The same implementation exists in apollo-cache-persist.

JoviDeCroock commented 4 years ago

@wtrocki I've started making an initial implementation for rehydrating a store: https://github.com/FormidableLabs/urql-exchange-graphcache/pull/124. Now we need to hook in an adapter to write/delete/... on, for instance, offix.

wtrocki commented 4 years ago

Awesome! Thank you so much, and sorry for not making it on time. Yes, I will try this out with the customized apollo-cache-persist, and if that works I will post it as a package. Having a dev version of the PR published would be amazing. A follow-up will be to tackle more complex use cases like cache invalidation etc. (offix).

kitten commented 4 years ago

Persistence has been implemented now by #137 and #138. There's an example that demos it in #141.

The next step now is working on an offline exchange (or one built into the main cacheExchange) that integrates with this and supports queueing up offline mutations, keeps the optimistic update intact (if any), and is able to reexecute offline mutations on startup or when the user comes back online.

@wtrocki We publish every PR via Pika CI. So you can already give this a go by installing "urql": "https://github.pika.dev/formidablelabs/urql-exchange-graphcache/pr/138"

JoviDeCroock commented 4 years ago

I think to do this efficiently in a separate exchange we'll need to add OperationRequest.hasOptimisticResult; otherwise we'll never know whether to let the mutation gracefully fail or to buffer it.
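A sketch of how such a flag might be consulted in an offline exchange (the shapes here are assumptions, not urql's actual types):

```ts
// Sketch: decide per failed mutation whether to buffer it for a retry
// (optimistic UI already applied) or to let it gracefully fail.
interface OperationRequest {
  hasOptimisticResult?: boolean;
}

function onNetworkError(
  op: OperationRequest,
  buffer: OperationRequest[],
  fail: (op: OperationRequest) => void
): void {
  if (op.hasOptimisticResult) {
    buffer.push(op); // the UI already shows the optimistic result
  } else {
    fail(op); // no optimistic result, so surface the error instead
  }
}
```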

morrys commented 4 years ago

Hi all, I wanted to share the libraries I created to manage persistence and offline workflows for GraphQL libraries:

wora/cache-persist: uses a JavaScript object synchronously and processes communication with the storage asynchronously (highly configurable in all its aspects; storages: localStorage, sessionStorage, indexedDB, React Native AsyncStorage & any custom storage)

wora/netinfo: a simple library that implements the react-native netinfo interface so it can also be used on the web

wora/offline-first: a persistent cache store for offline-first applications, with first-class support for optimistic UI. Use with React, React Native, or any web app.

I used these to create:

The main advantages of integrating these libraries are:

In the offline-examples repository you can find examples of offline usage for Apollo (web and React Native) and Relay (web and React Native).

For any additional information, or if you're interested in making a beta in which they are integrated, please contact me; I will be happy to answer and help.