RFC(graphcache): Provide a view onto the Graphcache cache

Summary

Currently, the state of the Graphcache cache is pretty opaque. It's possible to introspect it via the store prop in some places, for example by using the final info parameter provided to updaters. However, this is scoped to a particular query or mutation, and the state of the store isn't exposed at a more general level: for example, it isn't available on operation objects, and Graphcache doesn't provide an onChange hook or anything similar.

Making it easy to view the cache would make debugging easier and might open the door to improved tooling around Graphcache. To be clear, I'm not suggesting providing direct access to the store; some kind of utility function that dumps the current state of the store would be fine.

I also think it would be really helpful for understanding some of the 'magic' going on in the cache, and for learning how to use it optimally.

Proposed Solution

I'd like to see the state of the store made available on operations, or at least through a callback supplied in the Graphcache config.

You can already dump the state of the store. It's not documented and not guaranteed to be stable, but the cache argument that is passed to resolvers, updaters, etc. is the Store class. This is basically Graphcache's state interface, which holds the in-memory cache on its data property.

In older issues, you can see some debugging and questions around this, where the state was dumped for reproduction purposes by logging console.log(cache.data). A full "snapshot" of the state at that moment can be dumped using console.log(structuredClone(cache.data)).
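As a minimal sketch of that dump, assuming only what's stated above (the cache argument is the Store, and its undocumented data property holds the in-memory state), a helper can clone that property at the moment it's called. CacheLike, snapshotCache and the records Map below are hypothetical stand-ins so the example runs without Graphcache; the real InMemoryData has a different, unstable shape.

```typescript
// Hypothetical stand-in for Graphcache's Store: we only assume an
// undocumented `data` property holding the in-memory state.
interface CacheLike {
  data?: unknown;
}

// Returns a deep copy of `cache.data`, so later writes to the cache
// don't change what was captured. structuredClone keeps Maps and Sets
// intact, but throws a DataCloneError if it meets a function.
function snapshotCache(cache: CacheLike): unknown {
  if (cache.data === undefined) {
    throw new Error("cache.data is not available on this object");
  }
  return structuredClone(cache.data);
}

// Usage with a fake cache; the `records` Map is purely illustrative.
const fakeCache = {
  data: { records: new Map<string, string>([["Todo:1.text", "Buy milk"]]) },
};

const before = snapshotCache(fakeCache) as { records: Map<string, string> };
fakeCache.data.records.set("Todo:1.text", "Buy oat milk");

console.log(before.records.get("Todo:1.text")); // "Buy milk"
```

Inside a real updater or resolver you would pass the cache argument instead of fakeCache; since data is undocumented, it may need a cast to get past the published types.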

I'm not saying it's a complete solution, but your RFC is of course a little light on detail, so this does technically fulfil all the criteria 😅

To be specific, even in the past, it wasn't immediately obvious to people how to use this dump without a working knowledge of the InMemoryData implementation. Since the data doesn't always remain a single flat lookup table, but can consist of multiple layers, debugging it is impossible without knowing when and why those layers exist.
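To illustrate why a raw dump with layers is hard to read, here is a deliberately simplified model, not Graphcache's actual InMemoryData: a base table plus a stack of optimistic layers, where a read checks the newest layer first. The class Layered and its methods are hypothetical. The flatten method collapses everything into one table, which is only a faithful view because it uses exactly the same precedence as read.

```typescript
type Table = Map<string, unknown>;

// Simplified model: a base table plus optimistic layers, each tagged
// with an operation key and ordered oldest to newest.
class Layered {
  private base: Table = new Map();
  private layers: { key: number; table: Table }[] = [];

  // Writes to the base table, or to an optimistic layer if a key is given.
  write(entityField: string, value: unknown, layerKey?: number): void {
    if (layerKey === undefined) {
      this.base.set(entityField, value);
      return;
    }
    let layer = this.layers.find((l) => l.key === layerKey);
    if (!layer) {
      layer = { key: layerKey, table: new Map() };
      this.layers.push(layer);
    }
    layer.table.set(entityField, value);
  }

  // Reads check the newest layer first, then fall back to the base.
  read(entityField: string): unknown {
    for (let i = this.layers.length - 1; i >= 0; i--) {
      const table = this.layers[i].table;
      if (table.has(entityField)) return table.get(entityField);
    }
    return this.base.get(entityField);
  }

  // Drops an optimistic layer, e.g. once the real result has arrived.
  clearLayer(layerKey: number): void {
    this.layers = this.layers.filter((l) => l.key !== layerKey);
  }

  // Collapses all layers into one table: base first, newer layers win.
  flatten(): Table {
    const out: Table = new Map();
    this.base.forEach((v, k) => out.set(k, v));
    for (const layer of this.layers) {
      layer.table.forEach((v, k) => out.set(k, v));
    }
    return out;
  }
}

const layered = new Layered();
layered.write("Todo:1.text", "Buy milk");
layered.write("Todo:1.text", "Buy oat milk", 1); // optimistic layer 1
console.log(layered.read("Todo:1.text")); // "Buy oat milk"
layered.clearLayer(1);
console.log(layered.read("Todo:1.text")); // "Buy milk"
```

Dumping base and layers separately shows two different values for the same field; which one a query sees depends on the lookup rules, not on the dump itself.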

The most common problem that I've seen people struggle with, however, is simply cache misses.

These can occur in three scenarios:

- A resolver returns an undefined value (or an invalid value in general)
  - This is easy to debug, because the values are computed by the resolver in place. It's uncommon for people to struggle with this for too long, because reading from the cache inherently gives you your result, and it's possible to track down unexpected values.
- A cache updater writes an incorrect value
  - Again, this is easier to debug, as per the above. However, people sometimes then run into queries not behaving as they'd expect, e.g. due to missing data or links.
- An optimistic update, or a combination of one with updaters, doesn't lead to a valid cache hit
  - This is hard to debug, because an optimistic update necessitates that no network request is made. So the failure scenario leads to no change in behaviour at all.

This is a known limitation, and I'd rather have warnings and console messages that inform people where cache misses happen when a subsequent network request is "blocked".

But basically, the reason there's no method to observe the state is that the state alone doesn't guarantee it'll send people back onto a happy path.

I've thought about the debugging console messages above recently, and it isn't quite that trivial to track where cache misses happen, but I still think it's more worthwhile than adding state dumps or other utilities that wouldn't actually help you unless you already have a deep understanding of how Graphcache works 🤔

So, the alternative proposal I had in mind was:

- Each cache query already keeps track of its dependencies
- Add a mapping, in development, of where cache misses have taken place
- Provide a function that, in verbose mode, can output why a cache miss has occurred (i.e. at which field and type)
  - If possible, backtrace to the operation Graphcache was performing (i.e. whether it was using a resolver)
  - If possible, associate the above with recent updates and track those (this is the hard part, but crucial to providing a good debugging experience)
- Output the above by default in development, even when verbose mode isn't activated, whenever a subsequent network request for a query has been blocked
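The alternative proposal could be sketched, under heavy assumptions, as a development-only recorder: each read that finds no value records the type, field and operation that missed, and when a follow-up network request is blocked it prints what was recorded for that operation. MissRecorder, recordMiss, explain, onRequestBlocked and the viaResolver flag are hypothetical names; nothing here exists in Graphcache today.

```typescript
interface CacheMiss {
  typename: string;
  field: string;
  viaResolver: boolean; // whether a resolver was involved in the missing value
}

// Hypothetical development-only recorder, keyed by operation key.
class MissRecorder {
  private misses = new Map<number, CacheMiss[]>();

  // Called wherever a cache read comes back empty.
  recordMiss(operationKey: number, miss: CacheMiss): void {
    const list = this.misses.get(operationKey) ?? [];
    list.push(miss);
    this.misses.set(operationKey, list);
  }

  // Returns one human-readable line per recorded miss (verbose mode).
  explain(operationKey: number): string[] {
    return (this.misses.get(operationKey) ?? []).map(
      (m) =>
        `Cache miss at ${m.typename}.${m.field}` +
        (m.viaResolver ? " (value came from a resolver)" : ""),
    );
  }

  // Called when a follow-up network request for this query is blocked,
  // e.g. after an optimistic update; warns even outside verbose mode.
  onRequestBlocked(operationKey: number): void {
    for (const line of this.explain(operationKey)) console.warn(line);
    this.misses.delete(operationKey);
  }
}

const recorder = new MissRecorder();
recorder.recordMiss(1, { typename: "Enrolment", field: "user", viaResolver: false });
recorder.recordMiss(1, { typename: "User", field: "username", viaResolver: true });
console.log(recorder.explain(1));
```

This only covers the first two steps; associating misses with recent updates, which the proposal calls the hard part, is left out.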

Thanks for your thoughtful reply. Maybe I'm trying to scratch the Redux itch, but I'm definitely finding it tough not being able to visualise the state of the cache at any given time.

> To be specific, even in the past, it wasn't immediately obvious to people how to use this dump without a working knowledge of the InMemoryData implementation. Since the data doesn't always remain a single flat lookup table, but can consist of multiple layers, debugging it is impossible without knowing when and why those layers exist.

It's obviously hard for me to discuss this without knowing the inner workings, but isn't it fair to say that even with multiple layers, those layers can be reduced to a single state that represents what the client sees/receives? If so, that's the kind of snapshot I'd find helpful: a 'this is what the data looks like to the client right now' snapshot. I definitely appreciate I'm probably oversimplifying here …

> It's not documented and not guaranteed to be stable, but the cache argument that is passed to resolvers, updaters, etc. is the Store class.

If something like my suggestion above were possible, it would be great to provide it in a single location/callback for the store as a whole. Otherwise, you'd need to add e.g. an updater for every query and mutation just to access and log/dump the cache whenever it changes. This is why I suggested a callback that is passed into the Graphcache config:

```js
{
  // onChange is the option proposed in this RFC
  onChange: (store) => {
    // Manually introspect/dump/log store
  },
  updates: {
    /* … */
  },
  optimistic: {
    /* … */
  },
}
```
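To make the proposed onChange option concrete, here is a minimal sketch of a store that calls such a hook after every write. ObservableStore, writeField and the snapshot shape are hypothetical and don't reflect Graphcache's internals; the point is only that one config-level callback sees every change, instead of one updater per query or mutation.

```typescript
type Snapshot = Record<string, unknown>;

interface StoreConfig {
  // Proposed in this RFC; not an existing Graphcache option.
  onChange?: (snapshot: Snapshot) => void;
}

class ObservableStore {
  private records = new Map<string, unknown>();
  private config: StoreConfig;

  constructor(config: StoreConfig = {}) {
    this.config = config;
  }

  writeField(entityKey: string, field: string, value: unknown): void {
    this.records.set(`${entityKey}.${field}`, value);
    // Notify once per write with a fresh, shallow plain-object copy, so
    // the listener can't add or remove the store's own records.
    this.config.onChange?.(this.snapshot());
  }

  snapshot(): Snapshot {
    const out: Snapshot = {};
    this.records.forEach((value, key) => {
      out[key] = value;
    });
    return out;
  }
}

const seen: Snapshot[] = [];
const observable = new ObservableStore({ onChange: (s) => seen.push(s) });
observable.writeField("User:213", "username", "Example");
observable.writeField("User:213", "id", "213");
console.log(seen.length); // 2
```

Because each call receives a new copy, the listener can keep a history of snapshots and diff them, which is roughly the Redux-style view described here.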

> The most common problem that I've seen people struggle with, however, is simply cache misses.

I'd definitely welcome better logging for this, as it's very opaque at the moment. However, I think my proposal is more general than that.

> Thanks for your thoughtful reply. Maybe I'm trying to scratch the Redux itch, but I'm definitely finding it tough not being able to visualise the state of the cache at any given time.

That's why the recommendation is to see and keep Graphcache as a close representation of the server state (i.e. the server data); given that, there isn't much to keep track of.

That's also why the devtools (which admittedly may need an update at some point) don't show the cache state. That's of course partly because urql is built to be cache-agnostic, but also because a single merged representation of all GraphQL responses is a pretty good representation of both a normalised and a non-normalised cache's state.

In other words, if it's assumed the cache shouldn't diverge from the server state, there isn't much information you'll be able to gather from the in-memory cache state that shouldn't already be surfaced otherwise.

> It's obviously hard for me to discuss this without knowing the inner workings, but isn't it fair to say that even with multiple layers, those layers can be reduced to a single state that represents what the client sees/receives?

Even if we reduced all the state back into one layer, that isn't a guarantee that the state matches what you're seeing. If we create a new state representation just for debugging, there isn't any guarantee that it represents what the cache actually sees and does.

So overall, the warnings and caching logic in general are designed to provide you with APIs that help you match and build up a representation of "server data", and not diverge from it.

So I'd already see any attempt at debugging the in-memory cache as, more or less, a failure of that, hence the cases I listed above.

Again, if you do want to see a representation of the server state, add a structuredClone(store.data) call to a resolver and you'll see any prior changes applied in there 😅

If we implement an API for this, or more APIs for debugging, in my opinion all they'd achieve is:

- more surface area for hacks and monkey patches that people then ask questions about, which are hard to debug themselves
- more surface area for debugging that is less likely to lead to success than other methods of debugging (current, or as proposed above)

Like, I get the motivation and the thought process behind your proposal, but thinking further here, I can't think of a case where it'd lead to a better outcome.

As the simplest example, say someone wants to see why a selection set of an entity returns a specific shape. They're not going to learn much more from the in-memory cache data than what the result already looks like.

Thanks for expanding. I think I understand your position better now.

> a single merged representation of all GraphQL responses is a pretty good representation of both a normalised and a non-normalised cache's state.

One of the reasons I ended up writing this RFC is that this doesn't seem to be what I see in the devtools. For example, if I query (pseudo-code):

```graphql
query {
  enrolment {
    messages {
      nodes {
        id
        body
      }
    }
  }
}
```

Then query:

```graphql
query {
  enrolment {
    user {
      id
      username
    }
  }
}
```

I only see the last set of data from the enrolment query in the devtools:

```json
{
  "enrolment": {
    "user": {
      "id": "213",
      "username": "Example"
    }
  }
}
```

I'd expected to see the amalgamated result of the queries, but each subsequent query replaces the last, despite having different selection sets.
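For comparison, here is a simplified sketch of why a normalised cache does hold the amalgamated data even when each operation's result only shows its own selection set: objects with a __typename and id are stored under one entity key, and fields from every response are merged into that record. The normalize function, the { __ref } link format and the sample responses (including the Enrolment id) are hypothetical; Graphcache's real storage format differs.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type EntityStore = Map<string, Record<string, Json>>;

// Returns "Typename:id" for objects that look like entities, else null.
function entityKey(obj: { [key: string]: Json }): string | null {
  const { __typename, id } = obj;
  const hasId = typeof id === "string" || typeof id === "number";
  return typeof __typename === "string" && hasId ? `${__typename}:${id}` : null;
}

// Walks a response and merges entity fields into `store`; nested entities
// are replaced by { __ref: key } links, other objects stay embedded.
function normalize(value: Json, store: EntityStore): Json {
  if (Array.isArray(value)) return value.map((v) => normalize(v, store));
  if (value === null || typeof value !== "object") return value;

  const fields: Record<string, Json> = {};
  for (const [name, child] of Object.entries(value)) {
    fields[name] = normalize(child, store);
  }
  const key = entityKey(value);
  if (key === null) return fields;
  store.set(key, { ...(store.get(key) ?? {}), ...fields });
  return { __ref: key };
}

const entities: EntityStore = new Map();

// Result of the first query (messages selection only).
const first: Json = {
  enrolment: {
    __typename: "Enrolment",
    id: "1",
    messages: {
      __typename: "MessageConnection",
      nodes: [{ __typename: "Message", id: "m1", body: "Hi" }],
    },
  },
};

// Result of the second query (user selection only).
const second: Json = {
  enrolment: {
    __typename: "Enrolment",
    id: "1",
    user: { __typename: "User", id: "213", username: "Example" },
  },
};

normalize(first, entities);
normalize(second, entities);
console.log(entities.get("Enrolment:1")); // has both `messages` and `user`
```

The merged record only exists in the cache; each operation's result still contains just the fields its own selection set asked for, which is consistent with the devtools showing per-operation data.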

Anyway, that is a different issue. I'm going to go ahead and close this now. Thanks for taking the time to explain.

urql-graphql / urql

RFC(graphcache): Provide a view onto the Graphcache cache #3290