mattkrick / cashay

:moneybag: Relay for the rest of us :moneybag:
MIT License
453 stars 28 forks source link

RFC: return cached data for a known type/id #155

Open dustinfarris opened 7 years ago

dustinfarris commented 7 years ago

UPDATED 1/3/17

The prefetch hook described below is not in-line with the desired style of Cashay. See discussion and tl;dr in comments.

DISCLAIMER

I am still learning about cashay implementation details and it's possible one or more of my assumptions are wrong. Please push back.

Summary

Add a hook to allow users to access the cache while a query is in-flight.

Reason

When the user requests data, it is possible that some or all of the data already exists in cashay's cache. It would be beneficial from a performance standpoint to utilize that cache.

Note: the goal is not to reduce network traffic, but to display the data we already have, if we have it.

Cashay does this already under certain circumstances:

There is an additional scenario, however, not covered by these—when the user is requesting a GraphQL field subset for which partial data already exists in the cache.

Consider this query from a project detail page:

query {
  project(id: $project_id) {
    id
    name
    todos {
      id
      name
    }
  }
}

The project detail gets all todos under it, selecting their individual ids and names. Now suppose the user clicks a specific todo, and is taken to a todo detail page which might have the following query:

query {
  todo(id: $todo_id) {
    id
    name
    description
    assignee {
      id
      name
    }
  }
}

Cashay sees this as a totally new query, yet some of the data is already in the cache from the project page! Namely, the todo's id and name.

The @cached decorator cannot be used here for two reasons:

1) @cached assumes the entire shape can be found in cache and kicks back null for the pieces it doesn't find 2) If the user enters the todo detail route directly, the cache does not exist yet at all. @cached provides no fallback behavior if an id is not found.

Implementation

Add a new hook for cashay.query that gives the user access to the cache. In my mind, this will behave similar to the way mutationHandlers and "optimisticVariables" work currently, with the following flow:

1) user executes a query 2) cashay prepares an initial result as it currently does 3) cashay executes a "preFetch" hook, providing the initial result and access to the cache 4) user modifies the initial result using the cache 5) cashay continues to execute a server query in the usual way 6) (bonus) cashay executes a "postFetch" hook and returns the final result

Pseudo (using the todo query above as todoQuery):

const todoQuery = `
query {
  todo(id: $todo_id) {
    id
    name
    description
    assignee {
      id
      name
    }
  }
}
`;

const stateToComputed = (state, { todo_id }) => {

  const { data: { todo }, status } = cashay.query(todoQuery, {

    variables: { todo_id },

    preFetch(initialResult, cache) {
      const cached_todo = cache.find('Todo', todo_id);
      if (cached_todo) {
        initialResult.todo = cached_todo;
        return initialResult;
      }
    },

    postFetch(finalResult, serverResponse) {
      // possible manipulation here, not strictly necessary
    }

  });

  return { todo, isLoading: status === 'loading' };
}

const TodoDetail = Component.extend({
  layout: hbs`

  <h1>{{todo.name}}</h1>
  <hr>
  {{#if isLoading}}
    <em>Loading...</em>
  {{else}}
    <div>Description: {{todo.description}}</div>
    <div>Assignee: {{todo.assignee.name}}</div>
  {{/if}}

  `
});

With such an API, the component will be able to display the name of the todo, while the rest of the content is loading.

Drawbacks

This opens the door for possible cache misuse and anti-pattern.

Alternatives

Modify @cached instead

Potentially, @cached could be modified to implement the same flow above, where it returns cached data, but fires a server request anyway if the cached data is not found. This would need to account for missing subfields—e.g. if the entire requested shape does not exist in cache, the server must be consulted.


additional context in #154

dustinfarris commented 7 years ago

Updated title and description. /cc @mattkrick @jordanh

jordanh commented 7 years ago

Intriguing proposal @dustinfarris. A suggested name for such a feature "progressive querying." I'll let met respond with his thoughts before adding mine...

mattkrick commented 7 years ago

I think the goal of cashay is to have a declarative API (except for mutations, but that's for v2). In other words, you should tell it what you want to do, not how to do it. Declaratively, you want data. You shouldn't have to sorry how that's achieved. Unfortunately, to do that means we'd have to make certain assumptions about the user defined queries on the server (like how they're named), or we'd have to be more verbose.

dustinfarris commented 7 years ago

@mattkrick ok, if adding lowish-level hooks are a no-go (à la v2), then I think my alternative proposal should be considered, perhaps in a separate issue.

In a nutshell: modify existing @cached to be smarter about existing data and allow the user to specify fallback behavior in case some or all of the data is missing.

Something like:

query {
  todo @cached(id: $todo_id, type: "Todo", refresh: true) {
    id
    name
    assignee {
      id
      name
    }
  }
}

:point_up: adding "refresh: true" instructs Cashay to "give me whatever you have for the resolved id, but kick off a server request and update the information". To your point, @mattkrick, this assumes that the name of the field, todo in this case, matches the name of the server-side field to be queried.

From my understanding, current @cached allows you to name the base field whatever you want, e.g.

getTodoFromCache @cached (id: $todo_id, type: "Todo") {

so it would be necessary to either make this illegal (doubt anyone would go for that), or require the user to specify the server query field explicitly if it is not the same, e.g.

getTodoFromCacheAndServer @cached (id: $todo_id, type: "Todo", refresh: true, field: "todo") {

in the absence of field:, it is assumed that the field name is the same

todo @cached (id: $todo_id, type: "Todo", refresh: true) {
mattkrick commented 7 years ago

if we know the name of the query, then do we even need to use a directive? Why not just make it a standard query & then diff out the fields that we already have? If we have all the fields, then we don't send a request to the server.

dustinfarris commented 7 years ago

Yep, that works too.

The goal, though, is to use data that potentially came from other queries—you mentioned in another issue something about not making assumptions about type/id uniqueness. I'm still fuzzy about what you meant by that.

Yes, ideally I would love for Cashay to "just do it" when I write,

query {
  todo(id: $todo_id) {
    id
    name
    description
    assignee {
      id
      name
    }
  }
}

:point_up:

mattkrick commented 7 years ago

here are the embedded assumptions in that:

These assumptions are all probably safe, heck relay makes these assumptions & that's why you write relay queries in such a funny way... the relay server basically has an id that is a union of all possible types. Their GUID is a base64 composite key of the id & the type, so you don't even need to repeatedly write a bunch of getXById queries, the helper on the server writes those for you. Unfortunately, this assumes you have access to change the server. That server could be set up by another person/department/company, which means you don't have the right to change it.

dustinfarris commented 7 years ago

The difference here is that Cashay knows in advance what Type will be returned. Relay and others do not. In my view, this is a clear differentiator that makes those assumptions plausible, without any goofy GUIDs. Server access not required, only the schema.

mattkrick commented 7 years ago

definitely, i hear ya, it's a solid proposal. what would you propose to do about arrays of todos? For example let's say on 1 page you have a getTop50Todos. Then, on another page you wanna query 5 of those. would you have 1 query that takes in an array of ids or would you call the same todo 5 times?

dustinfarris commented 7 years ago

Interesting question. My gut reaction is the (optional) list of ids.

type Root {
  todos(ids: [ID]): [Todo]
}
query FiveTodos($todo_ids: [ID]) {
  todos(ids: $todo_ids) {
    id
    name
    description
  }
}

Aside: There are other caveats such as interfaces and unions where it may not be easy to determine to exact Type. I'd say the hard-and-fast rule should be: "we will try, but if we can't resolve the Type and id, we're going to the server". We can potentially add more tools here in the future, but I think sticking to known type+id is a good starting point.

dustinfarris commented 7 years ago

"other tools" might include a way to specify which Types have UUIDs when cashay is initialized.

e.g.

cashay.create({
  types_with_uuid: ["Todo", "Project"]

and then use that to decide whether uniqueness can be inferred from a particular Union query. If so, great. If not, existing operation/key in cache? Ok, still great. Otherwise, server.

But I digress...

mattkrick commented 7 years ago

@cached already handles unions, although i'm not sure how well...works for me at least :smile:

dustinfarris commented 7 years ago

tl;dr the abbreviated proposal is:


If the Object Type and ID of a query can be determined in advance, Cashay will attempt to find existing data in cache, and backfill missing subfields with a server request.