Reduce fetch for an array of IDs

mattkrick commented 8 years ago

Per the multi-part queries recipe, it's great to have the first operation grab an array of IDs, and the second one fetch those IDs.

What isn't great, is that when that list of IDs changes, that result is no longer found in the cashayDataState.result. This is silly, if I remove an ID from an array, I still have all the info needed. Similarly, if I add an ID, I should only send the diff to the server.

Basically, the same pattern used for pagination should be used for an array of IDs. Just like how pagination uses reserved words to detect the arg type, we can duck type arrays of IDs like this: new GraphQLList(GraphQLID).

The log is as follows:

If all args are equal except for the idArray
- create a new result entry
- remove the intersection between the new and old
- send the request for the new ones to the server

mattkrick commented 8 years ago

This really comes down to whether or not a fetch is for a pure document. If the only variable passed in is an ID (or array of IDs), then we should be able to look directly to the entities instead of bailing when the result is blank.

So when denormalizing the store:

if the only arg is an ID && pure == true
then fill in result by populating the result withEntityName::$ID
Now the recursion can kick in & try to grab what we need.
If the localOnly flag is set, then we can guarantee we won't make a call & will just wait for the source (probably a sub) to fill the data.

In practice, I might sub to an array of teamMembers:

subscription ($teamId: ID!) {
  teamMembers(teamId: $teamId) {
    id
    name
    picture
  }
}

Then , in a grandchild component, I might care about a specific team member:

query ($teamMemberId: ID!) {
  teamMember(teamMemberId: $teamMemberId) {
    id
    name
  }
}

This query should be free, since I know my subscription has (or will) supply this data to the redux store. This solves the colocation problem because now I don't need to parse out a particular teamMember from the subscription array & pass it down. It answers the question, "why should i normalize my subscription data?".

@jordanh does this make sense?

mattkrick commented 8 years ago

I can't quite visualize it, but I think this pattern should be user-expandable. So a user can enter an auto-resolve function if they like. Maybe each query has an options.localResolve function.

Let's take Tasks, which can either be a Project or Action.

I should be able to write something like:

query($type: STRING!) {
  getTasks(type: $type) {
    id
    content
    owner
  }
}

and then somewhere have a local resolve function:

const resolve = (denormalizedTasks, args) => {
  return denormalizedTasks.reduce((res, task) => {
    if (task.type === args.type) {
      res.push(task);
      return res;
    }
  },[])
}

This is basically what minimongo did in meteor, except now you're not constrained to mongo syntax, you have the whole javascript world to decide how to handle it.

We could replace localOnly flag with localResolve function.

mattkrick commented 8 years ago

couple days later, i still really like this idea.

It solves this problem:

query {
  getBestPost {
    id
    content
  }
}

data = { getBestPost: { id: '123', content: 'hi there' } }

query {
  getPost(id: "123") {
    id
    content
  }
}

data: ????

In the above example, we know what post 123 looks like & we have everything we need to fulfill the request! BUT, we don't know what's going on in the server resolve function. maybe we i request id: 123 the server does a String.reverse() and returns the values for 321. That's messed up, but Cashay ain't gonna judge.

so, if you know that your server actually does something completely stupid like reversing the ID in the above example, i'd want to write something like:

resolve = (stringify, args, posts) => stringify(args.id.reverse());

behind the scenes, since we have the schema on the client, we know that getPost is going to return an entity of type Post. So, all stringify does is a simple string concat: Post::123.

Now let's step up the complexity and assume that a Task is a UNION of Project and Action. Furthermore, we want to return a result based on the isActive field, not an id:

query($isActive: Boolean!) {
  getTasks(isActive: $isActive) {
    id
    content
    owner
  }
}

resolve = (stringify, args, tasks) => {
  /*
  tasks = {
    project: {
      123: {},
      456: {}
    },
    action: {
      678: {}
    }
  }
  */
  const filteredResult = [];
  Object.keys(tasks).forEach(concreteType => {
    Object.keys(concreteType).reduce((reduction, task) => {
      if (task.type === args.type) {
        reduction.push(stringify(task.id, concreteType));
      }
      return reduction;
    }, filteredResult)
  };
  return filteredResult;
}

The one thing this doesn't account for is how to compare types when those types have arguments. This is easy enough, you just stringify the args that go into it & get the result, but I'm betting that is gonna be really rare, so it doesn't make sense to turn it into a helper.

mattkrick commented 8 years ago

Another use case: Query from a subscription feed. If you know the ID you want, it's easy to snag it out of the entities state. But, if you don't know what items you want (for example, if they came from a subscription) then you'll need to get that somehow.

screen shot 2016-08-12 at 11 25 24 am

If this is a picture of the normalized items that came in from your subscription, we'll need to be able to get that from the resolve function.

maybe it looks like

resolve(stringify, args, tasks, getResult) {
  agendaArgs = {teamId: args.teamId};
  // denormalizes the result
  const result = getResult('agenda', agendaArgs);
  return result.sort((a,b) => a[args.sortField] > b.[args.sortField])
}

mattkrick commented 8 years ago

Now that we have the live, localSort, localFilter directives, it makes sense to mark this as a local directive.

It'd be lame to have to make a query (eg getProjectById) on the server & only use it locally. Relay solves this problem with their node union that takes a GUID that is a base64 of the type & id. I think that's ugly.

instead, if a Type is accompanied by the @local directive then we we'll just look in entities where entity === Type. The id can come from an argument, if it's supplied to the type argument, or it can come from the parent, if it's supplied to @local.

Ultimately, it'll look something like this:

query {
  project: Project @local(id: $projectId) {
    content
    id
    status
    teamMemberId
    updatedAt
    teamMember: TeamMember @local {
      id
      picture
      preferredName
    }
    team: Team @local {
      name
    }
  }

Options would look like:

const options = {
  local: {
    teamMember: (source, args) => source.teamMemberId,
    team: (source, args) => source.id.substr(0,source.id.indexOf('::'))
  }
};

It'd be fantastic to just do @local(id: 'teamMemberId'), but that isn't flexible enough for team, since that value is computed. We could force people to include it as an arg, but we want this to work for arrays, too. maybe @local(ids: ['123', '456'])? Notice the arg is ids. In true GraphQL fashion, I think it's important for each arg to only accept a single type; in this case, ID! or [ID!]!.

Initially, I put the argument on the field itself, but I don't like that since the field (eg TeamMember) doesn't exist in the schema. Furthermore, their field may use _id, so putting id on a field that mocks the server schema feels wrong.

@jordanh thoughts? This is going to be applied to ProjectCardContainer & will remove a LOT of junk in its mapStateToProps.

mattkrick commented 8 years ago

come to think of it, I like using a single @local directive instead of localSort, localFilter, local. So, the API looks likes this: @local([id] [, ids]). And the options would look like this:

const options = {
  [directive]: {
    [aliasOrFieldName]: {
      resolve: (source, args) => source.teamMemberId,
      sort: (a, b) => a.foo < b.foo,
      filter: (field) => Boolean(field.content)
    }
  }
};

Doing this makes the options 3 levels deep, which sucks, but that's what @live uses: [directive][aliasOrFieldName][function], so at least the API is easier to remember. From a higher level view, it cleans up the query string by reducing the # of directives that a single field can have. Additionally, I don't see much value in have a @localSort @localFilter suffix on a field. All i need to quickly know is if anything magical is happening locally, I can read the options for the details on what.

mattkrick commented 8 years ago

There are 5 things that a person might wanna do:

sort
filter
localLookup
resolve a field that doesn't exist in the schema
transform (eg mapreduce)

The last one is the most interesting. For example, if I get a bunch of team members, and projects from each team member, I might want to group by project status instead of by teamMemberId. While this could be really powerful, it'll muddy up the GraphQL query string by introducing something like a @transform directive, allowing someone to pass in a function on how to transform it. So, I think we can rule that one out of scope.

The second from last is next interesting, where it'd be cool to add a calculated field to the schema & specify a function to generate that calculated field. Relay 2 calls this "client fields", although I don't know if they can be computed there. MobX calls it @computed. This will have to be computed via a function. If it's done inside cashay, then memoization isn't required. The problem is that GraphQL is a tree & peer branches can't talk to each other. This functionality might be a computation derived from looking at all peers. For example,isLeader might look at each teamMember, then rank them, and decide who the leader is. If this can't be eloquently done for all common use cases, we might as well not do it at all.

So out the 2 fun ones, we're left with the previous 3 of sort, filter, localLookup. The idea of sticking the logic for all 3 under a single @local directive is clean, but there's one small difference between sort, filter and lookup. And that is that the sort & filter fields will still get sent to the server, if we want. Consider pagination. We get the first 10 docs, unsorted. Would it be useful to sort those? Probably not, since we'd want the whole data set. So, let's say we had the full data set, all 100 documents, and we requested them sorted 1 way, and now we want them sorted another way. The localSort would be useful since it'll grab that list that was previously cached with the sort args, and then sort them afterwards on the localSort criteria. This gets a little convoluted since a forceFetch should still send off that local request, whereas if it was locally resolved, then it wouldn't. Main point-- Looking just at the query string, @local does not tell me if the field could go to the server or not. So, sort & filter are the same, but locally looking up something in inherently different since it is guaranteed to be excluded from the server request.

So, we could have @live, @local, @entity. @live, @entity would be excluded from server calls. @local would include sort, filter options and the field could go to the server, although the directive would be removed before it's sent off. This allows us to expand, for example if we really wanted to make a client field, we could put a local.resolve on it to resolve very similar to how GraphQL does it on the server.

Then, we've got @entity. Similar to how @live works, this field is auto removed from the server query. This is still awkward though because if it's a top-level item, we'll need to pass in id or ids to lookup. If it's a child, those will be supplied by the parent. We should try to follow the GraphQL pattern, which deals with this exactly. On the top level, GraphQL takes in an arg & uses a resolve function to return the value based on that arg. On the next level down, you could use another arg, or you could use the source & a custom resolve function.

That default resolve function would use the field name from the field, and then the id field would be custom resolved from the value of what was passed in. If it was an id, it'd be an Object. if it was an ids, it'd be a list of objects. The tricky part is where to put this stuff. We have an entity and id that we need to pass in. Currently, we pass in the entity as the fieldName and the id and an argument that could go on the directive @entity(id: $id) or on the field itself TeamMember(id: $id). If we put it on the field itself, there's really no need for the @entity directive. That's no good, since someone could leave it off by mistake & we'd have to put a warning there saying "Hey, did you forget to add the directive?". So, we've got options:

teamMembers: TeamMember(id: $id) @entity
teamMembers: TeamMember @entity(id: $id)
teamMembers(id: $id) @entity(type: "TeamMember")
teamMembers @entity(id: $id, type: "TeamMember")
teamMembers(id: $id, type: "TeamMember") @entity

jordanh commented 8 years ago

Feedback from a user: the difference between @entity and @local wasn't grokked by me until the third time I read through the above. It wasn't quickly clear that @entity meant "grab from local cache, never sending to server" and that @local meant "sort or filter from local cache, potentially after sending server request"

I think it's because the name @entity just isn't descriptive enough. What went through my mind was, "entity? what's an entity? isn't everything an entity, man?"

If you're open to a change, some thought-starters:

Rename @local to:

@list or,
@localList (reminiscent of GraphQLList) or,
@collection

...because I think today's @local always performs an operation on a list of things.

Then rename @entity to:

@cached or,
@local

...because today's @entity is always acting locally.

I think something like this reads well:

  teamMembers @cached(ids: $ids, type: "TeamMember") @list

To me, that says: pull TeamMembers using $ids from local cache only then apply local list operations using the sort and/or filter options I specify.

mattkrick commented 8 years ago

@jordanh really good advice. I like @cached. I think the feature is good enough to share the almost-namesake of the project. I've refactored, it's good to go.

@list still feels a little ambiguous, but it may be the best. I kinda like the word transform, but that doesn't apply specifically to a list. Funny enough, since neither take an arg, it would be possible to do away with the directive altogether & just let folks pass in the option, although I'm not sure if that crosses over into being magical.

Also, i've refactored options:

const options = {
  directives: {
    [fieldName]: {
      sort,
      filter,
      subscriber,
      resolveChannelKey,
      resolveCached,
      resolveCachedList
    }
  }
}

By grouping all the directives together, the options object looks a lot smaller & you can see everything that's going on for a single field.

jordanh commented 8 years ago

👏 🎊

Awesome! I really like the options interface. Very good. Very flexible. Bravo!

I almost suggested just dropping the directive for sort and filter. Intuitively, I think this is a good idea.

The way to get to the bottom of it philosophically might be: what's the checklist one would need to check off to have a feature qualify as a directive?

[ ] not always passed to server
[ ] takes argument
[ ] ???

@cached is an obvious directive. But, how about @live? It needs to be a directive. But why? What quality would be added to the above list?

mattkrick commented 8 years ago

@jordanh Per the definition:

Directives provide a way to describe alternate runtime execution and type validation behavior in a GraphQL document.

@live: pulls from schema.subscriptions instead of schema.query. It's a directive

@cached: polyfills the schema with a given type & local resolve. It's a directive

@list: transforms an array after the runtime execution, but before the caching. Uf. This is a tough one.

mattkrick commented 8 years ago

Something that causes a lot of errors for me is the resolveCached and resolveCachedList. They're mutually exclusive, so I don't think we should have both.

Instead, I think type should use GraphQL shorthand to declare a list.

query {
  editors @entity(id: $projectId, type: "[Presence]") {
    id
    teamMember @entity(type: "TeamMember") {
      preferredName
    }
  }
}

and the options:

directives: {
      editors: {
        resolveCached: (source, args) => (doc) => doc.editing === `Task::${args.id}`
      },
      teamMember: {
        resolveCached: (source) => {
          const [, teamId] = source.editing.split('::');
          const {userId} = source;
          return `${userId}::${teamId}`;
        }
      }
    }

Now we just have resolveCached. Half the complexity!

Additionally, teamMember above returns a string instead of a function. that string is equivalent to

 (doc) => doc.id === `${userId}::${teamId}`

Internally, this reduces the lookup from O(n) to O(1). However, it introduces confusion to the API. Internally, I'll have this, but not sure I'll document it...

mattkrick commented 8 years ago

final problem: should type accept a UNION type? if it does, then it's great for the user, since it makes for an easy API. HOWEVER, passing in id with type: Creature fails entity integrity if Dog:123 and Person:123 both exist.

To do it right, the type would have to be calculated for each doc. Passing in a concrete type via arg won't work since it could vary for a list.

However the id, ids is calculated, they should also know the concrete type. If there is no id, ids and just a resolveCache function, then that function will have an outer context containing a concrete type & we can use that. So, the only problem is for lookups that involve a UNION type with an id that has a collision. Using the id, ids without a resolveCached is just a default function for simple cases. having a colliding ID in a UNION is not simple, so it should be OK to stick with how it is.

mattkrick commented 8 years ago

closed with v0.20.0

mattkrick / cashay

Reduce fetch for an array of IDs #89