References/Links - GitHub Issues

richburdon commented 7 years ago

Consider the following Project->Task structure (ignoring for the moment the possibility of tasks belonging to multiple projects).

type Project implements Item {
  team: Group!
}

type Group implements Item {
  members: [User]
}

type Task implements Item {
  project: Project
}

[OPTIMIZATION ISSUE: If we just want to retrieve the project ID, can we do this in the resolver without having to retrieve the Project record (we already have the ID value in the Task record -- should we proactively parse the query shape to determine if the actual record should be retrieved?) Is this moot once we have links?]
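To make the optimization question concrete, here is a minimal sketch, assuming the resolver is handed the set of sub-fields the query selects. The names `resolve_task_project`, `fetch_project`, and `PROJECTS` are hypothetical and only illustrate the idea; they are not part of the existing resolvers.

```python
# Hypothetical in-memory store standing in for the real item database.
PROJECTS = {"p1": {"id": "p1", "title": "Demo"}}
fetch_count = 0


def fetch_project(project_id):
    """Simulate a database read and count how often it happens."""
    global fetch_count
    fetch_count += 1
    return PROJECTS[project_id]


def resolve_task_project(task, requested_fields):
    """Resolve Task.project, skipping the read when only `id` is selected."""
    project_id = task.get("project")
    if project_id is None:
        return None
    if set(requested_fields) <= {"id"}:
        # The Task record already stores the ID, so no lookup is needed.
        return {"id": project_id}
    return fetch_project(project_id)


task = {"id": "t1", "project": "p1"}
id_only = resolve_task_project(task, ["id"])        # no database read
full = resolve_task_project(task, ["id", "title"])  # one database read
```

Whether this is worth doing probably depends on how links end up being stored; if link resolution already returns keys without records, the check may be redundant.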

In the ProjectCard, we could do this:

query ProjectQuery($itemId: ID!) {
  item(itemId: $itemId) {
    team {
      members {
        tasks(filter: { expr: { field: "project", value: $itemId } }) {
          title
        }
      }
    }
  }
}

And/or implement Project->Task hierarchies (aka Item "compositions") via parent->child references.

type Project implements Item {
  team: Group!
  tasks: [Task]!
}

And filter these by member/assignee in the renderer.

At some point we'll also need to be able to reference links in the filter.

adamberenzweig commented 7 years ago

Does ignoring the possibility of tasks belonging to multiple projects simplify anything? If not we should solve the more general many-to-many problem.

For posterity, I'll enumerate the 4 types of composition we discussed the other day:

Structured inlining.
Reference by key.
Link as first-order relation object.
Query embedded in parent.

The schema you wrote above represents the relationship directly (2), but the query you've sketched is more like (4).

For discussion here's an approach that builds on top of a relational model underneath (imagine the resolvers use sql tables underneath to represent the relationships between projects, tasks and users):

interface Item {
  ...
  links: [Item]
}

type Task implements Item {
  assignee: User
  owner: User
}

type Project implements Item {
  # nothing special here... all relations will be links.
}

query projectQuery($itemId: ID!) {
  item(itemId: $itemId) {
    links {
      ... on Task {
        assignee { ...UserFragment }
        owner { ...UserFragment }
      }
    }
  }
}

When the client wants to build a page that groups tasks by assignee, it has to sort them into user buckets at the client -- the query response is not structured that way. But the shape of the query is simple and general.
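As a rough illustration of the client-side bucketing, here is a minimal sketch. It assumes the response for `links` is a flat list of dicts carrying GraphQL's `__typename` plus `title` and `assignee.id` fields; those field selections and the function name `group_tasks_by_assignee` are assumptions for the example, not part of the query above.

```python
from collections import defaultdict


def group_tasks_by_assignee(links):
    """Sort Task items from a flat `links` list into per-user buckets."""
    buckets = defaultdict(list)
    for item in links:
        if item.get("__typename") != "Task":
            continue  # notes, documents, etc. are not grouped here
        assignee = item.get("assignee")
        key = assignee["id"] if assignee else None  # None = unassigned
        buckets[key].append(item)
    return dict(buckets)


# Hypothetical response data for item.links.
response_links = [
    {"__typename": "Task", "title": "A", "assignee": {"id": "u1"}},
    {"__typename": "Task", "title": "B", "assignee": {"id": "u2"}},
    {"__typename": "Note", "title": "N"},
    {"__typename": "Task", "title": "C", "assignee": {"id": "u1"}},
    {"__typename": "Task", "title": "D", "assignee": None},
]
buckets = group_tasks_by_assignee(response_links)
```

Re-running this function whenever the `links` list changes is what would keep the grouped view in sync after a mutation.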

During resolution, first the server would fetch a project by ID and retrieve the item keys for all links, then as it went down the response tree each linked item would be resolved in turn.

Have to sort into user buckets at the client
- This seems to make mutations more straightforward too, since the client logic to sort would be re-triggered when new tasks are added to the project. [?]
N+1 fan-out at the server. Better than this fan-out happening at the client (saves network round-trips). Probably can be optimized by indexing tasks by project ID and doing that query while resolving the project, rather than waiting for the resolution framework to do it as it goes down the tree.
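The fan-out and the indexed alternative can be compared with a small sketch. Everything here (`CountingStore`, `TASKS`, `PROJECT_LINKS`) is a hypothetical stand-in for the real storage layer, and the counter only tracks task reads, not the initial project fetch.

```python
# Hypothetical data: tasks keyed by ID, plus the link keys stored on each project.
TASKS = {
    "t1": {"id": "t1", "project": "p1"},
    "t2": {"id": "t2", "project": "p1"},
    "t3": {"id": "t3", "project": "p2"},
}
PROJECT_LINKS = {"p1": ["t1", "t2"], "p2": ["t3"]}


class CountingStore:
    """Stand-in for the database that counts how many queries it serves."""

    def __init__(self):
        self.queries = 0

    def get_task(self, task_id):
        self.queries += 1
        return TASKS[task_id]

    def get_tasks_by_project(self, project_id):
        # A single query, as if served by an index on the project ID.
        self.queries += 1
        return [t for t in TASKS.values() if t["project"] == project_id]


def resolve_naive(store, project_id):
    """One read per linked item: the N in N+1."""
    return [store.get_task(task_id) for task_id in PROJECT_LINKS[project_id]]


def resolve_indexed(store, project_id):
    """Fetch all linked tasks while resolving the project itself."""
    return store.get_tasks_by_project(project_id)


naive_store, indexed_store = CountingStore(), CountingStore()
naive = resolve_naive(naive_store, "p1")
indexed = resolve_indexed(indexed_store, "p1")
```

Both paths return the same tasks; the naive one issues a query per link while the indexed one issues a single query regardless of how many tasks the project has.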

Any other problems w/ this approach?

richburdon commented 7 years ago

Let's give examples for the four cases:

Structured inlining: these are not first-order items. E.g., a Contact has 3 email addresses (which are potentially schema types). A Calendar event has 3 inline tasks (that could be "snapped off" into first-order items at some point).
Reference by key: A user has a Contact item, which is the aggregated form of contact records (inlined as above) from multiple external sources (FB, Google, LinkedIn).
First order links: a) bi-directional many-to-many (e.g., project->user) where we want to be able to efficiently navigate in either direction; b) cases where we need to store metadata on the link itself (e.g., ACL).
Embedded query: here I meant that the type has specific semantics (e.g., Event has participants and observers). Or that it might have "context"-specific items (e.g., "important tasks" based on location, time, etc.)

I think Schema is very powerful for the following reasons (relative to a homogeneous system):

It provides opinionated constraints that are often useful (e.g., you can add a note to anything, but you can't add an event to a task). This may be important for 3rd party developers.
It simplifies queries: not just by avoiding filters everywhere, but also around the prefix sub-query syntax (which, BTW, is currently broken in Apollo) and the paging issue. We could work around this by binding multiple top-level queries and batching them (in Apollo batching works by waiting 100ms).
Cardinality: if a Project has an assignee and that is represented as a Link, then what stops you from attaching a Banana as the assignee? Are links only for many-to-many?
Documentation/clarity. This assumes we want to be opinionated. Against my generalist platform-building instincts I think we should be. I think this is what will make the app tractable.
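If relations were stored as generic links, the constraint and cardinality points could still be enforced with a rule table. This is a minimal sketch under that assumption; `ALLOWED_LINKS` and `validate_link` are hypothetical names, and the two rules shown are only examples.

```python
# Hypothetical rule table: (source type, relation) -> allowed target types
# and whether the relation may point at more than one item.
ALLOWED_LINKS = {
    ("Task", "assignee"): {"types": {"User"}, "many": False},
    ("Project", "tasks"): {"types": {"Task"}, "many": True},
}


def validate_link(source_type, relation, target_types):
    """Return True if linking `source_type` to the targets is permitted."""
    rule = ALLOWED_LINKS.get((source_type, relation))
    if rule is None:
        return False  # unknown relation: reject rather than allow anything
    if not rule["many"] and len(target_types) > 1:
        return False  # cardinality violation
    return all(t in rule["types"] for t in target_types)
```

For example, `validate_link("Task", "assignee", ["Banana"])` is rejected, which is effectively a schema expressed as data rather than as GraphQL types.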

I don't understand your SQL analogy. Typically SQL schemas have type-specific tables and references. I don't think the different models really affect the backend implementation.

However...

Against schema:

Anything goes is more flexible.
I'll have to give more thought to whether or not this would complicate reducers, etc. My gut feeling is it will make them more complicated to write, but it could have benefits: i.e., the query may be more transparent to introspect and allow us automatically to patch the cache based on running the matcher on each branch. If we can make reducers completely automatic, then this would be a good argument for heterogeneity (i.e., there would be no opaque resolver logic needed in the client). Although it wouldn't resolve the context issue (i.e., there are always going to be opaque queries).

I think your client-side sorting "problem" isn't relevant: you can still do nesting in the query if that's useful to you. It won't be as clear to traverse (e.g., items.items.items.items) and I'm not sure what the traversal syntax would be for "members" vs "participants" things.

I think in summary the trade-offs are: For Schema: clarity, constraints, simplicity, contextual matching (i.e., not expressed purely by filter). Against: flexibility, possibility of client-side holy grail caching (offline).

richburdon commented 7 years ago

My comment about making resolvers automatic still would have issues. There is always going to be some opaqueness in the queries (context, ACL, the current concept of refs for nested queries, etc.) So for offline we will need to some extent a shallow implementation of the resolvers. I think of these as stored procedures.

Let's compare some real life queries side by side.

adamberenzweig commented 7 years ago

Agreed, assignee vs owner is a good canonical example of the need for some type-specific schema.

Agreed that using schema with nice field names to capture relationships makes deep nested queries easier to write, e.g. project.tasks.assignee.name instead of something with a bunch of filters.

For the record, here are some issues that I believe motivate this discussion:

Ability to nest projects, infinitely
Ability to capture many-to-many relationships
Ability to add notes, tasks, documents (search results), and other projects to a project.
Easily moving items between projects

Some questions related to the links between Projects and People:

Are the links between projects and people also used to compute ACLs? (related to #24)
If I follow a project, can others see that fact? (proposal: Yes, if others can see the project)
What determines the set of people for whom we show task sections on a card? a) explicit "team" field pointing to a Group b) implicit set of all people assigned tasks that are linked to this project, c) set of all followers linked to this project. (Proposal: foo)

Worth noting two issues w a very generic approach that we identified when chatting:

Heterogeneous pagination, e.g. paginating separately through tasks vs notes attached to a project. But we can handle with multiple nested queries with different filters, something supported by graphql. e.g.:
```
item(itemId: $itemId) {
  notes: links(type: Note, offset: ...) { ...NoteFragment }
  tasks: links(type: Task, offset: ...) { ...TaskFragment }
}
```
The assignee vs owner problem (same as your "participant vs observer" example). I.e., differentiating by pure link semantics that are not captured anywhere else in the object data. The only generic solution that comes to mind is first-class Link objects with a type field -- like Freebase subject, predicate, object triples.
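A minimal sketch of what first-class Link objects with a type field might look like, modeled as subject/predicate/object triples. The `Link` class, the helper names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A first-class relation: subject --predicate--> object."""
    subject: str
    predicate: str
    object: str


# Hypothetical sample data: the same task/user pair can be related in
# different ways, distinguished only by the predicate.
LINKS = [
    Link("task1", "assignee", "alice"),
    Link("task1", "owner", "bob"),
    Link("task2", "assignee", "alice"),
]


def objects_of(links, subject, predicate):
    """Forward navigation: who is the assignee/owner of this task?"""
    return [link.object for link in links
            if link.subject == subject and link.predicate == predicate]


def subjects_of(links, predicate, obj):
    """Reverse navigation: which tasks is this user assigned to?"""
    return [link.subject for link in links
            if link.predicate == predicate and link.object == obj]
```

With this shape, `objects_of(LINKS, "task1", "assignee")` and `objects_of(LINKS, "task1", "owner")` give different users for the same task, and the same list supports navigation in both directions.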

My comment about SQL was really about where the relationship is expressed: In our current approach, the interesting relationships are represented by embedded queries (type 4 in our typology) that the client needs to express. E.g.

tasks(filter: { expr: { field: "project", value: $itemId } }) or tasks(filter: { expr: { field: "assignee", ref: "id" } } AND project=this (paraphrasing)

that feels awkward to me -- the relationship between tasks and the parent project should already be known, not expressed again in the filter of an inner query that the client has to write.

richburdon commented 7 years ago

Consider:

Event => Project(label=eng) => Task(status=blocking)

Query:

{
  project(label: eng) {
    tasks(status: blocking)
  }
}

{
  links(type:Project, label:eng) {
    ... on Project { ... }
    links(type:Task, status:blocking) {
      ... on Task {
        assignee {
          name
        }
      }
    }
  }
}

Issues:

when to use generic "links" vs named "verbs" (e.g., assignee)
need for back (root/context) references for nested queries

adamberenzweig commented 7 years ago

Project is a collection of stuff
Some types have named fields where a generic link can't distinguish the roles, e.g. assignee vs owner.
Already have TODO in code to move tasks up into projects, not have to go through members.

type Project {
  notes(filter: PaginationFilter): [Note]
  tasks(filter: PaginationFilter): [Task]
  projects(filter: PaginationFilter): [Project]
  documents(filter: PaginationFilter): [Document]
}

vs


type Project {
  links(filter: FilterInput): [Item]
}

richburdon commented 7 years ago

Or:

type Item {
  links(relation, filter)
}
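To compare against the named-field version, here is a minimal sketch of how a generic `links(relation, filter)` resolver might behave, using the Event => Project(label=eng) => Task(status=blocking) example from earlier in the thread. The `ITEMS` and `EDGES` tables, the relation names, and the equality-only filter are assumptions for illustration.

```python
# Hypothetical item store and edge list: (source, relation, target).
ITEMS = {
    "e1": {"type": "Event"},
    "p1": {"type": "Project", "label": "eng"},
    "p2": {"type": "Project", "label": "ops"},
    "t1": {"type": "Task", "status": "blocking"},
    "t2": {"type": "Task", "status": "done"},
}
EDGES = [
    ("e1", "projects", "p1"),
    ("e1", "projects", "p2"),
    ("p1", "tasks", "t1"),
    ("p1", "tasks", "t2"),
]


def resolve_links(item_id, relation=None, filter=None):
    """Return IDs linked from `item_id`, narrowed by relation and field values."""
    wanted = filter or {}
    result = []
    for source, rel, target in EDGES:
        if source != item_id:
            continue
        if relation is not None and rel != relation:
            continue
        item = ITEMS[target]
        # Equality-only filter: every requested field must match exactly.
        if all(item.get(key) == value for key, value in wanted.items()):
            result.append(target)
    return result


eng_projects = resolve_links("e1", "projects", {"label": "eng"})
blocking = [task_id
            for project_id in eng_projects
            for task_id in resolve_links(project_id, "tasks", {"status": "blocking"})]
```

The `relation` argument plays the role of the named "verb", so assignee vs owner would be two relation values on the same resolver rather than two schema fields.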

adamberenzweig commented 7 years ago

Also see: https://docs.google.com/document/d/1qMwaBb8jcip1zZipvFIM5SZDDE07ZL5Lpz0UtM1dH-I/edit#heading=h.8b19te3y0jvi

minderlabs / demo

References/Links #32