This is an outline of how to go about implementing a version of the functionality described in tair/ifad-frontend#22.
Current functionality
Queries in ifad-backend are currently created by choosing a list of "segments" (pairs of Aspect and Annotation Status), and applying either the AND (intersection) or OR (union) operator over the entire list of segments. Because one operator applies to the whole list, a query that mixes AND and OR cannot be expressed.
Example desired query
A query that we would like to support, but currently cannot, looks like this:
Fetch all genes and annotations which belong to both:
Known Experimental in Molecular Function, AND
Unknown in Biological Process OR Unannotated in Biological Process
Query Structure
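In order to support richer queries like the one described above, we need a representation that describes a query as a tree of operations, rather than as a list as it does currently. In code, a query in the current system is a single operator and a list of segments, where the single operator is used to join all of the segments. Applying the query amounts to querying each segment and then combining all of the results with that one operator; the actual code looks slightly different, but the result is the same.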
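As a rough illustration of the current shape, here is a minimal sketch. Every name in it (FlatQuery, GeneSet, runFlatQuery, the string keys of the toy dataset) is an assumption made for this example, and the status and aspect values are limited to the ones mentioned in this outline plus "C"; the real definitions live in ifad-backend and operate on StructuredData rather than plain sets of gene IDs.

```typescript
// Hypothetical stand-ins; not the actual ifad-backend definitions.
type Aspect = "F" | "P" | "C";
type AnnotationStatus = "EXP" | "UNKNOWN" | "UNANNOTATED";
type Segment = { aspect: Aspect; annotationStatus: AnnotationStatus };

// The current "flat" query: one operator applied across every segment.
type FlatQuery = { operator: "union" | "intersect"; segments: Segment[] };

// Simplified stand-in for StructuredData: just a set of gene IDs.
type GeneSet = Set<string>;

const union = (one: GeneSet, two: GeneSet): GeneSet =>
  new Set([...one, ...two]);

const intersect = (one: GeneSet, two: GeneSet): GeneSet =>
  new Set([...one].filter((gene) => two.has(gene)));

// Toy segment lookup: the dataset is keyed by "aspect:status".
const querySegment = (dataset: Map<string, GeneSet>, segment: Segment): GeneSet =>
  dataset.get(`${segment.aspect}:${segment.annotationStatus}`) ?? new Set<string>();

// Query every segment, then fold the results together with the single operator.
const runFlatQuery = (dataset: Map<string, GeneSet>, query: FlatQuery): GeneSet => {
  const results = query.segments.map((segment) => querySegment(dataset, segment));
  if (results.length === 0) return new Set<string>();
  const combine = query.operator === "union" ? union : intersect;
  return results.reduce((acc, next) => combine(acc, next));
};
```

Because FlatQuery carries exactly one operator, there is no way to say "intersect this segment with the union of those two", which is the limitation this outline addresses.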
Proposed new query structure
To support richer queries, we'd like to be able to choose a different operator for each group of components in the query. Continuing with the use case above, segment(F, EXP) INTERSECT (segment(P, UNKNOWN) UNION segment(P, UNANNOTATED)), the query could be restructured as nested operations, where each operator applies only to its own components.
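One possible encoding of that nested query is sketched below. The node kinds ("segment", "union", "intersect") and the components field come from this outline; the QueryNode type name, the type discriminant, the segment field, and the render helper are assumptions made for illustration only.

```typescript
// Hypothetical stand-ins; not the actual ifad-backend definitions.
type Aspect = "F" | "P" | "C";
type AnnotationStatus = "EXP" | "UNKNOWN" | "UNANNOTATED";
type Segment = { aspect: Aspect; annotationStatus: AnnotationStatus };

// A query is a tree: segment leaves, plus union/intersect nodes with children.
type QueryNode =
  | { type: "segment"; segment: Segment }
  | { type: "union"; components: QueryNode[] }
  | { type: "intersect"; components: QueryNode[] };

// segment(F, EXP) INTERSECT (segment(P, UNKNOWN) UNION segment(P, UNANNOTATED))
const exampleQuery: QueryNode = {
  type: "intersect",
  components: [
    { type: "segment", segment: { aspect: "F", annotationStatus: "EXP" } },
    {
      type: "union",
      components: [
        { type: "segment", segment: { aspect: "P", annotationStatus: "UNKNOWN" } },
        { type: "segment", segment: { aspect: "P", annotationStatus: "UNANNOTATED" } },
      ],
    },
  ],
};

// Helper for illustration: print a query tree back as an expression.
const render = (node: QueryNode): string => {
  if (node.type === "segment") {
    return `segment(${node.segment.aspect}, ${node.segment.annotationStatus})`;
  }
  const op = node.type === "union" ? " UNION " : " INTERSECT ";
  return `(${node.components.map((child) => render(child)).join(op)})`;
};
```

Calling render(exampleQuery) reproduces the expression above (with one extra pair of outer parentheses), which is a quick way to check that the tree says what was intended.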
Notice that this query is essentially a tree with three types of nodes: a "segment" leaf node which queries a single segment, and "union" and "intersect" nodes which have children in the components field.
Implementing the new query structure
Implementing this new query should not be difficult because the query functions were defined in a modular way from the start. Let's look at the key types and functions (from queries.ts):
// Notice that the shape of query outputs is the same shape as the input
type QueryResult = StructuredData;
const querySegment = (dataset: StructuredData, segment: Segment): QueryResult => { ... }
const union = (one: StructuredData, two: StructuredData): QueryResult => { ... }
const intersect = (one: StructuredData, two: StructuredData): QueryResult => { ... }
The union and intersect functions make it easy to take any two subsets of data and combine them. This can be used to create a tree traversal over the query tree, combining the children of each node according to the operator specified by that node.
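The traversal described above might look roughly like the following sketch. It is self-contained, so it uses simplified stand-ins: GeneSet replaces StructuredData, the dataset is a toy map keyed by "aspect:status", and the names QueryNode and evaluate are hypothetical. In the real code, querySegment, union, and intersect from queries.ts would take the place of the toy versions here.

```typescript
// Hypothetical stand-ins; not the actual ifad-backend definitions.
type Aspect = "F" | "P" | "C";
type AnnotationStatus = "EXP" | "UNKNOWN" | "UNANNOTATED";
type Segment = { aspect: Aspect; annotationStatus: AnnotationStatus };
type QueryNode =
  | { type: "segment"; segment: Segment }
  | { type: "union"; components: QueryNode[] }
  | { type: "intersect"; components: QueryNode[] };

// Simplified stand-in for StructuredData: a set of gene IDs.
type GeneSet = Set<string>;
type Dataset = Map<string, GeneSet>;

const querySegment = (dataset: Dataset, segment: Segment): GeneSet =>
  dataset.get(`${segment.aspect}:${segment.annotationStatus}`) ?? new Set<string>();

const union = (one: GeneSet, two: GeneSet): GeneSet =>
  new Set([...one, ...two]);

const intersect = (one: GeneSet, two: GeneSet): GeneSet =>
  new Set([...one].filter((gene) => two.has(gene)));

// Post-order traversal: evaluate the children first, then combine their
// results with the operator named by the current node.
const evaluate = (dataset: Dataset, node: QueryNode): GeneSet => {
  if (node.type === "segment") {
    return querySegment(dataset, node.segment);
  }
  const parts = node.components.map((child) => evaluate(dataset, child));
  // Assumption: an operator node with no children yields an empty result.
  if (parts.length === 0) return new Set<string>();
  const combine = node.type === "union" ? union : intersect;
  return parts.reduce((acc, next) => combine(acc, next));
};
```

Since each node's result has the same shape as the dataset slices it was built from, the result of any subtree can be fed straight into its parent's operator, which is what makes the recursion this short.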
Aside: Gene Product Type filters
One last thing I'll mention is that the queries described above ignore gene product type filters. There are two possible choices that could be made in this regard:
1. Apply gene product type filters to the total query, or
2. Allow gene product type filters to be chosen for each level of the query.
The second option is more true to the idea of supporting richer queries, and should not take considerably more work to implement. It would just require adding a filter field into the nodes of the query tree and taking that into account when traversing the tree.
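To make the second option concrete, here is a sketch of a query tree whose nodes carry an optional filter field that is applied to each node's own result during traversal. Only the filter field name comes from this outline. The Gene and Filter shapes, the productType and allowedProductTypes names, and the toy dataset are all assumptions; the real gene product type filters in ifad-backend may look quite different.

```typescript
// Hypothetical stand-ins; not the actual ifad-backend definitions.
type Segment = { aspect: string; annotationStatus: string };
type Gene = { id: string; productType: string };
type GeneMap = Map<string, Gene>; // keyed by gene ID
type Filter = { allowedProductTypes: string[] };

// Every node may carry its own filter.
type QueryNode =
  | { type: "segment"; segment: Segment; filter?: Filter }
  | { type: "union" | "intersect"; components: QueryNode[]; filter?: Filter };

const union = (one: GeneMap, two: GeneMap): GeneMap => new Map([...one, ...two]);

const intersect = (one: GeneMap, two: GeneMap): GeneMap =>
  new Map([...one].filter(([id]) => two.has(id)));

// Keep only genes whose product type is allowed; no filter means keep all.
const applyFilter = (genes: GeneMap, filter?: Filter): GeneMap =>
  filter === undefined
    ? genes
    : new Map(
        [...genes].filter(([, gene]) =>
          filter.allowedProductTypes.includes(gene.productType)
        )
      );

const evaluate = (dataset: Map<string, GeneMap>, node: QueryNode): GeneMap => {
  let result: GeneMap;
  if (node.type === "segment") {
    const key = `${node.segment.aspect}:${node.segment.annotationStatus}`;
    result = dataset.get(key) ?? new Map<string, Gene>();
  } else {
    const parts = node.components.map((child) => evaluate(dataset, child));
    const combine = node.type === "union" ? union : intersect;
    result =
      parts.length === 0
        ? new Map<string, Gene>()
        : parts.reduce((acc, next) => combine(acc, next));
  }
  // The node's own filter is applied after its children have been combined.
  return applyFilter(result, node.filter);
};
```

Under this sketch the first option is just the special case where only the root node has a filter, so supporting per-level filters does not rule out whole-query filtering.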