graphile-contrib / postgraphile-plugin-connection-filter

Filtering on PostGraphile connections
MIT License
285 stars 32 forks source link

Compare two columns inside filter #113

Open zaquas77 opened 5 years ago

zaquas77 commented 5 years ago

Hello! Is it possible compare two columns inside filter?

I would like do something like that:

[...] filter: { column1: { greaterThanOrEqualTo: column2} } [..]

mattbretl commented 5 years ago

Interesting. I don't believe that's been requested before. The main barrier to implementation is the fact that GraphQL doesn't support input unions (yet).

If we assume that column1 represents an Int field, then the input type of the greaterThanOrEqualTo field would also be Int, but you're wanting the ability to specify a different input type; most likely an Enum value (e.g. COLUMN_2). Since we can't use an input union, we can workaround it by wrapping the two types in an ObjectType with two fields, and then only specify one of them at query time, but that would introduce a lot of baggage for this one feature.

Alternatively, we could introduce additional fields specifically for these comparisons, perhaps with Column appended to the field name, so a query would look like:

[...] filter: { column1: { greaterThanOrEqualToColumn: COLUMN_2} } [..]

What do you think about that approach? It would be opt-in (perhaps even packaged as a separate plugin) to avoid cluttering the default schema.

higherorderfunctor commented 5 years ago

I would also use this feature. I am not sure how viable it is in GraphQL, but it would be great to see feature parity (for lookups at least) with Django's F expressions, which includes forward and reverse related fields.

As a real-world example, say I wanted to generate a report of all devices with a configured management IP address that is not found on any of that device's configured interfaces. I can currently write that query using Django's ORM:

# django/python example
(
    Device.objects
    # config has a relation to device
    # interface has a relation to device
    .filter(~Q(configs__ip_address=F('interfaces__ip_address')))
)

However, I do not see a way that my end users could craft such a query in GraphQL for this kind of report without requiring the back-end team to setup a database function or (materialized) view.

# possible future example in GraphQL?
{
  devices(filter: {
    not: { configsByDeviceId: { every: {
      ipAddressEqualToFieldExpression: {
        interfacesByDeviceId: { some: { ipAddress } }
      }
    } } }
  }) {
    ...
  }
}
zaquas77 commented 5 years ago

First of all, thanks @mattbretl and @higherorderfunctor for yours fast response! I didn't think that my question would create a so intersting topic!

I am not a very expert of Graphqhl, but i think that it is a "declarative" way rather that "imperative" one for make questions. From the client side I don't want say how create a "data", I want only ask for a "data" in a certain form. In this way I found just strange the exist about filters. With the filter, and with that I ask (comparation beetwen columns), from the perspective view of a client, is the client that inform the server how to retriew the data, but in this perhaps we force graphql to do things that it not be build for. If I understand well, graphql is a layer beetwen the data and the consumers of those... behind graphql could be a db, a api, a legacy system, so for grapqhl it would be strange that client would says how retriew the data. Nevertheless it would be amazing! :) Almost for little "rules" (like I ask for). So like @mattbretl suggest I think the solution should be a opt-in... and sure I would be grateful!

Thanks @mattbretl and @higherorderfunctor for your support.

mattbretl commented 5 years ago

@higherorderfunctor In general, I would recommend a backend solution for that (either a PostGraphile computed column or a trigger-populated column, depending on performance needs), but I'm willing to take a closer look.. can you post a minimal SQL schema that we can use for discussion? Just want to make sure I'm following your use case.

@seeiuu I agree. Obligatory reference: https://twitter.com/leeb/status/1004655619431731200 If you don't intend to expose extensive client-side filtering capabilities, then you don't need (and probably shouldn't use) this plugin. That said, it does solve a real problem when you need it (e.g. admin dashboards and faceted search). Also, if you pair this plugin with query whitelisting, you can retain full control over whether the necessary database indexes exist, avoiding "slow queries." While GraphQL wasn't specifically designed for this use case, I still prefer to use it over alternatives like OData.

zaquas77 commented 5 years ago

@mattbretl I agree too! Happy to have understood something! Said that, you think that this kind of request will become part of this project?

thank in advance

mattbretl commented 5 years ago

Tentatively yes, though I would publish it as a separate package with a dependency on this plugin. For v2, I plan to move several features into separate plugins.

higherorderfunctor commented 5 years ago

@higherorderfunctor In general, I would recommend a backend solution for that (either a PostGraphile computed column or a trigger-populated column, depending on performance needs), but I'm willing to take a closer look.. can you post a minimal SQL schema that we can use for discussion? Just want to make sure I'm following your use case.

postgres=# \d+ polls_a
                                                Table "public.polls_a"
 Column  |  Type   | Collation | Nullable |               Default               | Storage | Stats target | Description 
---------+---------+-----------+----------+-------------------------------------+---------+--------------+-------------
 id      | integer |           | not null | nextval('polls_a_id_seq'::regclass) | plain   |              | 
 a_field | integer |           | not null |                                     | plain   |              | 
Indexes:
    "polls_a_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "polls_b" CONSTRAINT "polls_b_a_relation_id_3d42cf4e_fk_polls_a_id" FOREIGN KEY (a_relation_id) REFERENCES polls_a(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "polls_c" CONSTRAINT "polls_c_a_relation_id_1cca00c9_fk_polls_a_id" FOREIGN KEY (a_relation_id) REFERENCES polls_a(id) DEFERRABLE INITIALLY DEFERRED

postgres=# \d+ polls_b
                                                   Table "public.polls_b"
    Column     |  Type   | Collation | Nullable |               Default               | Storage | Stats target | Description 
---------------+---------+-----------+----------+-------------------------------------+---------+--------------+-------------
 id            | integer |           | not null | nextval('polls_b_id_seq'::regclass) | plain   |              | 
 b_field       | integer |           | not null |                                     | plain   |              | 
 a_relation_id | integer |           | not null |                                     | plain   |              | 
Indexes:
    "polls_b_pkey" PRIMARY KEY, btree (id)
    "polls_b_aRelation_id_e3adf557" btree (a_relation_id)
Foreign-key constraints:
    "polls_b_a_relation_id_3d42cf4e_fk_polls_a_id" FOREIGN KEY (a_relation_id) REFERENCES polls_a(id) DEFERRABLE INITIALLY DEFERRED

postgres=# \d+ polls_c
                                                   Table "public.polls_c"
    Column     |  Type   | Collation | Nullable |               Default               | Storage | Stats target | Description 
---------------+---------+-----------+----------+-------------------------------------+---------+--------------+-------------
 id            | integer |           | not null | nextval('polls_c_id_seq'::regclass) | plain   |              | 
 c_field       | integer |           | not null |                                     | plain   |              | 
 a_relation_id | integer |           | not null |                                     | plain   |              | 
Indexes:
    "polls_c_pkey" PRIMARY KEY, btree (id)
    "polls_c_aRelation_id_b5130a13" btree (a_relation_id)
Foreign-key constraints:
    "polls_c_a_relation_id_1cca00c9_fk_polls_a_id" FOREIGN KEY (a_relation_id) REFERENCES polls_a(id) DEFERRABLE INITIALLY DEFERRED

postgres=# SELECT * FROM polls_a;
 id | a_field 
----+---------
  1 |       1
  2 |       2
(2 rows)

postgres=# SELECT * FROM polls_b;
 id | b_field | a_relation_id 
----+---------+---------------
  1 |       1 |             1
  2 |       3 |             2
(2 rows)

postgres=# SELECT * FROM polls_c;
 id | c_field | a_relation_id 
----+---------+---------------
  1 |       1 |             1
  2 |       4 |             2
(2 rows)

Ideally I would like to expose flexible client-side queries like so for field-to-field filtering. Included how the filters are written in Django's ORM as merely a reference point and the resulting SQL that is generated.


# reverse related field comparison
(
    A.objects
    # 'b' is an automatically resolved field name for the reverse relation.
    # double underscore indicates following a relation
    .filter(a_field=F('b__b_field'))
    .values()
)
# Output: <QuerySet [{'id': 1, 'a_field': 1}]>
# DEBUG django.db.backends 2019-10-08 10:21:49,812 utils execute:110 9581 140006712284992: (0.002) SELECT "polls_a"."id", "polls_a"."a_field" FROM "polls_a" INNER JOIN "polls_b" ON ("polls_a"."id" = "polls_b"."a_relation_id") WHERE "polls_a"."a_field" = ("polls_b"."b_field")  LIMIT 21; args=()

# forward related field comparison
(
    B.objects
    .filter(b_field=F('a_relation__a_field'))
    .values()
)
# <QuerySet [{'id': 1, 'b_field': 1, 'a_relation_id': 1}]>
# DEBUG django.db.backends 2019-10-08 10:22:12,272 utils execute:110 9581 140006712284992: (0.001) SELECT "polls_b"."id", "polls_b"."b_field", "polls_b"."a_relation_id" FROM "polls_b" INNER JOIN "polls_a" ON ("polls_b"."a_relation_id" = "polls_a"."id") WHERE "polls_b"."b_field" = ("polls_a"."a_field")  LIMIT 21; args=()

# compare two reverse related fields using an operator other than equal to
# (
    A.objects
    .filter(b__b_field__lt=F('c__c_field'))
    .values()
)
# <QuerySet [{'id': 2, 'a_field': 2}]>
# DEBUG django.db.backends 2019-10-08 10:22:56,555 utils execute:110 9581 140006712284992: (0.003) SELECT "polls_a"."id", "polls_a"."a_field" FROM "polls_a" INNER JOIN "polls_c" ON ("polls_a"."id" = "polls_c"."a_relation_id") INNER JOIN "polls_b" ON ("polls_a"."id" = "polls_b"."a_relation_id") WHERE "polls_b"."b_field" < ("polls_c"."c_field")  LIMIT 21; args=()
mattbretl commented 5 years ago

Thanks @higherorderfunctor. I believe we could add additional enums to allow those expressions, similar to how the pg-order-by-related plugin adds enum values for related table columns.

BUT.. after diving into this today, I realized that implementing even @seeiuu's simple scenario isn't going to work as planned. I had suggested this approach:

{ filter: { column1: { greaterThanOrEqualToColumn: COLUMN_2} } }

But in reality, the proposed greaterThanOrEqualToColumnfield would exist on a generic type such as IntFilter which has no table context. We can't add context-specific enum values (e.g. COLUMN_2) to those generic types. So I think we'd need to do something like this instead:

{ filter: { column1Ref: { greaterThanOrEqualTo: COLUMN_2} } }

(Ref isn't the best suffix; I have no idea what to call that field.) That would allow us to assign a brand new GraphQLInputObject type to column1Ref (the type name would be something like Table1Column1RefFilter), with operator fields (equalTo, greaterThan, etc.) typed as context-specific enums (COLUMN_2).

While I believe that would work, it would require a significant development effort that I can't commit to right now. If anyone else wants to have at it, I can try to provide further guidance.

jefbarn commented 4 years ago

I had this requirement pop up in a query I was designing today. (need to check event.start_date = event.end_date). Searching around seems like some of the graphql libraries are solving this with directives. See the "Self-referential queries" section here: https://blog.kensho.com/compiled-graphql-as-a-database-query-language-72e106844282

jefbarn commented 4 years ago

FWIW, I got around this problem by using the new makeAddPgTableConditionPlugin to add a new custom filter clause that included my comparison operation. Edit, here's some code for reference:

import { makeAddPgTableConditionPlugin } from 'graphile-utils'
import Maybe from 'graphql/tsutils/Maybe'

export const CustomFilterPlugin = makeAddPgTableConditionPlugin(
  'graphql',
  'ncsr',
  'recurring',
  build => ({
    description: 'Filters the list to NCSRs that use recurring dates.',
    type: build.graphql.GraphQLBoolean
  }),
  (value, helpers, build) => {
    const { sql, sqlTableAlias } = helpers
    const recurrencePatternIdent = sql.identifier('recurrence_pattern')
    const startTimeIdent = sql.identifier('scheduled_start_time')
    const endTimeIdent = sql.identifier('requested_date_of_completion')

    const onceOrSameDay = sql.fragment`
      ${sqlTableAlias}.${recurrencePatternIdent} = 'Once' OR
      date(${sqlTableAlias}.${startTimeIdent}) = date(${sqlTableAlias}.${endTimeIdent})
    `
    if ((value as Maybe<boolean>) === true) {
      return sql.fragment`NOT (${onceOrSameDay})`
    } else if ((value as Maybe<boolean>) === false) {
      return onceOrSameDay
    } else {
      return null
    }
  }
)