woylie / flop

Filtering, ordering and pagination for Ecto
MIT License
672 stars 36 forks source link

Support for joins #99

Closed jesse-iluminai closed 3 years ago

jesse-iluminai commented 3 years ago

Is your feature request related to a problem? Please describe. I have an HTML table that displays information from multiple db tables. When using the sort functionality from FlopPhoenix I'd like to specify a column to sort on that is not part of the main table.

Describe the solution you'd like I believe there were plans for this in #48 and we've raised it in other tickets since then for the need to be able to have Flop be aware of other columns from passed in schemas/models.

Describe alternatives you've considered Nothing.

Additional context Let me know what you think @woylie. Looks like this has been on your radar for some time now. Wondering if you had any implementation ideas since you first created the original ticket here and the time you've spent with FlopPhoenix since then? Maybe there is some support already there that I can take advantage of?

woylie commented 3 years ago

I'm still not sure what a good API for this would look like. There are two things to keep in mind:

  1. The parameters you pass via the query parameters in the URL or via an HTML form should not be concerned with the database layout.
  2. Flop should be a thin layer on top of Ecto, and not a replacement for the Ecto.Query API.

In #87, I suggested this format to define compound fields:

@derive {Flop.Schema, compound_fields: [name: [:family_name, :given_name]}

Maybe something similar can be done for joins. Let's say we have an Owner schema and a Pet schema:

schema "owners" do
  field :name, :string
  field :email, :string

  has_many :pets, Pet
end

schema "pets" do
  field :name, :string
  field :age, :integer
  field :species, :string

  belongs_to :owner, Owner
end

And now we want to find all owners that have pets of the species "E. africanus".

In a similar manner as the compound fields option above, we could define join fields for the Owner schema with Flop.Schema:

@derive {Flop.Schema, join_fields: [pets: [:species]}

In this case, :pets would be a named binding, which would already have to be added to the query passed to the Flop functions; and :species would be the field on the joined table referenced by the named binding. Then you would be able to do something like this:

params = %{filters: [%{field: :species, op: :==, value: "E. africanus"}]}

Owner
|> join(:left, [o], p in assoc(o, :pets), as: :pets)
|> Flop.validate_and_run!(params, for: Owner)

However, this configuration would lead to problems as soon as the tables you want to join share field names:

# now we have two filter fields called `:age`
@derive {Flop.Schema, filterable: [:age], join_fields: [pets: [:age]}

schema "owners" do
  field :name, :string
  field :age, :integer
  field :email, :string

  has_many :pets, Pet
end

schema "pets" do
  field :name, :string
  field :age, :integer
  field :species, :string

  belongs_to :owner, Owner
end

So we definitely need to be able to give a name to each field from an association.

@derive {
  Flop.Schema,
  filterable: [:age, :pet_age],
  sortable: [:age, :pet_age, :pet_species],
  join_fields: [pet_age: {:pets, :age}, pet_species: {:pets, :species}]
}

params = %{filters: [%{field: :pet_age, op: :<=, value: 5}], order_by: [:pet_species]]}

So this format would be join_fields: [<filter_field_name>: {<named_binding>, <field>}].

Does any of this make sense to you? Would that solve your problem?

jesse-iluminai commented 3 years ago

What about if there was a way to not worry about defining the joins specifically and focus on the columns that are available in the query?

My concern here is that if we go the explicit route with defining joins we may box ourselves out of being able to use more vanilla ways of accessing data or having to always specify associations in models. I don't define them this way in favor of more explicit queries as the platforms get more complicated and control over the generated SQL is important (consider soft deletes, status flags, and other control fields which are always being included in queries).

If we're providing a query that joins across multiple tables, and if we want to access a column on a foreign table, should we not somehow just be able to access the column name? In those events where there are two "id" fields or two "name" fields, those would normally blow up my queries regardless and we should always try to be selecting the minimum amount of columns needed for our use case (while we're often taught early on that "select * from foo" is good enough). Maybe there's something here with virtual fields that we could lean on where the column name from another table is filling that value?

Something that comes to mind as well is looking back on my experience with Ransack (https://github.com/activerecord-hackery/ransack) for Ruby/Rails it was great to be able to define any number of "racksacker"s as they called them which could be used in a number of different ways such as aliasing, casting, accessing data through json, or even through joins sometimes, etc.

This is certainly deviating from our Joins chat, I think there could be a home for something like this in Flop and may even get the juices flowing for how to work with joins more easily?

  1. Casting data from one type to another more appropriate for sorting and searching

Consider wanting to search/filter on a UUID (or another custom type) for customer service portal where a user types a value or partial UUID into the search box. We need to cast it to text type first to be used in an ilike case. The new id field here on the ransacker is what we'd then use with "filter" or "sort" (sort: [:id, :name]). Arel is what ActiveRecord uses under the hood for query building, this could in turn be something like "fragment" for Ecto.

ransacker :id do
  Arel.sql('lower(people.id::text)')
end

https://github.com/activerecord-hackery/ransack/wiki/Using-Ransackers#4-convert-an-integer-database-field-to-a-string-in-order-to-be-able-to-use-a-cont-predicate-instead-of-the-usual-eq-which-works-out-of-the-box-with-integers-to-find-all-records-where-an-integer-field-id-in-this-example-contains-an-input-string

  1. Using with compound fields

Most of us with our users or profiles tables would have first_name and last_name columns. Some of us may also choose to concatenate that in the query itself rather than unnecessarily running through a model function full_name(first_name, last_name). When we present this in a list we're typically just showing the Name column.

If I wanted to allow a user to search on a persons name, I allow then to type "Joh" (as in John) or "Smi" (as in Smith) or even "John sm" if they want to more easily jump to record they're familiar with. The compound field here as a ransacker might look like the following.

ransacker :full_name do
  Arel.sql('people.first_name || people.first_name')
end

https://github.com/activerecord-hackery/ransack/wiki/Using-Ransackers#5-search-on-a-concatenated-full-name-from-first_name-and-last_name-several-examples

Similarly if we had a join where first_name was on one table and the last name was on another this greatly simplifies things where full_name is used with filtering or sorting conditions.

Additionally, we can more easily expose JSON values out

ransacker :name do
  Arel.sql("lower(buildings.name_translations ->> 'english')")
end

https://github.com/activerecord-hackery/ransack/wiki/Using-Ransackers#10-search-for-a-translated-value-in-a-jsonb-column

Given queries and that they can be complex and span many tables I think it's important to stay flexible here and not bring in the table name into the params, but instead try to work out a system of aliases like the ransackers above. I believe this would allow Flop to become that much more useable when people need to break out of the box and not have to deal with specifying exact join_fields (or compound_fields from the earlier example) all the time. It would keep the definitions under the @derive simpler, albeit at the expense of setting up the aliases, but we'd have more control over what we could do with things like fragements and having more access to the underlying power of SQL. We can also work with custom schemas that cross one or more tables (ie not mapping to just one table specifically).

Sorry here for not providing anything concrete which you may use and I'm certainly not trying to hold Flop/FlopPhoenix up to Ransack, Flop has been great so far although I know with our build out I will need support for more fantastic queries and sorting/filtering cases.

woylie commented 3 years ago

What about if there was a way to not worry about defining the joins specifically and focus on the columns that are available in the query?

If we're providing a query that joins across multiple tables, and if we want to access a column on a foreign table, should we not somehow just be able to access the column name?

I'm not sure what you mean here. When you add a join query in Ecto, the only way to access the fields from the joined table is by using a positional or named binding. If you don't pass the binding to Flop, it wouldn't know how to select fields from any joined tables.

Flop has a very specific purpose: Taking parameters for filtering, ordering and pagination from a user and applying them to a query. It is not meant to be an alternative query DSL.

To handle parameters from outside in a safe way, you will always have to define which fields are allowed in the query. Otherwise you open your application up to user-generated queries on fields that should not be exposed in any way. This is where the sortable and filterable options come in. There is no way around explicitly adding compound and join fields to those lists.

Now you probably want to give a field from a joined table an alias, instead of having something like pet.species in the query parameters. This also allows to refer to those fields in other parts of the configuration.

So in summary:

  1. Flop needs to know the binding in order to get the value from the right table.
  2. Flop needs to know which operations are allowed on which fields, including join and compound fields.
  3. We need a way to define aliases for join and compound fields.

And that's exactly what the proposed solution does.

  1. Casting data from one type to another more appropriate for sorting and searching

Consider wanting to search/filter on a UUID (or another custom type) for customer service portal where a user types a value or partial UUID into the search box. We need to cast it to text type first to be used in an ilike case. The new id field here on the ransacker is what we'd then use with "filter" or "sort" (sort: [:id, :name]). Arel is what ActiveRecord uses under the hood for query building, this could in turn be something like "fragment" for Ecto.

I thought about allowing to use something like dynamic or fragment when defining custom fields. But not in the first iteration. Ecto doesn't have a way to define select aliases that can be used in WHERE clauses, though, so if you also want to select those values instead of just applying filter/sorting parameters on them, you'll have to add them in the select function again. Maybe it would be possible to write a wrapper around Ecto.Query.select/3 or a function/macro to be used within the select function to reuse those definitions. Not sure yet how that would look like.

Given queries and that they can be complex and span many tables I think it's important to stay flexible here and not bring in the table name into the params, but instead try to work out a system of aliases like the ransackers above. I believe this would allow Flop to become that much more useable when people need to break out of the box and not have to deal with specifying exact join_fields (or compound_fields from the earlier example) all the time.

That sounds contradictory. My proposed solution lets you define aliases for compound and join fields in the schema, and you can use those aliases in the query parameters.

jesse-iluminai commented 3 years ago

All sounds reasonable Mattias.

Thanks for the additional insight here and I'm happy to try out your proposed solution when you have some time to take a poke at it. I think it should solve what I need (sorts on columns from other tables). I believe I would need to set up the associations on the models for the awareness then, is that correct?

Great also to hear that you had been considering the other advanced cases.

Sorry for not being clearer on some points, I think it's the 1 year old here with me :)

On Sun., May 16, 2021, 4:12 p.m. Mathias Polligkeit, < @.***> wrote:

What about if there was a way to not worry about defining the joins specifically and focus on the columns that are available in the query?

If we're providing a query that joins across multiple tables, and if we want to access a column on a foreign table, should we not somehow just be able to access the column name?

I'm not sure what you mean here. When you add a join query in Ecto, the only way to access the fields from the joined table is by using a positional or named binding. If you don't pass the binding to Flop, it wouldn't know how to select fields from any joined tables.

Flop has a very specific purpose: Taking parameters for filtering, ordering and pagination from a user and applying them to a query. It is not meant to be an alternative query DSL.

To handle parameters from outside in a safe way, you will always have to define which fields are allowed in the query. Otherwise you open your application up to user-generated queries on fields that should not be exposed in any way. This is where the sortable and filterable options come in. There is no way around explicitly adding compound and join fields to those lists.

Now you probably want to give a field from a joined table an alias, instead of having something like pet.species in the query parameters. This also allows to refer to those fields in other parts of the configuration.

So in summary:

  1. Flop needs to know the binding in order to get the value from the right table.
  2. Flop needs to know which operations are allowed on which fields, including join and compound fields.
  3. We need a way to define aliases for join and compound fields.

And that's exactly what the proposed solution does.

  1. Casting data from one type to another more appropriate for sorting and searching

Consider wanting to search/filter on a UUID (or another custom type) for customer service portal where a user types a value or partial UUID into the search box. We need to cast it to text type first to be used in an ilike case. The new id field here on the ransacker is what we'd then use with "filter" or "sort" (sort: [:id, :name]). Arel is what ActiveRecord uses under the hood for query building, this could in turn be something like "fragment" for Ecto.

I thought about allowing to use something like dynamic or fragment when defining custom fields. But not in the first iteration. Ecto doesn't have a way to define select aliases that can be used in WHERE clauses, though, so if you also want to select those values instead of just applying filter/sorting parameters on them, you'll have to add them in the select function again. Maybe it would be possible to write a wrapper around Ecto.Query.select/3 or a function/macro to be used within the select function to reuse those definitions. Not sure yet how that would look like.

Given queries and that they can be complex and span many tables I think it's important to stay flexible here and not bring in the table name into the params, but instead try to work out a system of aliases like the ransackers above. I believe this would allow Flop to become that much more useable when people need to break out of the box and not have to deal with specifying exact join_fields (or compound_fields from the earlier example) all the time.

That sounds contradictory. My proposed solution lets you define aliases for compound and join fields in the schema, and you can use those aliases in the query parameters.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/woylie/flop/issues/99#issuecomment-841891708, or unsubscribe https://github.com/notifications/unsubscribe-auth/ASVQDGFA4OIXNHM5NPVHXFLTOBGMNANCNFSM445CGGBQ .

woylie commented 3 years ago

Yes, you would have to add @derive {Flop.Schema, opts} to your Ecto schema module and set the for option when you call the validate_and_run function as described above.

woylie commented 3 years ago

I made the necessary updates to Flop.Schema already, you can read the documentation for more details: https://github.com/woylie/flop/pull/100/files#diff-16e382798af55c6d398279cfb3743c1d49b76d3ecb4835aa3e8733d34dc0371aR98. Let me know if something is unclear.

woylie commented 3 years ago