mathesar-foundation / mathesar

Web application providing an intuitive user experience to databases.
https://mathesar.org/
GNU General Public License v3.0
2.34k stars 323 forks source link

API Integration tests for combinations of filters and groups #798

Open mathemancer opened 2 years ago

mathemancer commented 2 years ago

Problem

Our testing is currently a bit light when it comes to combinations of group counts and filters in the API. This makes it difficult to determine what the desired behavior is, and whether the API conforms to that desired behavior.

Proposed solution

We should add such integration tests. These tests should specify what will happen when both filtering and grouping with different types of filters and groups, and should actually hit the DB (rather than mocking out those calls) so that we can make sure they're producing the expected output.

Marked as draft, since we need to precisely specify what combinations of groups and filters should look like before implementing.

mathemancer commented 2 years ago

We should probably also throw order_by into the mix. I.e., if we're grouping by different columns than we're ordering by, what's the expected behavior? I'm tempted to just disallow that entirely.

kgodey commented 2 years ago

Grouping should take precedence over order by. We should definitely allow that, I use it all the time in other tools (e.g. our GitHub project's Backlog view is grouped by milestone and ordered by status).

pavish commented 2 years ago

if we're grouping by different columns than we're ordering by, what's the expected behavior

From the frontend, we add a sort column for the grouped column before sending the request, which takes precedence over the rest.

mathemancer commented 2 years ago

From the frontend, we add a sort column for the grouped column before sending the request, which takes precedence over the rest.

I've noticed that. Was it because odd things happened when not sorting by the grouped column? I'd like the API to stand on its own, i.e., it needs to handle such things without help from the front end.

pavish commented 2 years ago

I agree that the API needs to handle this on it's own.

I do not remember exactly why I had to add the grouped column as a sort column, but I think it was because the result was not sorted as the frontend expected it to be.

From the UI perspective, order_by should apply only within each group and within each sub-group if the user grouped by multiple columns. Inorder to achieve this, we treat grouping as a form of sorting, where the grouped columns are given highest sort precendence, and then we display group headers at the boundaries. Even though this is a tree structure, we require it to be flattened out for rendering it.

mathemancer commented 2 years ago

I think I'm in agreement with @kgodey and @pavish for group and order combos, and that works for non-ranged grouping, since the group divisions aren't order-dependent (i.e., equality is symmetrical, and so the ordering between group keys just affects order between groups). But there's a subtle detail to consider for ranged grouping. Grouping and ordering are both asymmetrical when we have ranged groups, and the column lists could overlap. Thus, the order affects not only the order between groups, but also the group boundaries. A given row could migrate to a different group if the order changes, even if the columns that define the group don't. In order to decouple the group boundaries from the order_by list, we'd need to order_by twice, once to define groups, and again within groups. This would mean that even keys in the intersection of the group_by and order_by column sets would be out of order across groups, conflicting with the order_by globally. This may be a strange or confusing thing UI-wise; in fact I suspect even this explanation is a bit confusing.

For the examples, assume group_by=(col_1, col_2) and order_by=(col_2, col_1), and that we want to divide into two equal ranged groups.

Example 1

Here, we give the group_by total authority over ordering. Because the column sets overlap, this overrides the order_by. This gives:

 col_1 | col_2
-------+-------
   1   |   8
   2   |   6
   3   |   4
   4   |   2
-------+-------
   5   |   7
   6   |   5
   7   |   3
   8   |   1

Note that things are completely out-of-order w.r.t. the order_by tuple.

Example 2

Next, we use the order_by for ordering, using the group_by only to define which columns are used for grouping. Because the sets overlap, this means the group_by tuple has no effect:

 col_1 | col_2
-------+-------
   8   |   1
   4   |   2
   7   |   3
   3   |   4
-------+-------
   6   |   5
   2   |   6
   5   |   7
   1   |   8

Here, the groups are completely different from what would be specified by the group_by itself.

Example 3

Finally, we'll use the group_by-implied order for group definition, and the order_by for order within groups:

 col_1 | col_2
-------+-------
   4   |   2
   3   |   4
   2   |   6
   1   |   8
-------+-------
   8   |   1
   7   |   3
   6   |   5
   5   |   7

Note that the rows are now out of order from both the group_by definition and also the order_by definition. This cannot be fixed by ordering between the groups:

 col_1 | col_2
-------+-------
   8   |   1
   7   |   3
   6   |   5
   5   |   7
-------+-------
   4   |   2
   3   |   4
   2   |   6
   1   |   8

I find this version most confusing of all, due to the global inconsistency. There isn't a clear reason for the order between the groups, for one thing, and they may not even have what they expect on the first page of results.

Final thought

I think we should try to help the user understand that the group_by definition implies an order for rows, and help guide them to creating a group by statement that has the order they expect.

We could then disallow overlapping column sets between the grouping and ordering column sets, or make it clear that such overlaps may lead to surprising results. Note all the problems go away if the column sets for ordering and grouping do not overlap. I also can't think of a convincing use-case for such an overlap, but maybe I'm not being creative enough.

Also note that things can get even more confusing if there are duplicate values in some columns, or if there are more complex Venn diagrams (i.e., each set could have columns that aren't in the intersection too).

mathemancer commented 2 years ago

I actually think we should allow all three behaviors eventually; it could be possible to configure somehow. The implementation for example (3) would get kind of wonky, but is certainly doable.

kgodey commented 2 years ago

@mathemancer I'm not sure if you have a question or if you're just posting some thoughts. To simplify implementation for the first version, I am okay with disabling order-by for ranged groups but not other groups.

mathemancer commented 2 years ago

I'm not sure if you have a question or if you're just posting some thoughts.

I'm just trying to record some thoughts and details while I'm working through the grouping logic while they're fresh. I'm hoping they'll be useful when we get to implementing these tests, and that the picture is clear enough to let the implementer write tests that really cover intended behavior.

disabling order-by for ranged groups but not other groups

Totally agree that there's no need to disable order-by for other groups.

github-actions[bot] commented 1 year ago

This issue has not been updated in 90 days and is being marked as stale.