learning-unlimited / ESP-Website

A website to help manage the logistics of large, short-term educational programs
82 stars 57 forks source link

User search controller cannot correctly combine filters #956

Open pricem opened 10 years ago

pricem commented 10 years ago

Due to the use of combined Q objects, the UserSearchController (used by the communications panel and user list generator) does not return correct sets of users when multiple filtering criteria are used ("combination list"). This is an architectural issue and it may require some discussion to come up with a good solution.

lenaqr commented 10 years ago

Is this the issue that happens when you query by two different criteria along the same relationship?

I don't think there's a way to fix that without scrapping Q objects and using QuerySets instead.

ghost commented 10 years ago

Just wondering if I can get a specific set of steps to reproduce this. I am not having much luck with the data that I have

pricem commented 10 years ago

Try this (with the MIT database):

Both of these criteria ("confirmed registration" and "attended") are specified in the StudentRegCore program module, and the handler code for this is in UserSearchController at esp/esp/users/controllers/usersearch.py starting at line 197. While I think that boolean manipulation of QuerySets rather than Q objects will solve this particular problem, I am not sure whether that will guarantee correctness of all of the combination queries.

ghost commented 10 years ago

Thanks, I was able to reproduce the issue using these steps

ghost commented 10 years ago

I think that this is a conceptual issue, rather than an architectural issue. Using the provided example, the empty result set is correct since both options concern an 'event'. i.e. `Confirm Pre-Registration' button." and "Students who attended Spark! 2013." can't be true.

The corresponding clauses are included in the resultant SQL:

AND "users_record"."event" = 'reg_confirmed' AND "users_record"."event" = 'attended'

A couple of possible approaches are:

1) Modify the search screen to group related options(negating the overlap). I'm not sure if any other screens use the combination list. So this could possibly be a non-trivial change. 2) Identify overlapping criteria and push them into the or_keys list(this might also be confusing). This would be relatively easy to implement

pricem commented 10 years ago

Well, they both concern an "event" but not necessarily the same event. We want the generated SQL to reflect this using subqueries or multiple queries rather than simply attaching two (mutually exclusive) filters on the users_record table.

ghost commented 10 years ago

This fix is coming along, at least for the specific combination list that I am addressing. Is anyone aware of a different combination list that reveals the same problem?

ghost commented 10 years ago

I have run into an interesting problem in my attempt to generate a subquery using django orm. It would appear that primary method of generating a group by/having clause is via the annotate method of a query set. This works fine except for the fact that it does not seem to be possible to limit the select columns to a single column(using values_list) i.e. django orm imposes the aggregate field(in this case counting the user id) and therefore, the subquery fails because the column count is greater than one. It may be possible to simply build a 'having' clause on the qset.query instance using WhereNode, but I have yet to find useful documentation on how to achieve that, and I assume this is intentional as it is a lower level API class.

pricem commented 10 years ago

Other interesting test cases (besides those involving Records) include:

benjaminjkraft commented 10 years ago

I hadn't been paying attention to this issue, but this sounds similar to some issues I hit when working on class flags. This seems like a case of what's described at https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships. I dealt with this by doing the boolean operations on the querysets, which generated different queries that worked correctly: https://github.com/learning-unlimited/ESP-Website/blob/main/esp/esp/program/modules/handlers/classflagmodule.py#L81. This might hurt query performance -- I seem to recall finding some weirdnesses of how the ORM builds the SQL -- but I'm not sure there's a good alternative for arbitrarily nested queries. (Although, the comm panel doesn't actually build arbitrarily nested queries, so it might actually be the case that all queries it builds can be built by just chaining .filter()s carefully.

pricem commented 10 years ago

Joel, can you explain briefly how to run your tests against an existing database? I think I have django-nose set up and pointing to the production DB, but I have trouble importing the TestUserSearchController. What's the command line you use to run the tests in that class only? Did you have to modify any other code (e.g. import statements)?

The subquery approach for Record based filters works correctly on my dev server where it didn't before 5511b58.

ghost commented 10 years ago

The correct command syntax(to run a single test method) is:

./manage.py test users.controllers.tests.test_usersearch:TestUserSearchController.test_teacher_classroom_tables

ghost commented 10 years ago

I have investigated the issue of the overlapping teacher queries and it seems that the root of the problem is competing criterion that cancel each other out. For example, the query fails because the following expression(from a sample query) can't be true:

 AND "resources_resourcerequest"."res_type_id" = 152
 AND "resources_resourcerequest"."res_type_id" = 150

Unfortunately, the subquery solution does not appear to apply in this case as it did for record event query because each AND expression in the preceding example seems to be part of a larger QuerySet(i.e. coupled) that is generated in teacherclassregmodule.get_resource_pairs and this also includes classsubjectsectionsresourcerequest__desired_value. To complicate matters further, each mutually exclusive set of criteria are coupled to a textual description that implies this relationship.

Additionally, the get_resource_pairs method seems to generate all possible combinations, the majority of which are discarded when the teacher list is processed in usersearch.query_from_postdata. Even if I were to shoehorn in a fix for this specific problem, my concern is that this could negatively impact other areas of this feature.

In any, it does not appear that the overlapping criteria for this specific teacher search can be resolved without a major overhaul, which in my opinion could also involve simplifying the feature, consolidating search rules and decoupling presentation elements such as labels etc from the process of query generation.