medic / cht-core

The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.
https://communityhealthtoolkit.org
GNU Affero General Public License v3.0

contact search for "c_community_health_unit township b" yields 105MB db request and wrong results #8765

Open kennsippell opened 9 months ago

kennsippell commented 9 months ago

Describe the bug

The search term "c_community_health_unit township b" results in a very heavy database query and incorrect results.

To Reproduce

  1. Log in to a large production instance as an online user
  2. Search for something like "c_community_health_unit township b"
  3. Observe a request to CouchDB that is 105MB and takes over 3 minutes

Expected behavior

This is a very heavy database query. The only result is "Township C", but if you search for just "c_community_health_unit Township" you see six results, including 2 different Township B results.


dianabarsan commented 9 months ago

This is unfortunately a known effect of our search library.

To do a compound search (a search with more than one term or filter), every individual search query is performed over the whole database, one per term or filter. The results are sent to the browser (yes, view query results over the whole database). The browser then intersects the results, presents the first 50 to the user, and discards the rest.
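To make the cost concrete, here is a minimal sketch of that flow. It is not the actual search library code: `TermIndex`, `intersectResults` and `compoundSearch` are hypothetical names, and an in-memory map stands in for the per-term view queries. It only illustrates why the amount of data transferred grows with the size of each term's full result set, even though at most 50 results are shown.

```typescript
// Hypothetical stand-in for the database: each term maps to the IDs of
// every doc matching that term (one "view query" per term).
type TermIndex = Map<string, string[]>;

// Keep only the IDs that appear in every result set.
function intersectResults(resultSets: string[][]): string[] {
  if (resultSets.length === 0) {
    return [];
  }
  const first = Array.from(new Set(resultSets[0])); // de-duplicate
  const rest = resultSets.slice(1).map(ids => new Set(ids));
  return first.filter(id => rest.every(set => set.has(id)));
}

// Run one full query per term, intersect in the "browser", keep `limit`.
function compoundSearch(
  index: TermIndex,
  terms: string[],
  limit: number = 50
): { ids: string[]; transferred: number } {
  // Every term fetches its complete result set, regardless of `limit`.
  const resultSets = terms.map(term => index.get(term.toLowerCase()) || []);
  // Total number of rows that had to travel to the client.
  const transferred = resultSets.reduce((sum, ids) => sum + ids.length, 0);
  const ids = intersectResults(resultSets).slice(0, limit);
  return { ids, transferred };
}
```

In this sketch `transferred` is the sum of all per-term result sizes, so a single very common term dominates the cost no matter how small the final intersection is.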

For large databases this is indeed very painful.

As for "Township C" or "Township B": to allow for partial searches we index individual words from contact fields, so your "township" search is hitting "Township B" docs as well. You would see this even in large search engines. However, I don't think we have a way of doing an exact search, like Google does when wrapping multiple terms in quotes.
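A rough illustration of word-level indexing with partial matching follows. The tokenization rule (split on whitespace, lowercase) and the prefix match are assumptions made for this sketch, and `Contact`, `tokenize`, `buildWordIndex` and `searchByPrefix` are hypothetical names, not the real view or library functions.

```typescript
interface Contact {
  id: string;
  name: string;
}

// Assumed tokenization: lowercase, split on whitespace.
function tokenize(value: string): string[] {
  return value.toLowerCase().split(/\s+/).filter(word => word.length > 0);
}

// Index each individual word of the name against the contact's ID.
function buildWordIndex(contacts: Contact[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const contact of contacts) {
    for (const word of tokenize(contact.name)) {
      const ids = index.get(word) || new Set<string>();
      ids.add(contact.id);
      index.set(word, ids);
    }
  }
  return index;
}

// Partial search: a term matches any indexed word that starts with it.
function searchByPrefix(index: Map<string, Set<string>>, term: string): string[] {
  const prefix = term.toLowerCase();
  const matches = new Set<string>();
  index.forEach((ids, word) => {
    if (word.indexOf(prefix) === 0) {
      ids.forEach(id => matches.add(id));
    }
  });
  return Array.from(matches).sort();
}
```

With contacts named "Township B" and "Township C", the term "township" matches both, because each name contributes the word "township" to the index. There is no notion of matching the whole phrase "Township B" exactly.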

dianabarsan commented 9 months ago

A relevant issue about short search terms: https://github.com/medic/cht-core/issues/7288

dianabarsan commented 9 months ago

A relevant issue that recognizes the same problem and suggests that we should limit the number of filters that can be passed to searches to the server: https://github.com/medic/cht-core/issues/8427

The issue refers to report filters, but contact filters work in exactly the same way: the same search library and algorithms are used.

garethbowen commented 8 months ago

#7288 is now merged and will be released in 4.6.0, which should somewhat mitigate this problem.