johnjung / bmrcportal

GNU General Public License v3.0

question on search spec #132

Closed MomoMoses closed 2 years ago

MomoMoses commented 2 years ago

Question: if multiple words are entered in the search box (no quotes), how is the query parsed and how is relevance determined?

We worked on tuning this up, but I'd like to be able to concisely describe what MarkLogic is doing here on the backend. I know that it is NOT taking each word entered as a separate search and putting all those results up on a SERP.

Alternatively, if the site user enters quotes around two or more words, how is that parsed and queried?

MomoMoses commented 2 years ago

From John: Searching uses MarkLogic’s cts:word-query() function. You can see some documentation about it at https://docs.marklogic.com/cts:word-query.

We use the following options: case-insensitive, diacritic-insensitive, punctuation-insensitive, and whitespace-insensitive. For a complete list of options, see the documentation above.

If a query is enclosed in double quotes, we submit the search with a distance-weight of 64. This means that, rather than doing strict exact-phrase searching, MarkLogic will prioritize exact phrases, but it will still return search results in cases where each term in a multiple-term query is present.
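To make the quoted-query behavior concrete, here is a minimal Python sketch. It is a toy in-memory model, not MarkLogic code: the function names and sample documents are hypothetical, and it only imitates the ordering described above (exact-phrase matches first, then documents that contain every term). It does not reproduce how MarkLogic actually computes scores from the distance-weight.

```python
import re


def words(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def quoted_search(query, docs):
    """Return ids of docs containing every term, exact-phrase matches first.

    Toy model only: real scoring in MarkLogic is not reproduced here.
    """
    terms = words(query.strip('"'))
    if not terms:
        return []
    phrase = " " + " ".join(terms) + " "
    hits = []
    for doc_id, text in docs.items():
        doc_terms = words(text)
        # Keep only documents in which every query term appears somewhere.
        if not all(term in doc_terms for term in terms):
            continue
        # Pad with spaces so the phrase only matches on whole-word boundaries.
        has_phrase = phrase in " " + " ".join(doc_terms) + " "
        # False sorts before True, so phrase matches come first.
        hits.append((not has_phrase, doc_id))
    return [doc_id for _, doc_id in sorted(hits)]


# Hypothetical sample documents, for illustration only.
sample_docs = {
    "a": "South Side community art center",
    "b": "Art of the South Side",
    "c": "Side streets, south of the Loop",
    "d": "North Side",
}
print(quoted_search('"south side"', sample_docs))
```

In this sketch, documents "a" and "b" contain the exact phrase and are listed first, "c" contains both words but not adjacent and is listed after them, and "d" is left out because it lacks one of the terms.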

Otherwise, if the query isn’t enclosed in double quotes, the system tokenizes the search string on whitespace, executes a search on each token individually, and then ANDs the results together.
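The unquoted case can be sketched in a few lines of Python. Again this is a toy in-memory model rather than MarkLogic code: the function names and sample documents are hypothetical, case and punctuation are ignored to loosely mirror the options listed above, and diacritic handling is not modeled.

```python
import re


def tokenize(query):
    """Split the query on whitespace, lowercase it, and strip punctuation."""
    tokens = (re.sub(r"\W+", "", raw) for raw in query.lower().split())
    return [token for token in tokens if token]


def doc_words(text):
    """Return the set of lowercase words in a document."""
    return set(re.findall(r"\w+", text.lower()))


def and_search(query, docs):
    """Run one search per token, then intersect (AND) the result sets."""
    tokens = tokenize(query)
    if not tokens:
        return []
    per_token_results = [
        {doc_id for doc_id, text in docs.items() if token in doc_words(text)}
        for token in tokens
    ]
    return sorted(set.intersection(*per_token_results))


# Hypothetical sample documents, for illustration only.
sample_docs = {
    "a": "Chicago jazz recordings",
    "b": "Jazz clubs in New Orleans",
    "c": "Chicago blues and jazz",
}
print(and_search("chicago jazz", sample_docs))
```

Because the per-token result sets are intersected rather than merged, a document that matches only one of the words (such as "b" here) is not returned, which is consistent with the observation earlier in the thread that the results are not a union of separate single-word searches.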