renovatebot / renovatebot.github.io

Auto-generating docs repository for Renovate Bot
https://docs.renovatebot.com

Weird search result order #337

Open HonkingGoose opened 11 months ago

HonkingGoose commented 11 months ago

What browser are you using?

Firefox

Other browser name

No response

Describe the bug

When searching on the docs site, the precise match for the dependencyDashboard config option sorts behind things like dependencyDashboardTitle. Usually a precise match sorts higher than partial matches. 🙃

Steps to reproduce

  1. Go to Renovate's docs site.
  2. Search for dependencyDashboard.
  3. The precise match for dependencyDashboard is not the first result.
  4. I would expect the precise match dependencyDashboard to sort higher.
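The expectation in step 4 can be written down as a sort rule. This is only a sketch of the ordering the report asks for, not how the docs site or Lunr.js actually sorts results; the function name `rank_exact_first` and the sample list of option names are made up for illustration.

```python
def rank_exact_first(query, titles):
    """Order titles so an exact match comes first, then prefix matches, then the rest."""
    q = query.lower()

    def key(title):
        t = title.lower()
        if t == q:
            return 0  # exact match
        if t.startswith(q):
            return 1  # partial (prefix) match
        return 2      # anything else

    # sorted() is stable, so titles within the same group keep their input order.
    return sorted(titles, key=key)


# Hypothetical result list, in the "weird" order described above.
results = rank_exact_first(
    "dependencyDashboard",
    ["dependencyDashboardTitle", "dependencyDashboardApproval", "dependencyDashboard", "schedule"],
)
```

With this rule, `dependencyDashboard` would be listed before `dependencyDashboardTitle`.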

[Screenshot: search results for the dependencyDashboard query]

Additional context

Is our separator tokenization causing problems?

We changed Material for MkDocs's default search behavior (tokenization). Maybe that's related? Here's the relevant snippet from our mkdocs.yml config file:

plugins:
  - search:
      separator: '[\s\-,:!?=\[\]()<>{}"/\\]+|\.(?!\d)|&[lg]t;'
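To see what this separator does to a config option name, here is a rough sketch that applies the same regular expression with Python's `re` module. It assumes Python and the browser's JavaScript regex engine treat these constructs the same way, and it only models the splitting step, not the rest of the search pipeline. The helper name `tokenize` is made up for this example.

```python
import re

# Same pattern as the `separator` value in mkdocs.yml above.
SEPARATOR = re.compile(r'[\s\-,:!?=\[\]()<>{}"/\\]+|\.(?!\d)|&[lg]t;')


def tokenize(text):
    """Split text on the separator and drop empty strings."""
    return [token for token in SEPARATOR.split(text) if token]


# A camelCase option name stays one token, because there is no case-change separator.
option_tokens = tokenize("dependencyDashboardTitle")

# A dot splits unless a digit follows it, so version-like strings stay whole.
dotted_tokens = tokenize("packageRules.matchUpdateTypes")
version_tokens = tokenize("node 18.12")
```

If this model is right, `dependencyDashboard` and `dependencyDashboardTitle` are indexed as two different whole-word tokens, so the separator alone would not explain the ordering.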

Related PRs for the separator thing:

@TWiStErRob you helped a lot before, do you want to brainstorm again? 😄

Material for MkDocs search boost feature?

Material for MkDocs has a "search boost" feature [^1], but that applies to the whole page, not just a config option. They recommend starting with a low positive value first. For example:

---
search:
  boost: 2
---

# Page title
...

Boosting the "config options docs page" probably causes other sorting issues... But I wanted to mention boosting, in case it inspires any ideas. 😄

Material for MkDocs will improve search in the future

The Material for MkDocs maintainer is working on better search. Right now Material uses the Lunr.js search engine. The maintainer is going to replace Lunr.js with something that's better for searching through a docs site. [^2]

[^1]: Material for MkDocs, search boost
[^2]: Material for MkDocs repo, maintainer is going to improve search

TWiStErRob commented 11 months ago

I noticed this as well a few weeks ago, I just didn't fully realise it. Changing the separator splitting is unlikely to help, because the tokens are full words as far as I know (we removed the "case change" separator).

Boost is on the right track, but this might be straight up a lunr ranking issue. It would be interesting to reproduce on a smaller example with mkdocs first, and then try to strip out mkdocs around it to see if lunr ranks things correctly with direct usage.

TWiStErRob commented 11 months ago

@squidfunk can you please have a quick look at the OP? What do you think of this issue?

squidfunk commented 11 months ago

We're in the midst of reworking the search. Yes, Lunr.js search result order sometimes feels nondeterministic, because BM25 scoring is far from ideal for typeahead. We're going to throw out Lunr.js soon. Search is currently the huge topic I'm working on. That being said, you can try to tweak it with the separator and boost settings in the meantime.
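For readers wondering why BM25 can put a partial match first, here is a small sketch of textbook BM25 over a toy corpus. It is not Lunr.js's actual implementation: the `k1` and `b` values are common defaults, the three "sections" are invented, and it assumes the typeahead query behaves like a prefix match (`dependencyDashboard*`). The function name `bm25_prefix_scores` is hypothetical.

```python
import math


def bm25_prefix_scores(docs, query, k1=1.2, b=0.75):
    """Score each document for a prefix query with textbook BM25 (illustrative only)."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Assumption: typeahead expands the query to every indexed term that starts with it.
    vocabulary = {token for tokens in tokenized.values() for token in tokens}
    matching_terms = [term for term in vocabulary if term.startswith(query)]

    scores = {name: 0.0 for name in tokenized}
    for term in matching_terms:
        doc_freq = sum(1 for tokens in tokenized.values() if term in tokens)
        idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
        for name, tokens in tokenized.items():
            tf = tokens.count(term)
            if tf == 0:
                continue
            # Longer documents are penalised; repeated terms are rewarded.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            scores[name] += idf * tf * (k1 + 1) / norm
    return scores


# Invented sections: the exact match is long, the partial match is short and repeats its term.
docs = {
    "exact": "dependencyDashboard creates an issue that lists every pending update for the repository",
    "title": "dependencyDashboardTitle sets the dependencyDashboardTitle text",
    "other": "schedule limits when updates run",
}
scores = bm25_prefix_scores(docs, "dependencyDashboard")
```

In this toy corpus the short `title` section outscores the `exact` section. Nothing in the formula rewards a term for being identical to the query; only term frequency, document length and rarity matter.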

Additionally, I would kindly ask you to not mention me for such things. Next time, please create a discussion or an issue in squidfunk/mkdocs-material. You can probably imagine that I get mentioned a lot and I have to budget time for communication. If you use our discussion and issue boards, other users might help you as well, which gives me more time to work on new things, including the issue in the OP. Thank you!

TWiStErRob commented 11 months ago

Sure, thanks for the heads up! And the detailed and fast response!

HonkingGoose commented 10 months ago

Thank you @squidfunk for the response and extra information! ❤️

I don't want to "mess around" with tokenization of the search again. Search seems to work OK in general, except that you can't narrow the search results by giving more keywords. Touching the tokenization also runs the risk of breaking other parts of the search. 🙈

I'll keep this issue open, so any potential bug reporters see it. For now, the easiest thing for us is to wait for Material for MkDocs's better search.

Edit: we'll probably want to follow this upstream issue:

squidfunk commented 10 months ago

🙋‍♀️ Please see https://github.com/squidfunk/mkdocs-material/pull/6321 – feedback wanted!