sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/

Case- and space-aware search results would be good for technical projects #11732

Open rptb1 opened 12 months ago

rptb1 commented 12 months ago

Is your feature request related to a problem? Please describe.

Search results begin with irrelevant matches that could be recognized as such by matching case or by taking spacing (word boundaries) into account.

For example:

[Screenshot of search results]

To reproduce, visit https://memory-pool-system--166.org.readthedocs.build/en/166/ and search for "pin" (a technical term used in our system). Note that the first results are irrelevant matches such as "MutatorContextCanStepInstruction".

Describe the solution you'd like

It would be useful to have search results respect case for technical projects that include case-sensitive identifiers.

It would be good if non-case-matching results were downranked so that they don't intrude on better matches.

For example, when searching for "pin", pages which match "pin" as a whole word (in lower case) should be presented as the best matches. Stemmed matches such as "pins", "pinned", or "unpin" would be nice too.

Pages with the word "pin" matching with differing case should be ranked lower down.

Pages with the string "pin" matching only as a substring (e.g. in "StepIn") should be ranked even lower.
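The ranking tiers requested above could be sketched roughly as follows. This is a hypothetical illustration, not Sphinx's search code (which runs client-side in JavaScript); Python is used only for brevity. The function names `match_tier`, `score_text`, and `rank`, the numeric tier values, and the sample pages are all made up for this example. Placing stemmed matches directly after exact matches is an assumption, and a crude prefix check stands in for a real stemmer, so "pins" and "pinned" are caught but "unpin" falls through to the substring tier.

```python
import re

# Hypothetical tier values: higher means a better match.
EXACT_WORD = 4      # whole word, same case: "pin"
PREFIX_WORD = 3     # same-case word starting with the term: "pins", "pinned"
CASELESS_WORD = 2   # whole word, different case: "Pin", "PIN"
SUBSTRING = 1       # term buried in a longer word: "StepInstruction"
NO_MATCH = 0

WORD_RE = re.compile(r"\w+")


def match_tier(term: str, word: str) -> int:
    """Classify how well a single word matches the search term."""
    if word == term:
        return EXACT_WORD
    if word.startswith(term):
        # Crude stand-in for stemming; misses "unpin", over-matches "pinnacle".
        return PREFIX_WORD
    if word.lower() == term.lower():
        return CASELESS_WORD
    if term.lower() in word.lower():
        return SUBSTRING
    return NO_MATCH


def score_text(term: str, text: str) -> int:
    """Score a page by the best-matching word it contains."""
    return max((match_tier(term, w) for w in WORD_RE.findall(text)), default=NO_MATCH)


def rank(term: str, pages: dict[str, str]) -> list[str]:
    """Return titles of matching pages, best tier first, ties broken by title."""
    scored = [(score_text(term, text), title) for title, text in pages.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored if score > NO_MATCH]


pages = {
    "Pinning": "Objects can pin other objects.",
    "Ambiguity": "Ambiguous references pinned the block.",
    "Acronyms": "PIN is not a term here.",
    "Stepping": "See MutatorContextCanStepInstruction.",
    "Arenas": "An arena holds pools.",
}
print(rank("pin", pages))
# ['Pinning', 'Ambiguity', 'Acronyms', 'Stepping']
```

Scoring each page by its single best word keeps the tiers strictly ordered; a real implementation would likely also weigh match counts and title hits within each tier.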

Describe alternatives you've considered

We will investigate the Sphinx search code. We want to keep the offline search capability, so we don't want to use an external search engine or leak customer searches to, e.g., Google. Pointers or advice on this issue are welcome.

Additional context

Found during formal inspection of the transition of the Memory Pool System documentation to Read the Docs. See https://github.com/Ravenbrook/mps/pull/166#pullrequestreview-1688020503 .

Originally raised as https://github.com/readthedocs/sphinx_rtd_theme/issues/1534 but moved to https://github.com/sphinx-doc/sphinx/issues .

picnixz commented 12 months ago

I think we need to improve the way we search in general. I'm not sure we currently support putting quotes around search terms to force exact-string matches, so we should probably work towards a better search engine.
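A minimal sketch of how quoted terms might behave, assuming a hypothetical query syntax in which a double-quoted term must match case-sensitively as a whole word while bare terms keep the existing loose matching. `parse_query` and `matches_exact` are illustrative names, not part of Sphinx.

```python
import re

# A double-quoted phrase, or else any run of non-space characters.
QUERY_RE = re.compile(r'"([^"]*)"|(\S+)')


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into quoted (exact) phrases and bare (loose) terms."""
    exact, loose = [], []
    for quoted, bare in QUERY_RE.findall(query):
        if quoted:
            exact.append(quoted)
        elif bare:
            loose.append(bare)
    return exact, loose


def matches_exact(phrase: str, text: str) -> bool:
    """Case-sensitive match that must not touch other word characters."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text) is not None


exact, loose = parse_query('"pin" arena')
print(exact, loose)                                               # ['pin'] ['arena']
print(matches_exact("pin", "Objects can pin other objects."))     # True
print(matches_exact("pin", "MutatorContextCanStepInstruction"))   # False
print(matches_exact("pin", "PIN"))                                # False
```

The lookarounds are used instead of `\b` so that phrases beginning or ending with punctuation still anchor correctly. An unbalanced quote falls through to the bare-term branch, so `"pin` is treated as a loose term that includes the quote character.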

By the way, can you check whether similar issues are already open so that we can group them into one?