sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/

Quicksearch often does not find correct string composed of two words #1486

Open shimizukawa opened 9 years ago

shimizukawa commented 9 years ago

Steps to reproduce:

  1. Go to http://sphinx-doc.org/index.html
  2. Try searching for "Output formats", "Hierarchical structure" or "Automatic indices" (these two-word strings are present right there on the index page, formatted bold).
  3. The search will not find any of them, only the single words, scattered through the docs.

shimizukawa commented 9 years ago

From Viacheslav Kobylinskyi on 2014-06-11 14:34:28+00:00

Issue version corrected (1.2.2 from 1.2).

shimizukawa commented 9 years ago

From Viacheslav Kobylinskyi on 2014-07-08 12:22:15+00:00

Has anyone else had this issue? Is this a bug, or am I just doing something wrong?

stefanzweig commented 9 years ago

I have had this issue for a very long time. Judging from the JavaScript behind the search, it searches the index built by Sphinx itself. If there are no entries in the index, the search returns no results.

I came here from a Google search for this symptom. :) I am seeking a solution, too.
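To make the symptom concrete, here is a minimal sketch of a term-based index lookup. It is not Sphinx's actual search code; `tokenize`, `build_index`, `search` and the sample documents are hypothetical names and data made up for illustration. It assumes the index stores single words only, so a two-word query matches any page that contains both words anywhere, and a term missing from the index yields no results.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase the text and keep runs of word characters only.
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    # Map each single term to the set of documents that contain it.
    index = defaultdict(set)
    for name, text in docs.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index, query):
    # Look up every query term separately and intersect the hits.
    # Word order and adjacency are never checked.
    terms = tokenize(query)
    if not terms:
        return set()
    results = None
    for term in terms:
        hits = set(index.get(term, set()))
        results = hits if results is None else results & hits
    return results


# Hypothetical sample pages, not taken from the Sphinx docs.
docs = {
    "index": "Output formats: HTML, LaTeX and more",
    "builders": "Each builder writes its output; several formats exist",
    "intro": "Hierarchical structure of documents",
}
index = build_index(docs)
print(search(index, "Output formats"))
print(search(index, "nonexistent"))
```

In this sketch the page "builders" matches "Output formats" even though the two words never appear next to each other there, which is the "single words, scattered" behaviour described in the report.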

lakshmi-kannan commented 9 years ago

Just to add, hyphenated words like what-in-the-world aren't indexed either. Search returns nothing for hyphenated searches. I am seeing the same behavior as you for two words.

davidfraser commented 9 years ago

Also, it seems that the Japanese search does not have a js_stemmer defined, and so doesn't do word separation on the submitted searches...

Lingnik commented 7 years ago

@shibukawa @shimizukawa Just curious, did the snowballstemmer work improve this? I am trying to understand stemmer limitations to see if switching to PyStemmer from PorterStemmer will improve our results, or if other improvements are required.

shibukawa commented 7 years ago

The Snowball stemmer includes two algorithms for English, but the Porter stemmer is the default one. Even if you select PyStemmer, the internal logic is the same as PorterStemmer for English.

zcorpan commented 5 years ago

Two consecutive words seem to work now. However, searching for hyphenated words still seems to be an issue: https://github.com/web-platform-tests/wpt/issues/18943

I found https://github.com/sphinx-doc/sphinx/pull/2818 (from 2016) which claims:

Sample regex above allows users to search for strings containing these punctuation characters: \, /, :, ., and -.

@tk0miya would that PR fix the issue with hyphenated words?
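For illustration, here is a rough sketch of why the splitting pattern matters for hyphenated queries. Both patterns below are assumptions written for this example, not the regex from that PR and not Sphinx's current splitter; `split_terms` is a hypothetical helper.

```python
import re

# Assumption: a splitter that treats every non-word character as a separator.
SPLIT_ON_NON_WORD = re.compile(r"\W+")

# Hypothetical alternative: also keep \ / : . and - inside a term.
SPLIT_KEEP_PUNCT = re.compile(r"[^\w\\/:.\-]+")


def split_terms(pattern, text):
    # Split the text on the given separator pattern and drop empty pieces.
    return [term for term in pattern.split(text) if term]


query = "what-in-the-world"
print(split_terms(SPLIT_ON_NON_WORD, query))  # hyphenated word falls apart
print(split_terms(SPLIT_KEEP_PUNCT, query))   # hyphenated word stays whole
```

With the first pattern the query becomes four separate terms, so the hyphenated form itself can never match. With the second it stays a single term, but that only helps if the index is built with the same splitting rule.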