readthedocs / readthedocs.org

The source code that powers readthedocs.org
https://readthedocs.org/
MIT License

Search: allow to boost pages using keywords #8670

Open stsewd opened 2 years ago

stsewd commented 2 years ago

Currently we allow defining the priority of pages via a numeric value (https://docs.readthedocs.io/en/stable/config-file/v2.html#search-ranking). It would also be great to define a set of keywords that a page will match, even if those keywords aren't included in the document itself.

On Elasticsearch (ES) we could support this with an array field: https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html.
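A rough sketch of what that could look like, written as plain Python dicts rather than against a real client. The field names `title`, `content`, and `keywords`, the helper names, and the boost value of 2.0 are all assumptions for illustration, not the actual Read the Docs index schema. ES has no dedicated array type, so a `keyword` field can simply hold a list of values.

```python
# Hypothetical mapping: "keywords" is an assumed field name, not the
# real Read the Docs index schema. A keyword field can store a list.
PAGE_MAPPING = {
    "properties": {
        "title": {"type": "text"},
        "content": {"type": "text"},
        "keywords": {"type": "keyword"},
    }
}


def build_search_body(query, keyword_boost=2.0):
    """Build a bool query that also rewards an exact keyword match.

    Pages whose text matches the query are returned as usual; pages
    whose "keywords" array contains the query get an extra score boost.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "multi_match": {
                            "query": query,
                            "fields": ["title", "content"],
                        }
                    },
                    {
                        "term": {
                            "keywords": {
                                # Assumes keywords are indexed lowercased.
                                "value": query.lower(),
                                "boost": keyword_boost,
                            }
                        }
                    },
                ],
                "minimum_should_match": 1,
            }
        }
    }
```

With this shape, a page can match through its keywords alone, which is exactly the behavior this issue is asking about.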

The other question is where those keywords come from: they could come from the HTML keywords meta tag or from the config file (like we do with ranking and ignoring).

Sphinx uses meta tags as the source:

Also, Sphinx will add the keywords as specified in the meta directive to the search index. Thereby, the lang attribute of the meta element is considered.

ref https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#html-metadata https://github.com/readthedocs/readthedocs.org/issues/7082#issuecomment-658441967
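If the meta tag were used as the source, the keywords could be read from the built HTML at index time. This is a minimal sketch using only the standard library; `MetaKeywordsParser` and `extract_meta_keywords` are hypothetical names, and it ignores the `lang` attribute that Sphinx takes into account.

```python
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Collect the values of <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "keywords":
            return
        content = attributes.get("content") or ""
        # The content attribute is a comma-separated list.
        for keyword in content.split(","):
            keyword = keyword.strip()
            if keyword and keyword not in self.keywords:
                self.keywords.append(keyword)


def extract_meta_keywords(html):
    """Return the de-duplicated keywords declared in an HTML document."""
    parser = MetaKeywordsParser()
    parser.feed(html)
    return parser.keywords
```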

It looks like Google and other search engines now ignore or penalize the keywords meta tag:

Google hasn’t used meta keywords for rankings since 2009

Bing went one step further in 2011 when they announced they use the tag as a spam signal.

https://ahrefs.com/blog/seo-meta-tags/#meta-keywords

So it's probably better to use the config file? I think it's also more explicit, but I can also see it as duplication.
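For the config file route, one possible shape is a mapping from page patterns to keyword lists, mirroring how ranking maps patterns to numbers. Everything here is hypothetical: no such `keywords` option exists today, `keywords_for_page` is an illustrative helper, and glob-style matching via `fnmatch` is an assumption.

```python
from fnmatch import fnmatch


def keywords_for_page(path, keyword_patterns):
    """Return the keywords of every pattern that matches a page path.

    keyword_patterns is a hypothetical config value, for example:
    {"api/*": ["reference"], "install.html": ["setup", "pip"]}
    """
    matched = []
    for pattern, keywords in keyword_patterns.items():
        if fnmatch(path, pattern):
            for keyword in keywords:
                # Keep first-seen order and skip duplicates.
                if keyword not in matched:
                    matched.append(keyword)
    return matched
```

The result could then be stored in the page's keywords array when the page is indexed.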

Related https://github.com/readthedocs/readthedocs.org/issues/7217

stsewd commented 2 years ago

Also, I didn't find anything like this in Algolia (such as an explicit configuration).

humitos commented 1 year ago

I'm not convinced this is a good feature; it would probably just confuse users. I imagine searching for a keyword and clicking on the first result. Then, while reading the page, I realize that the keyword is not present in the page at all.

I think Google's decision to penalize this behavior makes sense. In my opinion, it's counterintuitive for users and, in the end, bad UX. I would not implement this feature.