quarkusio / search.quarkus.io

Search backend for Quarkus websites
Apache License 2.0

Sub-optimal search result: "dev services" does not rank high enough the tutorial that talks most about dev services #40

Open holly-cummins opened 1 year ago

holly-cummins commented 1 year ago

If I search on https://quarkus-website-pr-1825-preview.surge.sh/guides/ for 'dev services', in an ideal world the 'my second application' guide would be the first result (IMO). At the very least, I'd hope it was in the results. The title doesn't mention dev services, but the slug and body feature dev services a lot.

(This kind of content is an ideal use case for improved search: the title of the tutorial doesn't mention dev services, because it's aimed at people who don't know they need to know about dev services... but a direct search should also find it, because it's our main introduction to dev services.)

yrodiere commented 1 year ago

Thanks for the report.

> At the very least, I'd hope it was in the results

FWIW "Your second Quarkus application" does appear in the results, but far down the list (you have to scroll and trigger loading of additional results).

> in an ideal world the 'my second application' guide would be the first result (IMO)

I may be wrong, but "Dev Services Overview", at least, does seem more relevant than this guide when searching for "dev services"... ? Are we still talking about relevance here, or do you want some kind of "featured" list of results that always appear first?

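In any case, I can offer the following immediate solutions to try to improve the relevance score of "Your second Quarkus application":

* Customize analyzers so that "dev services" is considered a synonym of "devservices". This is important because we do appear to [use "devservices", without a space, in the `:topic:` metadata](https://github.com/quarkusio/quarkus/blob/b865f853b4400fd7ca0cee50aa98483529d5f2aa/docs/src/main/asciidoc/getting-started-dev-services.adoc#L12). CC @gsmet: was this on purpose?

* And/or we add a `:keywords:` metadata entry containing "dev services" to [`getting-started-dev-services.adoc`](https://github.com/quarkusio/quarkus/blob/b865f853b4400fd7ca0cee50aa98483529d5f2aa/docs/src/main/asciidoc/getting-started-dev-services.adoc)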

In the longer term, we could consider adding a list of "featured guides" near the top of the search results. It would be a short (3-4) list of matching guides that we cherry-picked and tagged through asciidoc metadata because we think they are particularly important. This list would be short and compact, so as not to interfere with "main" results, but could be highlighted in other ways (more vivid colors, bold font, colored background, ... don't ask me, my UIs are generally appalling). Think advertisement in web search engines :) If you think this makes sense, I'll create a separate issue.

gsmet commented 1 year ago

Topics are not keywords, they are topics. They are designed to look nice in a tag list or something. We could make it dev-services if you prefer.

gsmet commented 1 year ago

> you have to scroll and trigger loading of additional results

Given the number of results is finite, should we always display all results?

holly-cummins commented 1 year ago

> FWIW "Your second Quarkus application" does appear in the results, but far down the list (you have to scroll and trigger loading of additional results).

Ah, sorry, yes, I was being lazy!

> > in an ideal world the 'my second application' guide would be the first result (IMO)

> I may be wrong, but "Dev Services Overview", at least, does seem more relevant than this guide when searching for "dev services"... ? Are we still talking about relevance here, or do you want some kind of "featured" list of results that always appear first?

Good questions. I did add 'ideal world' because I think getting to the 'Holly's ideal state' behaviour for this case may be non-trivial and involve some icky tradeoffs. I should have said that more clearly! So here, I think part of the problem is that 'Dev Services Overview' is perhaps mis-titled. From the title, you'd think it's a page for people who want to know about dev services, and it's not. Obviously, that's not something we can blame the search engine for. :)

Well, and I'm being slightly unfair - the first few paragraphs are an overview, but then it becomes a reference.

> In any case, I can offer the following immediate solutions to try to improve the relevance score of Your second Quarkus application:

> * Customize analyzers so that "dev services" is considered a synonym of "devservices". This is important because we do appear to [use "devservices", without a space, in the `:topic:` metadata](https://github.com/quarkusio/quarkus/blob/b865f853b4400fd7ca0cee50aa98483529d5f2aa/docs/src/main/asciidoc/getting-started-dev-services.adoc#L12). CC @gsmet: was this on purpose?

Yes, I think this seems like a very good thing to do - perhaps also devservice? That will help other pages in this area.
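
As a rough sketch of the synonym idea (not the project's actual configuration): Elasticsearch's `synonym_graph` token filter accepts comma-separated equivalence rules, so a single rule could treat all three spellings as interchangeable, typically in a search-time analyzer. The filter and analyzer names below are made up for illustration, and the `expand` helper only mimics the effect in plain Python so it can be tried without a cluster.

```python
import json

# Hypothetical index settings; "quarkus_synonyms" and "guide_search" are
# illustrative names, not taken from the search.quarkus.io code base.
SETTINGS = {
    "analysis": {
        "filter": {
            "quarkus_synonyms": {
                "type": "synonym_graph",
                "synonyms": ["dev services, devservices, devservice"],
            }
        },
        "analyzer": {
            "guide_search": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "quarkus_synonyms"],
            }
        },
    }
}


def expand(query: str, rules: list[str]) -> set[str]:
    """Toy stand-in for synonym expansion: return every equivalent phrase."""
    normalized = " ".join(query.lower().split())
    for rule in rules:
        # Each rule is a comma-separated list of equivalent phrases.
        variants = {" ".join(v.split()) for v in rule.lower().split(",")}
        if normalized in variants:
            return variants
    # No rule applies: the query only matches itself.
    return {normalized}


if __name__ == "__main__":
    rules = SETTINGS["analysis"]["filter"]["quarkus_synonyms"]["synonyms"]
    print(json.dumps(SETTINGS, indent=2))
    print(sorted(expand("Dev Services", rules)))
```

With a rule like this, a search for "dev services", "devservices" or "devservice" would all be expanded to the same set of phrases before matching.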

> * And/or we add a `:keywords:` metadata entry containing "dev services" to [`getting-started-dev-services.adoc`](https://github.com/quarkusio/quarkus/blob/b865f853b4400fd7ca0cee50aa98483529d5f2aa/docs/src/main/asciidoc/getting-started-dev-services.adoc)

This also seems useful, although I assume we don't want to have to replicate the topics in the keywords as a general pattern? (Me being lazy again :) )

> In the longer term, we could consider adding a list of "featured guides" near the top of the search results. It would be a short (3-4) list of matching guides that we cherry-picked and tagged through asciidoc metadata because we think they are particularly important. This list would be short and compact, so as not to interfere with "main" results, but could be highlighted in other ways (more vivid colors, bold font, colored background, ... don't ask me, my UIs are generally appalling). Think advertisement in web search engines :) If you think this makes sense, I'll create a separate issue.

I like this, but I also think it seems like something we should do if we have to, and not before. The ideal search engine would magically rank everything correctly without any manual intervention. I hasten to add I'm not sure I've ever seen such an engine. :)

But I wonder if there are some other heuristics that we might want to apply that would replicate the effect of featuring guides, but without the manual curation, like "tend to rank tutorials above reference guides," or ... [drawing a blank]

Thanks for looking into it!

yrodiere commented 1 year ago

> > you have to scroll and trigger loading of additional results

> Given the number of results is finite, should we always display all results?

"Finite" is still up to ~220 (worst case for a search for a particular version) and counting... and you asked me to include titles and even more info in the JSON. And people are already asking to integrate Quarkiverse in the results.

There's a compromise to be found, sure, but I don't think returning all hits is future-proof.

yrodiere commented 1 year ago

> Topics are not keywords, they are topics. They are designed to look nice in a tag list or something.

My point was that we do match against a full-text "topics" field and we do apply a higher boost compared to the content of a guide. So we might want it to... actually match?
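
To make that concrete, here is a toy model of per-field boosting. The boost values, the field contents, and the scoring formula are all invented for illustration (real relevance scoring is considerably more involved). It only shows that a heavily boosted field contributes nothing when its tokens never line up with the query tokens.

```python
# Hypothetical boosts: topics weigh more than body content.
BOOSTS = {"title": 10.0, "topics": 5.0, "content": 1.0}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace, roughly like a very basic analyzer."""
    return text.lower().split()


def score(query: str, doc: dict[str, str]) -> float:
    """Sum, per field, boost * fraction of query tokens present in that field."""
    terms = tokenize(query)
    if not terms:
        return 0.0
    total = 0.0
    for field, boost in BOOSTS.items():
        field_tokens = set(tokenize(doc.get(field, "")))
        matched = sum(1 for term in terms if term in field_tokens)
        total += boost * matched / len(terms)
    return total


# Illustrative document; the field values are assumptions, not the real index.
guide = {
    "title": "Your second Quarkus application",
    "topics": "getting-started devservices",
    "content": "this tutorial introduces dev services",
}

if __name__ == "__main__":
    # Only the content field matches, so the topics boost is wasted.
    print(score("dev services", guide))
    # With the topic spelled as two words, the topics boost kicks in.
    print(score("dev services", {**guide, "topics": "getting-started dev services"}))
```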

> We could make it dev-services if you prefer.

That would work, but I'll probably need to work on analyzers anyway, if only to handle users typing `devservices` in the search box.

yrodiere commented 1 year ago

Thanks @holly-cummins and @gsmet, then I'll look into improving relevance first, and we'll try to "feature" this guide when we work on page ranks (it won't be clear-cut because the relevance sort is necessarily fuzzy, but that should at least improve things a bit).

gsmet commented 1 year ago

Topics can be used to improve ranking but they are not designed for this purpose. That's what I was saying.

yrodiere commented 12 months ago

Regarding this:

> Customize analyzers so that "dev services" is considered a synonym of "devservices". This is important because we do appear to use "devservices", without a space, in the `:topic:` metadata.

These filters may be relevant (from most likely to help to least likely):

* https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-hyp-decomp-tokenfilter.html

* https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-dict-decomp-tokenfilter.html

* https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-common-grams-tokenfilter.html

* https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-graph-tokenfilter.html (would only work for e.g. `DevServices`, not `devservices`)

I created #59 to address this specifically.
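
For illustration only, a dictionary decompounder could be configured along these lines. `dictionary_decompounder` and its `word_list` parameter are real Elasticsearch options, but the filter name and the word list here are assumptions, and the `decompound` function is a simplified plain-Python imitation of the behaviour, not the actual implementation.

```python
# Hypothetical filter definition; the name and word list are illustrative.
DECOMPOUNDER = {
    "quarkus_decompounder": {
        "type": "dictionary_decompounder",
        "word_list": ["dev", "service", "services"],
    }
}


def decompound(token: str, word_list: list[str], min_subword_size: int = 2) -> list[str]:
    """Keep the original token and append every dictionary word found inside it."""
    lowered = token.lower()
    words = {w.lower() for w in word_list}
    result = [lowered]
    # Brute-force scan over all substrings of at least min_subword_size chars.
    for start in range(len(lowered)):
        for end in range(start + min_subword_size, len(lowered) + 1):
            part = lowered[start:end]
            if part != lowered and part in words and part not in result:
                result.append(part)
    return result


if __name__ == "__main__":
    words = DECOMPOUNDER["quarkus_decompounder"]["word_list"]
    print(decompound("devservices", words))
    print(decompound("DevServices", words))
```

Because the lookup is dictionary-based rather than case-based, it splits the all-lowercase `devservices` as well as `DevServices`, which is where a word-delimiter filter alone would fall short.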

gsmet commented 12 months ago

@yrodiere note that IIRC, I changed things to dev-services in the topics now.