holly-cummins opened this issue 1 year ago (status: open).
Thanks for the report.
> At the very least, I'd hope it was in the results

FWIW "Your second Quarkus application" does appear in the results, but far down the list (you have to scroll and trigger loading of additional results).
> in an ideal world the 'my second application' guide would be the first result (IMO)

I may be wrong, but "Dev Services Overview", at least, does seem more relevant than this guide when searching for "dev services"... ? Are we still talking about relevance here, or do you want some kind of "featured" list of results that always appear first?
In any case, I can offer the following immediate solutions to try to improve the relevance score of Your second Quarkus application:
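* Customize analyzers so that "dev services" is considered a synonym of "devservices". This is important because we do appear to [use "devservices", without a space, in the `:topic:` metadata](https://github.com/quarkusio/quarkus/blob/b865f853b4400fd7ca0cee50aa98483529d5f2aa/docs/src/main/asciidoc/getting-started-dev-services.adoc#L12). CC @gsmet: was this on purpose?
* And/or we add a `:keywords:` metadata entry containing "dev services" to [`getting-started-dev-services.adoc`](https://github.com/quarkusio/quarkus/blob/b865f853b4400fd7ca0cee50aa98483529d5f2aa/docs/src/main/asciidoc/getting-started-dev-services.adoc)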
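As a rough illustration of the synonym idea (plain Python, not an actual Elasticsearch analyzer configuration), the sketch below runs indexed text and queries through the same normalization, collapsing "dev services" and the singular "devservice" into one canonical token, `devservices`. The synonym tables and function names are hypothetical; in Elasticsearch this would be configured through a synonym token filter rather than written as code.

```python
import re

# Hypothetical synonym tables: two-word phrases and variants -> one canonical token.
SYNONYMS = {
    ("dev", "services"): "devservices",
    ("dev", "service"): "devservices",
}
SINGLE_SYNONYMS = {"devservice": "devservices"}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def normalize(tokens: list[str]) -> list[str]:
    """Replace known two-word phrases and single-word variants with the canonical token."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in SYNONYMS:
            out.append(SYNONYMS[pair])
            i += 2
        else:
            out.append(SINGLE_SYNONYMS.get(tokens[i], tokens[i]))
            i += 1
    return out


def analyze(text: str) -> list[str]:
    """Apply the same analysis chain to both documents and queries."""
    return normalize(tokenize(text))
```

Because both sides share `analyze`, a query for "dev services" and a topic value of "devservices" end up as the same token. One trade-off of collapsing rather than adding tokens: the standalone tokens `dev` and `services` are no longer produced for that phrase, which a real analyzer might want to keep.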
In the longer term, we could consider adding a list of "featured guides" near the top of the search results. It would be a short (3-4) list of matching guides that we cherry-picked and tagged through asciidoc metadata because we think they are particularly important. This list would be short and compact, so as not to interfere with "main" results, but could be highlighted in other ways (more vivid colors, bold font, colored background, ... don't ask me, my UIs are generally appalling). Think advertisement in web search engines :) If you think this makes sense, I'll create a separate issue.
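A rough sketch of the "featured guides" list, assuming each guide carries a hypothetical `featured` flag read from its AsciiDoc metadata and that the engine has already scored the hits. Featured matches are copied into a short list capped at 4 (the upper end of the 3-4 suggested above) without being removed from the main results.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    score: float
    featured: bool = False  # hypothetical flag set from AsciiDoc metadata


def split_featured(hits: list[Hit], limit: int = 4) -> tuple[list[Hit], list[Hit]]:
    """Return (featured, main): up to `limit` featured matches, plus all hits sorted by score."""
    main = sorted(hits, key=lambda h: h.score, reverse=True)
    featured = [h for h in main if h.featured][:limit]
    return featured, main


# Made-up scores, for illustration only.
hits = [
    Hit("Dev Services Overview", 9.1),
    Hit("Another guide", 7.5),
    Hit("Your second Quarkus application", 3.2, featured=True),
]
featured, main = split_featured(hits)
```

Featured hits stay in `main` as well, so the curated list only adds visibility and never hides a result from the normal relevance ranking.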
Topics are not keywords, they are topics. They are designed to look nice in a tag list or something. We could make it `dev-services` if you prefer.
> you have to scroll and trigger loading of additional results

Given the number of results is finite, should we always display all results?
> FWIW "Your second Quarkus application" does appear in the results, but far down the list (you have to scroll and trigger loading of additional results).

Ah, sorry, yes, I was being lazy!
> in an ideal world the 'my second application' guide would be the first result (IMO)

> I may be wrong, but "Dev Services Overview", at least, does seem more relevant than this guide when searching for "dev services"... ? Are we still talking about relevance here, or do you want some kind of "featured" list of results that always appear first?

Good questions. I did add 'ideal world' because I think getting to the 'Holly's ideal state' behaviour for this case may be non-trivial and involve some icky tradeoffs. I should have said that more clearly! So here, I think part of the problem is that 'Dev Services Overview' is perhaps mistitled. From the title, you'd think it's a page for people who want to know about dev services, and it's not. Obviously, that's not something we can blame the search engine for. :)
Well, and I'm being slightly unfair: the first few paragraphs are an overview, but then it becomes a reference.
> In any case, I can offer the following immediate solutions to try to improve the relevance score of Your second Quarkus application:
>
> * Customize analyzers so that "dev services" is considered a synonym of "devservices". This is important because we do appear to [use "devservices", without a space, in the `:topic:` metadata](https://github.com/quarkusio/quarkus/blob/b865f853b4400fd7ca0cee50aa98483529d5f2aa/docs/src/main/asciidoc/getting-started-dev-services.adoc#L12). CC @gsmet: was this on purpose?

Yes, I think this seems like a very good thing to do, perhaps also `devservice`? That will help other pages in this area.
> * And/or we add a `:keywords:` metadata entry containing "dev services" to [`getting-started-dev-services.adoc`](https://github.com/quarkusio/quarkus/blob/b865f853b4400fd7ca0cee50aa98483529d5f2aa/docs/src/main/asciidoc/getting-started-dev-services.adoc)

This also seems useful, although I assume we don't want to have to replicate the topics in the keywords as a general pattern? (Me being lazy again :) )
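One way to avoid replicating topics into `:keywords:` by hand would be to derive the searchable terms at indexing time. This sketch assumes the guide's header attributes have already been parsed into a dict keyed `topic` and `keywords` (as named in this thread), that values are comma-separated, and that a hyphenated topic should also contribute its space-separated form; all of that, and the function name, is hypothetical.

```python
def searchable_keywords(attributes: dict[str, str]) -> list[str]:
    """Merge explicit keywords with topics so topics need not be repeated by hand.

    Values are assumed to be comma-separated; a hyphenated term such as
    "getting-started" also contributes "getting started". Duplicates are dropped.
    """
    terms: list[str] = []
    for key in ("keywords", "topic"):
        for raw in attributes.get(key, "").split(","):
            term = raw.strip().lower()
            if not term:
                continue
            for variant in (term, term.replace("-", " ")):
                if variant not in terms:
                    terms.append(variant)
    return terms


# Made-up attribute values, for illustration only.
example = searchable_keywords({"topic": "devservices,getting-started", "keywords": "dev services"})
```

With this approach, `:keywords:` only needs the extra phrasings that topics cannot express, such as "dev services" with a space.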
> In the longer term, we could consider adding a list of "featured guides" near the top of the search results. It would be a short (3-4) list of matching guides that we cherry-picked and tagged through asciidoc metadata because we think they are particularly important. This list would be short and compact, so as not to interfere with "main" results, but could be highlighted in other ways (more vivid colors, bold font, colored background, ... don't ask me, my UIs are generally appalling). Think advertisement in web search engines :) If you think this makes sense, I'll create a separate issue.

I like this, but I also think it seems like something we should do if we have to, and not before. The ideal search engine would magically rank everything correctly without any manual intervention. I hasten to add I'm not sure I've ever seen such an engine. :)
But I wonder if there are some other heuristics that we might want to apply that would replicate the effect of featuring guides, but without the manual curation, like "tend to rank tutorials above reference guides," or ... [drawing a blank]
Thanks for looking into it!
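A sketch of the "tend to rank tutorials above reference guides" heuristic, assuming each hit exposes a guide type (here a hypothetical `kind` string such as "tutorial" or "reference") and that the engine's relevance score can be multiplied by a per-type factor. The factor values are arbitrary placeholders that would need tuning.

```python
# Hypothetical per-type multipliers; the values are placeholders, not tuned.
KIND_BOOST = {"tutorial": 1.5, "howto": 1.2, "reference": 1.0}


def rerank(hits: list[tuple[str, str, float]]) -> list[str]:
    """Re-sort (title, kind, relevance) hits by relevance times a per-kind boost."""

    def boosted(hit: tuple[str, str, float]) -> float:
        _title, kind, relevance = hit
        return relevance * KIND_BOOST.get(kind, 1.0)

    return [title for title, _kind, _rel in sorted(hits, key=boosted, reverse=True)]


# Made-up relevance scores, for illustration only.
ranked = rerank([
    ("Dev Services Overview", "reference", 6.0),
    ("Your second Quarkus application", "tutorial", 5.0),
])
```

A multiplicative factor only reorders hits whose scores are already close: here 5.0 × 1.5 = 7.5 beats 6.0, but a tutorial scoring far below a reference guide would still rank below it, so the heuristic cannot override relevance entirely.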
> > you have to scroll and trigger loading of additional results
>
> Given the number of results is finite, should we always display all results?

"Finite" is still up to ~220 (worst case for a search for a particular version) and counting... and you asked me to include titles and even more info in the JSON. And people are already asking to integrate Quarkiverse in the results.
There's a compromise to be found, sure, but I don't think returning all hits is future-proof.
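The compromise can be sketched as fixed-size pages fetched on demand, where each response also says whether more hits remain. The page size and function name are hypothetical; 220 is the worst-case hit count quoted above.

```python
def page(hits: list[str], page_index: int, page_size: int = 50) -> tuple[list[str], bool]:
    """Return one page of hits and whether another page remains after it."""
    if page_index < 0 or page_size <= 0:
        raise ValueError("page_index must be >= 0 and page_size > 0")
    start = page_index * page_size
    chunk = hits[start:start + page_size]
    return chunk, start + page_size < len(hits)


hits = [f"guide-{i}" for i in range(220)]
first, more_after_first = page(hits, 0)
last, more_after_last = page(hits, 4)
```

Response size stays bounded however many guides (including Quarkiverse ones) get indexed; the cost is the "scroll to load more" step discussed above.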
> Topics are not keywords, they are topics. They are designed to look nice in a tag list or something.

My point was that we do match against a full-text "topics" field and we do apply a higher boost compared to the content of a guide. So we might want it to... actually match?
> We could make it `dev-services` if you prefer.

That would work, but I'll probably need to work on analyzers anyway, if only to handle users typing `devservices` in the search box.
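To illustrate the point about the boosted topics field, here is a toy scoring function assuming a plain term-overlap score per field and made-up boosts (topics weighted above content). It is not how Elasticsearch computes relevance (it uses BM25 by default); it only shows that a boosted field contributes nothing when its tokens never match the query's tokens.

```python
import re

# Made-up boosts: topics weigh more than body content.
BOOSTS = {"topics": 3.0, "content": 1.0}


def tokens(text: str) -> set[str]:
    """Lowercase and split on non-alphanumeric characters, so '-' separates words."""
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def score(query: str, doc: dict[str, str]) -> float:
    """Sum over fields of boost times the number of query terms found in that field."""
    q = tokens(query)
    return sum(boost * len(q & tokens(doc.get(field, ""))) for field, boost in BOOSTS.items())


old_topic = score("dev services", {"topics": "devservices", "content": "dev services"})
new_topic = score("dev services", {"topics": "dev-services", "content": "dev services"})
```

With the topic `devservices`, only the content matches (2.0); with `dev-services`, the hyphen splits the topic into `dev` and `services` in this toy tokenizer, so the boosted field contributes too (8.0).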
Thanks @holly-cummins and @gsmet, then I'll look into improving relevance first, and we'll try to "feature" this guide when we work on page ranks (it won't be clear-cut because the relevance sort is necessarily fuzzy, but that should at least improve things a bit).
Topics can be used to improve ranking but they are not designed for this purpose. That's what I was saying.
Regarding this:

> Customize analyzers so that "dev services" is considered a synonym of "devservices". This is important because we do appear to use "devservices", without a space, in the `:topic:` metadata.

These filters may be relevant (from most likely to help to least likely):

* https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-hyp-decomp-tokenfilter.html
* https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-dict-decomp-tokenfilter.html
* https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-common-grams-tokenfilter.html
* https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-word-delimiter-graph-tokenfilter.html (would only work for e.g. `DevServices`, not `devservices`)

I created #59 to address this specifically.
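The caveat about the word delimiter filter can be illustrated with simplified Python imitations of two approaches from the list above: splitting on case changes handles `DevServices` but not `devservices`, while a dictionary-based split handles both. These are approximations written for illustration, not Lucene's actual algorithms, and the word list is hypothetical.

```python
import re
from typing import Optional

# Hypothetical word list for the dictionary-based split.
WORD_LIST = {"dev", "service", "services"}


def split_on_case(token: str) -> list[str]:
    """Split at lower-to-upper case transitions, roughly like a word delimiter filter."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", token).lower().split()


def decompound(token: str, words: set[str] = WORD_LIST) -> list[str]:
    """Keep the lowercased token and, if it splits fully into dictionary words, add them.

    Tries the longest dictionary prefix first and backtracks if the rest cannot be split.
    """
    lowered = token.lower()

    def split(rest: str) -> Optional[list[str]]:
        if not rest:
            return []
        for end in range(len(rest), 0, -1):
            if rest[:end] in words:
                tail = split(rest[end:])
                if tail is not None:
                    return [rest[:end]] + tail
        return None

    parts = split(lowered)
    if parts is not None and len(parts) > 1:
        return [lowered] + parts
    return [lowered]
```

Keeping the original token alongside the subwords means an exact search for `devservices` still matches, while `dev` and `services` also become searchable.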
@yrodiere note that IIRC, I changed things to `dev-services` in the topics now.
If I search on https://quarkus-website-pr-1825-preview.surge.sh/guides/ for 'dev services', in an ideal world the 'my second application' guide would be the first result (IMO). At the very least, I'd hope it was in the results. The title doesn't mention dev services, but the slug and body feature dev services a lot.
(This kind of content is an ideal use case for improved search: the title of the tutorial doesn't mention dev services because it's aimed at people who don't know they need to know about dev services... but a direct search should also find it, because it's our main introduction to dev services.)