Closed. Krinkle closed this 1 year ago.
In your search request to Typesense, could you try adding sort_by: item_priority:desc
as an additional parameter and see if that helps?
@jasonbosco I am using this:
// https://typesense.org/docs/0.24.1/api/search.html
const resp = await fetch(
`${this.origin}/collections/${this.collection}/documents/search?` + new URLSearchParams({
q: query,
per_page: '5',
// based on https://github.com/typesense/typesense-docsearch.js/blob/3.4.0/packages/docsearch-react/src/DocSearchModal.tsx#L193
query_by: 'hierarchy.lvl0,hierarchy.lvl1,hierarchy.lvl2,hierarchy.lvl3,hierarchy.lvl4,hierarchy.lvl5,hierarchy.lvl6,content',
include_fields: 'hierarchy.lvl0,hierarchy.lvl1,hierarchy.lvl2,hierarchy.lvl3,hierarchy.lvl4,hierarchy.lvl5,hierarchy.lvl6,content,url_without_anchor,url,id',
highlight_full_fields: 'hierarchy.lvl0,hierarchy.lvl1,hierarchy.lvl2,hierarchy.lvl3,hierarchy.lvl4,hierarchy.lvl5,hierarchy.lvl6,content',
// "group_by=url_without_anchor" requires typesense/docsearch-scraper:0.6.0.rc1
group_by: 'url_without_anchor',
group_limit: '1',
sort_by: 'item_priority:desc',
snippet_threshold: '8',
highlight_affix_num_tokens: '12',
'x-typesense-api-key': this.key,
}),
);
I re-read your original report in detail and just realized what was happening. I've pushed out a potential fix in typesense/docsearch-scraper:0.6.0.rc2.
Could you give it a shot now?
@jasonbosco Worked perfectly. Thanks again!
Description
When using typesense/docsearch-scraper in combination with Search API requests such as those made by typesense-docsearch.js, I found that the lead paragraph was surprisingly downranked, and thus rarely returned as a snippet in results.
As part of evaluating Typesense for use on https://api.jquery.com and https://qunitjs.com, I found that my local branch with Typesense often returned suboptimal results compared to the live site (still using Algolia).
The legacy Algolia scraper gave priority to headings and content earlier on the page using the position attribute. Anecdotally, this appears to be useful. I imagine it is common in documentation to use similar phrasing multiple times, and I feel the first mention is likely a more useful starting point for reading the page.
My assumption is that Typesense has not (intentionally) inverted this logic. As such, I'm tentatively reporting this as a bug. I'm happy to hear otherwise!
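As a rough illustration of the position-based preference described above (not the Algolia scraper's actual code), the sketch below sorts records by a weight first and falls back to ascending position as a tie-break. The record shape, the `level` and `position` field names, and the `byLevelThenPosition` helper are assumptions made for this example.

```javascript
// Hypothetical records: `level` is a weight (higher is better),
// `position` is the record's index on the page (lower is earlier).
const records = [
  { label: 'example caption', level: 0, position: 7 },
  { label: 'lead paragraph', level: 0, position: 1 },
  { label: 'page heading', level: 90, position: 0 },
];

// Sort by level descending, then by position ascending as a tie-break.
function byLevelThenPosition(a, b) {
  return (b.level - a.level) || (a.position - b.position);
}

const ranked = [...records].sort(byLevelThenPosition);

// Positions in ranked order: 0, 1, 7
console.log(ranked.map((r) => r.position));
```

With equal `level`, the lead paragraph (position 1) sorts ahead of the later example caption (position 7), which is the behavior the live site appears to have.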
Steps to reproduce
Two examples:
https://api.qunitjs.com/assert/propEqual/ contains:
Compare an object’s own properties using a strict equality comparison.
Examples
Compare the values of two objects properties.
https://qunitjs.com/intro/ contains:
Browser support
Firefox: 45+
Integrations
on Headless Firefox and Chrome
Actual Behavior
compare
firefox
Other information
Looking at the implementation, I believe these two chunks of code are related to this problem.
https://github.com/typesense/typesense-docsearch-scraper/blob/0.6.0.rc1/scraper/src/strategies/algolia_settings.py#L61
I believe customRanking is no longer used, but historically this gave earlier positions (ascending) preference when rank/level are equal.
https://github.com/typesense/typesense-docsearch-scraper/blob/0.6.0.rc1/scraper/src/typesense_helper.py
I believe item_priority is what is used now. However, by adding the numbers up, this boosts content that is further down the page, and thus effectively downranks the lead paragraph and other content that is earlier on the page.
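To make the effect concrete, here is a small sketch of how adding the position into a priority that is sorted descending favors later content, while subtracting it favors earlier content. The base weight, the sample positions, and the function names are all hypothetical; this is not the scraper's actual formula.

```javascript
// Hypothetical base weight shared by all content records on a page.
const BASE_WEIGHT = 100;

// Additive variant: later positions get a larger priority.
const additivePriority = (position) => BASE_WEIGHT + position;

// Subtractive variant: earlier positions get a larger priority.
const subtractivePriority = (position) => BASE_WEIGHT - position;

// Mimics `sort_by: item_priority:desc` on an in-memory list of positions.
function rankDesc(positions, priorityOf) {
  return [...positions].sort((a, b) => priorityOf(b) - priorityOf(a));
}

const positions = [1, 7, 12]; // 1 = lead paragraph, 12 = near the end of the page

console.log(rankDesc(positions, additivePriority));    // [12, 7, 1]
console.log(rankDesc(positions, subtractivePriority)); // [1, 7, 12]
```

Under the additive variant the lead paragraph comes last whenever everything else is equal, which matches the downranking reported here.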