Description
A web page containing QUnit.test('add', shows up in search result snippets as QUnit . test ( ' add ' , assert. Note the unexpected spaces around virtually every symbol. I believe this is most likely a side effect of the characters in question being wrapped in <span> elements in the source code. However, there are no spaces in the source code around (most of) these characters.
Steps to reproduce
I'm evaluating Typesense for use on https://api.jquery.com, https://qunitjs.com and other OpenJS sites. I've used typesense/docsearch-scraper via GitHub Actions, and docsearch is configured with "text": "p,li,tr,pre" among the selectors. The above code is part of a regular paragraph or PRE tag. (source: typense.yaml, source: /docsearch.config.json)
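To make the symptom easier to reproduce in isolation, here is a minimal sketch using only the Python standard library. It is not the scraper's actual code: it assumes that the extra spaces come from joining the text nodes of an element with a space separator, which I have not confirmed. The markup, the TextNodes class and the extract helper are hypothetical and only illustrate the difference between the two join strategies.

```python
from html.parser import HTMLParser


class TextNodes(HTMLParser):
    """Collect the text nodes of an HTML fragment in document order."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_data(self, data):
        self.nodes.append(data)


def extract(html, separator):
    """Join all text nodes of `html` with `separator` (hypothetical helper)."""
    parser = TextNodes()
    parser.feed(html)
    parser.close()
    return separator.join(parser.nodes)


# Hypothetical syntax-highlighted markup: every token sits in its own <span>,
# with no whitespace between the elements.
html = (
    "<pre><span>QUnit</span><span>.</span><span>test</span><span>(</span>"
    "<span>'</span><span>add</span><span>'</span><span>,</span></pre>"
)

# Assumed cause of the bug: text nodes joined with a space.
print(extract(html, " "))  # QUnit . test ( ' add ' ,

# Joining without a separator keeps the text as written in the source.
print(extract(html, ""))   # QUnit.test('add',
```

The first call matches what I see in the search snippets; the second matches the text as it is rendered on the page.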
Expected Behavior
Inline elements like <span>, <em>, <code>, and <strong> should not cause additional spaces to be injected into the indexed text. It is not uncommon for prose to emphasize, underline, strike through, superscript, or otherwise wrap only part of a word in markup. This is probably most common in content with syntax-highlighted source code.
Metadata
Typesense Version: 0.24.1
OS: Debian 11 Bullseye