jameslittle230 / stork

🔎 Impossibly fast web search, made for static sites.
https://stork-search.net
Apache License 2.0

Odd `<mark>...</mark>` ranges #292

Closed: kdheepak closed this issue 2 years ago

kdheepak commented 2 years ago

I'm using the latest Stork:

$ stork --version
Stork 1.4.2

This is an example of the snippet in the search result when the search query is "neovim":

<p>
        ...to use this you will need the following: <mark class="stork-highlight">neovim</mark> v0.5.0 <mark class="stork-highlight">`neovim/nvim-ls</mark>p` The <mark class="stork-highlight">neovim/nvim-lsp</mark> repository contains language server configurations for a...
        </p>

You can see that when backticks are involved, the mark seems to exclude the last letter of the search result:

[Screenshot: Screen Shot 2022-05-01 at 11 30 22 PM]

The end result is some rather odd highlighting. This is what it looks like when I'm not showing the console highlighting:

[Screenshot: Screen Shot 2022-05-01 at 11 30 56 PM]

For reproducing this, you can find a link to the .st file here:

https://github.com/kdheepak/blog/blob/9e8c45bc90395d05cc610c82f44d6926d4eda7f4/static/assets/stork/search.st

You can also download it here:

https://blog.kdheepak.com/assets/stork/search.st

The code I'm using is all standard, based on the documentation: https://github.com/kdheepak/blog/blob/9e8c45bc90395d05cc610c82f44d6926d4eda7f4/src/lib/components/Search.svelte

And I'm loading stork from the CDN as shown in the documentation:

https://github.com/kdheepak/blog/blob/9e8c45bc90395d05cc610c82f44d6926d4eda7f4/src/app.html#L7

You should be able to see a live version of this bug in action here: https://blog.kdheepak.com/. Scroll to the bottom and you'll be able to find the search bar.

jameslittle230 commented 2 years ago

Thanks for the detailed report! I'll look into this and see if I can make improvements here. As far as I know, this bug is coming into play because there's a hyphen in the word being searched, and some parts of Stork treat a hyphen as a word separator (like a space) whereas others do not.
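To illustrate the kind of mismatch described above, here is a minimal sketch, not Stork's actual code. It assumes a hypothetical indexing step that splits on whitespace and hyphens, and a hypothetical display step that splits on whitespace only. The function names and the sample sentence are made up for illustration.

```python
import re


def split_for_index(text):
    # Hypothetical indexing tokenizer: hyphens count as word separators.
    return [word for word in re.split(r"[\s-]+", text) if word]


def split_for_display(text):
    # Hypothetical display tokenizer: only whitespace separates words.
    return text.split()


text = "The neovim/nvim-lsp repository contains language server configurations"

index_words = split_for_index(text)
display_words = split_for_display(text)

# The index records the match by word position...
match_position = index_words.index("repository")

# ...but the display step counts words differently, so the same
# position now points at a different word.
highlighted = display_words[match_position]

print(match_position, highlighted)
```

Because the hyphen adds one extra word on the indexing side, every position after it is off by one on the display side, so the highlight lands on a neighbouring word. The same drift can happen with character offsets when one step strips or splits on punctuation and the other does not.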

kdheepak commented 2 years ago

It's possible it is related to hyphens, but it also occurs elsewhere. For example, if I search for vimrc:

[Screenshot: Screen Shot 2022-05-04 at 7 06 56 PM]

It always occurs for text in <mark></mark> tags.

jameslittle230 commented 2 years ago

Ah - might be more than just hyphens, might be all punctuation.

I'll take a look.

kdheepak commented 2 years ago

Thanks! Great job on stork btw!

kdheepak commented 2 years ago

I'm not sure whether this has been fixed and just hasn't been released yet, but I figured I'd report it since it's related to this issue. When I go to https://blog.kdheepak.com and search for julia, I get this:

[Screenshot: Screen Shot 2022-05-20 at 10 10 54 AM]

I think it has to do with Unicode lengths. That particular blog post (https://blog.kdheepak.com/my-unicode-cheat-sheet) has a number of very interesting edge cases, and if you aren't using https://github.com/unicode-rs/unicode-segmentation to calculate the lengths of strings and slices, you might not be covering those corner cases. I see unicode-segmentation in the dependencies, but I didn't go through the code to figure out what the problem might be.
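As a rough sketch of how a length mismatch could shift a highlight (an assumption about the failure mode, not a diagnosis of Stork's code): if one step measures an offset in UTF-8 bytes and another applies it as a character index, every non-ASCII character before the match moves the range to the right. The sample string below is made up for illustration.

```python
# "caf\u00e9" ends in a single code point (é) that takes two bytes in UTF-8.
text = "caf\u00e9 julia lang"
query = "julia"

# Offset counted in Unicode code points (what Python string indexing uses).
char_start = text.index(query)

# Offset counted in UTF-8 bytes (what a byte-oriented search would report).
byte_start = text.encode("utf-8").index(query.encode("utf-8"))

# Using the code point offset highlights the right range.
correct = text[char_start:char_start + len(query)]

# Using the byte offset as if it were a character index shifts the range.
shifted = text[byte_start:byte_start + len(query)]

print(char_start, byte_start)
print(repr(correct), repr(shifted))
```

Here the byte offset is one larger than the code point offset, so the shifted range drops the first letter and picks up the following space. Grapheme clusters (what unicode-segmentation counts) are a third unit again, and a single visible character can span several code points, so all steps need to agree on one unit.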

jameslittle230 commented 2 years ago

This has been fixed in #297! It'll ship in version 1.5.0, which I'll release as soon as I have time to do so!

kdheepak commented 2 years ago

Awesome! No hurry on my end!