jameslittle230 / stork

🔎 Impossibly fast web search, made for static sites.
https://stork-search.net
Apache License 2.0

Question: Exclude HTML tags of CSS selectors from being added to the search index #326

Closed nandac closed 1 year ago

nandac commented 1 year ago

Dear Folks,

Let me take this opportunity to thank the creator and developers working on this project. Stork search is a boon to any static website that needs search functionality.

I would like to exclude parts of my website from being added to the search index by specifying an HTML tag, HTML tag attribute, or CSS selector as a way of fine-tuning the search.

I could not find any examples in the documentation, so I was wondering whether the feature has been implemented or whether there are any workarounds.

Thanks in advance.

jameslittle230 commented 1 year ago

Hello @nandac! This feature is available, but admittedly not well documented. You can add the exclude_html_selector field to your config file to globally define a selector to exclude from indexing.

To define exclusions on a file-by-file basis, you can use the exclude_html_selector_override config option in a file object.

Let me know if this helps! Happy to provide examples, etc. if this isn't quite enough.

-James
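As a rough illustration of the two options James describes, a Stork config might look something like the sketch below. The exclude_html_selector and exclude_html_selector_override names come from his reply; the directory, URLs, file paths, titles, and the chosen selectors are placeholders invented for the example, and the surrounding layout assumes Stork's usual TOML config with an [input] table and a list of file objects.

```toml
[input]
# Placeholder values: point these at your own site.
base_directory = "public"
url_prefix = "https://example.com/"

# Global exclusion: elements matching this selector are left out of
# the index for every HTML file listed below.
exclude_html_selector = "figure"

[[input.files]]
path = "posts/first-post.html"
url = "posts/first-post.html"
title = "First Post"

[[input.files]]
path = "posts/second-post.html"
url = "posts/second-post.html"
title = "Second Post"
# Per-file exclusion for this file object only. Going by the option's
# name, this presumably takes the place of the global selector here.
exclude_html_selector_override = "p#author"
```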

nandac commented 1 year ago

Thanks, @jameslittle230, that is what I was looking for.

I do have one question, though: is it possible to specify multiple selectors to exclude? For example, something like:

exclude_html_selector = "figure", "p#author", ...

In addition, I am using stork-search with the pelican-search plugin, which unfortunately does not support this setting, so I guess that is something I would need to contribute to the plugin.

Tseing commented 1 year ago

I have the same question. Is it possible to exclude multiple selectors? I found that exclude_html_selector is defined as a string, so it can't be set to multiple values, at least for now.

nandac commented 1 year ago

@Tseing I believe I was able to set it as a comma-separated string of selectors. Have you tried that?

Tseing commented 1 year ago

@nandac Cool, it works! I mistakenly thought exclude_html_selector only accepted HTML tag names to exclude. Thank you for your reply! I am using Pelican as well : )
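For anyone landing here later, the comma-separated form discussed above would presumably look like the fragment below. It reuses the two selectors from nandac's earlier question, written as one string holding an ordinary CSS selector list rather than as several separate values. Placing the key under [input] is an assumption about where it sits in the Stork config.

```toml
[input]
# A single string containing a comma-separated CSS selector list.
# Elements matching either selector are excluded from the index.
exclude_html_selector = "figure, p#author"
```

Because the value is a CSS selector, not a tag name, it can mix tag, id, class, and attribute selectors in the same string.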

jameslittle230 commented 1 year ago

Thanks for helping out here! I'm closing this issue and will improve the documentation for this option.