projectEndings / staticSearch

A codebase to support a pure JSON search engine requiring no backend for any XHTML5 document collection
https://endings.uvic.ca/staticSearch/docs/index.html
Mozilla Public License 2.0

Wildcard searches #48

Closed by wsalesky 4 years ago

wsalesky commented 4 years ago

Hi @martindholmes and @joeytakeda, I have been experimenting with staticSearch for the past month or so, and I am planning to use it with a new project I am working on. Are there any plans to add wildcard search capabilities to the app? This is one of the features my project has requested. I'm happy to investigate adding this option myself if it is not on your radar. Do you have any tips on where I should get started in the code?

I am also adding diacritic-insensitive searches to our version of the app, as the project I am working with will be searching over multiple foreign languages. Is this a feature you would be interested in incorporating into the codebase?

Thanks for a great project (and the copious comments in the code).

wsalesky commented 4 years ago

I think a generic option would probably be enough for most use cases. My version is quite generic at this point. Here is the XSLT code:

<xsl:sequence select="replace(lower-case(replace(normalize-unicode($token, 'NFD'), '[\p{M}]', '')),'ʿ|ʾ','')"/>

This catches most cases in our current data set (tested on French, Greek and Arabic), but I may need to extend it as I find more test cases.
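For readers who want to try the same folding outside XSLT, here is a rough Python sketch of the expression above. It follows the same steps: decompose to NFD, drop combining marks (the `\p{M}` class), lowercase, then remove the two modifier letters ʿ (U+02BF) and ʾ (U+02BE). The function name `fold_token` is made up for illustration and is not part of staticSearch. Python's `str.lower()` may also differ from XPath's `lower-case()` in a few edge cases, so treat this as an approximation rather than an exact port.

```python
import re
import unicodedata

# Hypothetical helper name; not part of staticSearch.
def fold_token(token: str) -> str:
    """Approximate the XSLT expression: NFD, strip marks, lowercase, drop U+02BF/U+02BE."""
    # Step 1: decompose so that base letters and combining marks become separate code points.
    decomposed = unicodedata.normalize("NFD", token)
    # Step 2: drop every code point whose general category is Mn, Mc, or Me (\p{M}).
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    )
    # Step 3: lowercase, then remove the modifier letters left and right half ring.
    return re.sub("[\u02bf\u02be]", "", stripped.lower())


if __name__ == "__main__":
    for sample in ["\u00c9l\u00e8ve", "\u1f08\u03b8\u1fc6\u03bd\u03b1\u03b9", "\u02bfAl\u012b"]:
        print(sample, "->", fold_token(sample))
```

Note that the two modifier letters are category Lm, not marks, which is why they need the separate final replacement step.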

martindholmes commented 4 years ago

We do have an issue for this, actually -- I should have pointed you to it:

https://github.com/projectEndings/staticSearch/issues/60

It's scheduled for release 1.1, because we'd like to get a 1.0 out very soon. I think what's still not clear is whether/how diacritic-free searching would combine with diacritic-based searching; do you use two indexes, or create two tokens for each term in the index?
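To make the "two indexes" option concrete, here is a small Python sketch. It is only an illustration of the idea and not how staticSearch works or plans to work: the names `build_indexes`, `search`, and `fold` are hypothetical, and the in-memory dictionaries stand in for whatever JSON files a real build would emit. Each token is stored once in an exact index (lowercased, NFC) and once in a folded index (diacritics removed), and the query picks one index depending on a flag.

```python
import unicodedata
from collections import defaultdict

def fold(token: str) -> str:
    """Hypothetical folding step: NFD, drop combining marks, lowercase."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    ).lower()

def exact_key(token: str) -> str:
    """Key for diacritic-sensitive lookups: NFC so composed/decomposed input match."""
    return unicodedata.normalize("NFC", token).lower()

def build_indexes(docs):
    """Build two inverted indexes (token -> set of doc ids) from {doc_id: [tokens]}."""
    exact = defaultdict(set)
    folded = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            exact[exact_key(token)].add(doc_id)
            folded[fold(token)].add(doc_id)
    return exact, folded

def search(term, exact, folded, ignore_diacritics):
    """Look the term up in the folded or the exact index, depending on the flag."""
    if ignore_diacritics:
        return set(folded.get(fold(term), set()))
    return set(exact.get(exact_key(term), set()))

if __name__ == "__main__":
    docs = {"a.html": ["c\u00f4te"], "b.html": ["cote"], "c.html": ["cot\u00e9"]}
    exact, folded = build_indexes(docs)
    print(search("cote", exact, folded, ignore_diacritics=True))
    print(search("c\u00f4te", exact, folded, ignore_diacritics=False))
```

The alternative raised above, two tokens per term in a single index, would merge these dictionaries and tag each entry as exact or folded. The trade-off is roughly index size versus the number of files the client has to fetch.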