Consider making the tokenization regex configurable

projectEndings / staticSearch

A codebase to support a pure JSON search engine requiring no backend for any XHTML5 document collection

https://endings.uvic.ca/staticSearch/docs/index.html

Mozilla Public License 2.0


Consider making the tokenization regex configurable #303

Open martindholmes opened 2 months ago

martindholmes commented 2 months ago

Following the work on issue #300, it seems plausible that other situations will arise in which the default tokenization regex is not appropriate for a particular linguistic or historical context. We should consider allowing users to define their own tokenization regex in the config file.
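To make the proposal concrete, here is a minimal sketch of the idea in Python. It is not staticSearch code: the config key `tokenRegex`, the default pattern, and the helper `make_tokenizer` are all hypothetical names chosen for illustration, and the real default regex and config format may differ. It shows a tokenizer that falls back to a built-in pattern unless the config supplies one, and rejects an invalid pattern early.

```python
import re

# Hypothetical default (an assumption, not the project's actual regex):
# runs of letters/digits, optionally joined by an apostrophe or hyphen.
DEFAULT_TOKEN_REGEX = r"[^\W_]+(?:['’\-][^\W_]+)*"


def make_tokenizer(config):
    """Build a tokenizer from a config mapping.

    `tokenRegex` is a hypothetical config key; when it is absent,
    the default pattern above is used.
    """
    pattern = config.get("tokenRegex", DEFAULT_TOKEN_REGEX)
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # Fail at configuration time rather than midway through indexing.
        raise ValueError(f"Invalid tokenRegex {pattern!r}: {exc}") from exc

    def tokenize(text):
        # Each regex match is one token.
        return [match.group(0) for match in compiled.finditer(text)]

    return tokenize


# Default behaviour: the middle dot splits the Catalan word "col·lega".
default_tokens = make_tokenizer({})("la col·lega")

# A project-specific pattern that also allows a middle dot inside a token.
catalan_config = {"tokenRegex": r"[^\W_]+(?:[·'’\-][^\W_]+)*"}
custom_tokens = make_tokenizer(catalan_config)("la col·lega")
```

With the default pattern the word is split at the middle dot; with the custom pattern it survives as a single token. One design question this raises is whether the same pattern can be applied consistently at index-build time and at query time, since the two sides would need to agree on token boundaries.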