feat: extract wordchars from lunr-languages

yeraydiazdiaz / lunr.py

A Python implementation of Lunr.js 🌖

http://lunr.readthedocs.io

MIT License

187 stars, 16 forks

feat: extract wordchars from lunr-languages #150

Closed: dhdaines closed this pull request 1 month ago

dhdaines commented 1 month ago

See #149 (this doesn't fix the whole issue).

dhdaines commented 1 month ago

Note that you could also just add {r'\w'} to all_word_characters, the same way you do for the default pipeline.

dhdaines commented 1 month ago

In fact, we should add \w to them, because otherwise the language trimmers will strip numbers from the end of search terms, which is almost certainly not what you want in many applications! However, doing so would not be bug-compatible with lunr-languages, so it may just need a documented workaround.
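The effect described in the last two comments can be sketched in plain Python. This is a minimal, hypothetical illustration, not lunr.py's API: `build_trimmer` and the `latin` character set are invented here, and the trimmer only mimics the lunr-languages approach of stripping any run of characters outside a word-character class from both ends of a term. Adding r'\w' to the set keeps the digits that the bug-compatible version removes:

```python
import re
from typing import Callable, Iterable


def build_trimmer(word_characters: Iterable[str]) -> Callable[[str], str]:
    """Return a function that strips non-word characters from both ends of a term.

    word_characters holds regex character-class fragments such as "A-Za-z";
    they are concatenated into a single character class.
    """
    char_class = "".join(sorted(word_characters))  # sorted for a stable pattern
    leading = re.compile(f"^[^{char_class}]+")
    trailing = re.compile(f"[^{char_class}]+$")

    def trim(term: str) -> str:
        return trailing.sub("", leading.sub("", term))

    return trim


# Hypothetical Latin-script word characters, loosely modelled on the kind of
# ranges lunr-languages uses (not copied from any specific language file).
latin = {"A-Za-z", "\u00c0-\u00d6", "\u00d8-\u00f6", "\u00f8-\u00ff"}

bug_compatible = build_trimmer(latin)
with_digits = build_trimmer(latin | {r"\w"})  # the suggested workaround: add \w

for term in ["(route66)", "covid19", "2024"]:
    print(f"{term!r}: {bug_compatible(term)!r} vs {with_digits(term)!r}")
```

With only the Latin ranges, "(route66)" trims to "route" and "2024" disappears entirely, because digits fall outside the class; with \w added, both keep their digits and only the surrounding punctuation is removed.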

dhdaines commented 1 month ago

You may not really want to do this; it seems the trimmers in lunr-languages are full of weird junk: https://github.com/MihaiValentin/lunr-languages/issues/66

dhdaines commented 1 month ago

Hmm. It turns out that the lunr-languages code is generated programmatically as well, so it doesn't make much sense to parse it to create these word-character sets. I'm closing this PR and will come up with a better way to do this.