zhan-huang / js-solr-highlighter

A JavaScript library for highlighting text based on a query written in Lucene/Solr query syntax
MIT License

Extend word regex to support all languages #4

Open: Raul52 opened this issue 1 year ago

Raul52 commented 1 year ago

Hello,

I saw that you use the following logic to strip non-word characters from the start and end of each term:

const terms = term.split(/\s/).map(t => t.replace(/^[^a-zA-Z0-9]+/, '').replace(/[^a-zA-Z0-9\*]+$/, ''));

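Because both character classes are ASCII-only, this expression strips Greek letters, accented Latin letters (é, ñ, ü) and letters from other scripts whenever they appear at the start or end of a term. An improvement would be to allow letters from all languages.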
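To make the effect concrete, here is a small sketch that wraps the one-liner quoted above in a function. The name `splitTermsAscii` is hypothetical and not part of js-solr-highlighter. Only characters at the edges of a term are affected, so an accented letter in the middle of a word survives, but a term written entirely in Greek is reduced to an empty string.

```javascript
// The per-term trimming from the issue, wrapped in a hypothetical helper.
// Leading characters outside [a-zA-Z0-9] are removed; trailing characters
// outside [a-zA-Z0-9*] are removed, so a trailing * wildcard survives.
function splitTermsAscii(term) {
  return term
    .split(/\s/)
    .map(t => t.replace(/^[^a-zA-Z0-9]+/, '').replace(/[^a-zA-Z0-9\*]+$/, ''));
}

console.log(splitTermsAscii('café λόγος naïve wild*'));
// [ 'caf', '', 'naïve', 'wild*' ]
```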

What do you think about changing the regex to:

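/^[^\p{L}\p{M}\p{N}*?]+/u

(and the same class anchored at the end, /[^\p{L}\p{M}\p{N}*?]+$/u, for the trailing trim)

This keeps letters from every script (\p{L}), combining marks such as accents (\p{M}), digits (\p{N}, as the original 0-9 did) and the Lucene wildcard characters * and ?. The u flag is required for \p{...} escapes. Source: https://dev.to/tillsanders/let-s-stop-using-a-za-z-4a0m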
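A minimal sketch of the Unicode-aware version, assuming the same split-then-trim structure as the current code. It uses a leading/trailing pair of the proposed character class, including \p{N} so digits survive as they did with the original 0-9 class. The names `LEADING`, `TRAILING` and `splitTermsUnicode` are hypothetical, chosen for illustration only.

```javascript
// Unicode-aware trimming (hypothetical helper, not the library's API).
// \p{L} = letters in any script, \p{M} = combining marks (accents),
// \p{N} = digits and other numbers; * and ? are Lucene wildcard characters.
// The `u` flag is required for \p{...} property escapes.
const LEADING = /^[^\p{L}\p{M}\p{N}*?]+/u;
const TRAILING = /[^\p{L}\p{M}\p{N}*?]+$/u;

function splitTermsUnicode(term) {
  return term
    .split(/\s/)
    .map(t => t.replace(LEADING, '').replace(TRAILING, ''));
}

console.log(splitTermsUnicode('café λόγος "naïve", wild* (42)'));
// [ 'café', 'λόγος', 'naïve', 'wild*', '42' ]
```

Note that, unlike the current leading regex, this also keeps a leading * or ? (for example *ing). Whether that is desirable depends on how the highlighter later matches wildcard terms.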

Can we do this?

I don't have permission to push to your repository; otherwise I would have implemented this myself.

Best!

Raul52 commented 1 year ago

Hello @zhan-huang,

Any news on this one?