artt opened this issue 3 years ago
Shouldn't it be easy enough to exclude punctuation from the analysis?
Ah, I'm using this for a search engine that highlights matched words, but it doesn't support infix search. So I'm segmenting the query words as well as the indexed data. Removing punctuation marks would produce an altered version of the match.
For example, searching for อ่อนแอ ("weak") should ideally return:
ป่วยหรือ<mark>อ่อนแอ</mark>?
Doing what you suggested (stripping punctuation first) would return the match without the trailing question mark:
ป่วยหรือ<mark>อ่อนแอ</mark>
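One way to get the first result is to skip punctuation only when matching, while still copying it into the highlighted output. A minimal sketch, assuming the segmenter returns a list of tokens whose concatenation reproduces the original text; the token list below is hand-written for illustration and is not actual output from the library. `is_punct` and `highlight` are hypothetical helpers, not part of any library API.

```python
import unicodedata


def is_punct(token: str) -> bool:
    # True if the token is non-empty and every character is in a
    # Unicode punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


def highlight(tokens: list[str], query_tokens: set[str]) -> str:
    """Wrap matching tokens in <mark> while keeping all original text.

    Punctuation tokens are excluded from matching only; they are still
    appended to the output, so the result equals the joined tokens plus
    the <mark> tags.
    """
    out = []
    for tok in tokens:
        if not is_punct(tok) and tok in query_tokens:
            out.append(f"<mark>{tok}</mark>")
        else:
            out.append(tok)
    return "".join(out)


# Hypothetical segmentation of "ป่วยหรืออ่อนแอ?" (assumed, for illustration).
tokens = ["ป่วย", "หรือ", "อ่อนแอ", "?"]
print(highlight(tokens, {"อ่อนแอ"}))  # → ป่วยหรือ<mark>อ่อนแอ</mark>?
```

Because the punctuation check happens per token at match time rather than by rewriting the indexed text, the highlighted string never loses characters; this only works if the segmenter actually emits punctuation as separate tokens.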
I realize this is a very specific use case, but I just wanted to note the discrepancy.
It's obviously a bug that the results differ. I was just thinking it might not be too hard for @veer66 to fix!
It seems like the library has trouble handling punctuation marks. For example,
But...