cormac-obrien closed this issue 3 years ago.
I don't see any problem in the example above. Can you be more specific about what you expected?
Ah, I realized I was thinking of the positions in the text after the spaces had been removed! There's no actual issue, closing.
Describe the bug
The docs for `Token` specify that the offset range should be half-open (the end offset is exclusive):

However, the `SimpleTokenizer` generates closed offset intervals (the end offset is inclusive):

Which version of tantivy are you using?
master (6d4b982)
To Reproduce
The `SimpleTokenizer` output above is taken from `cargo run --bin pre_tokenized_text`.
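To make the half-open convention concrete, here is a minimal, self-contained sketch. It does not use tantivy: `SketchToken` and `tokenize` are hypothetical names, and the splitting rule (break on any non-alphanumeric character) is only an assumption that approximates a simple tokenizer. The point it illustrates is that with half-open byte offsets, slicing the original text with `&text[offset_from..offset_to]` recovers each token exactly. This holds even when the text contains whitespace and punctuation, because the offsets index the original string, not a copy with the spaces removed.

```rust
/// Hypothetical token type for illustration only (not tantivy's `Token`).
#[derive(Debug)]
struct SketchToken {
    text: String,
    offset_from: usize, // byte index of the token's first byte (inclusive)
    offset_to: usize,   // byte index one past the token's last byte (exclusive)
}

/// Split `text` on non-alphanumeric characters, recording half-open byte offsets
/// into the ORIGINAL text. The splitting rule is an assumption for this sketch.
fn tokenize(text: &str) -> Vec<SketchToken> {
    let mut tokens = Vec::new();
    let mut start: Option<usize> = None;
    // `char_indices` yields byte offsets, so multi-byte characters are handled.
    for (idx, ch) in text.char_indices() {
        if ch.is_alphanumeric() {
            if start.is_none() {
                start = Some(idx);
            }
        } else if let Some(from) = start.take() {
            // `idx` is the first byte AFTER the token, so the range is half-open.
            tokens.push(SketchToken {
                text: text[from..idx].to_string(),
                offset_from: from,
                offset_to: idx,
            });
        }
    }
    // A token that runs to the end of the input ends at `text.len()` (exclusive).
    if let Some(from) = start {
        tokens.push(SketchToken {
            text: text[from..].to_string(),
            offset_from: from,
            offset_to: text.len(),
        });
    }
    tokens
}

fn main() {
    let text = "Hello, happy tax payer";
    for t in tokenize(text) {
        // Half-open offsets: slicing the original text gives back the token.
        assert_eq!(&text[t.offset_from..t.offset_to], t.text);
        println!("{:?} {}..{}", t.text, t.offset_from, t.offset_to);
    }
}
```

Running it prints `"Hello" 0..5`, `"happy" 7..12`, `"tax" 13..16` and `"payer" 17..22`. Under a closed convention, `Hello` would instead be reported as `0..=4`. Counting positions in `Hellohappytaxpayer` (spaces and punctuation removed) would give yet different numbers, which is an easy way to misread correct half-open offsets.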