quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
MIT License
11.41k stars 627 forks

Fixes bug that causes out-of-order sstable key. #2445

Closed fulmicoton closed 2 weeks ago

fulmicoton commented 3 weeks ago

The previous way to address the problem was to replace \u{0000} with 0 in several places.

This logic had several flaws. When done on the serializer side (as it was for the columnar), it caused a collision problem.

If a document in the segment contained a json field with a \0 and another doc contained the same json field but with 0, then we were sending the same field path twice to the serializer.
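
To make the collision concrete, here is a minimal sketch that does not use tantivy's actual code. `replace_nul`, `is_strictly_increasing` and `normalized_after_sort` are hypothetical helpers, and the sketch assumes keys are sorted before the replacement is applied, which is roughly what happens when the substitution is done on the serializer side. It shows both a duplicate key and an out-of-order key.

```rust
/// Hypothetical stand-in for the old normalization: replace NUL with '0'.
fn replace_nul(path: &str) -> String {
    path.replace('\0', "0")
}

/// An sstable-style writer expects keys in strictly increasing order.
fn is_strictly_increasing(keys: &[String]) -> bool {
    keys.windows(2).all(|w| w[0] < w[1])
}

/// Sort the raw paths first, then normalize each one (assumed ordering
/// of operations for this illustration).
fn normalized_after_sort(paths: &[&str]) -> Vec<String> {
    let mut sorted: Vec<&str> = paths.to_vec();
    sorted.sort();
    sorted.into_iter().map(replace_nul).collect()
}

fn main() {
    // Collision: two distinct paths become the same key.
    let collided = normalized_after_sort(&["a\0b", "a0b"]);
    assert_eq!(collided, vec!["a0b", "a0b"]);
    assert!(!is_strictly_increasing(&collided));

    // Out of order: '\0' sorts before '!', but '0' sorts after it.
    let reordered = normalized_after_sort(&["a\0z", "a!"]);
    assert_eq!(reordered, vec!["a0z", "a!"]);
    assert!(!is_strictly_increasing(&reordered));
}
```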

Another option would have been to normalize all values on the writer side.

This PR simplifies the logic and simply ignores json paths containing a \0, both in the columnar and the inverted index.
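
The idea of ignoring such paths can be sketched as follows. This is an illustration only, not the code from this PR: `is_indexable_path` and `collect_paths` are hypothetical names, and a `BTreeSet` stands in for whatever structure actually gathers the paths.

```rust
use std::collections::BTreeSet;

/// Hypothetical filter: a JSON path is kept only if it contains no NUL byte.
fn is_indexable_path(path: &str) -> bool {
    !path.contains('\0')
}

/// Drop paths containing '\0', then return the rest sorted and deduplicated.
fn collect_paths<'a>(paths: &[&'a str]) -> Vec<&'a str> {
    let kept: BTreeSet<&str> = paths
        .iter()
        .copied()
        .filter(|p| is_indexable_path(p))
        .collect();
    kept.into_iter().collect()
}

fn main() {
    let keys = collect_paths(&["a0b", "a\0b", "a!"]);
    // The path with '\0' is skipped; the others stay strictly increasing.
    assert_eq!(keys, vec!["a!", "a0b"]);
    assert!(keys.windows(2).all(|w| w[0] < w[1]));
}
```

Because no key is rewritten, distinct paths stay distinct and their sorted order is preserved, at the cost of not indexing paths that contain a \0.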

Closes #2442