Closed: trinity-1686a closed this 10 months ago.
Minimal Error Case:
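```rust
#[test]
fn test_json_date_with_id_regression() {
    // Assumes this runs inside tantivy's own test suite, where the
    // test-only helper `Index::writer_for_tests` is available.
    use crate::schema::{Schema, TEXT};
    use crate::{Index, IndexWriter};
    use serde_json::json;

    let mut schema_builder = Schema::builder();
    let json = schema_builder.add_json_field("json", TEXT);
    let schema = schema_builder.build();
    let index = Index::create_in_ram(schema);
    let mut writer = index.writer_for_tests().unwrap();

    // Segment 1: only a text value under the JSON path "field".
    let doc = json!({"field": "a"});
    writer.add_document(doc!(json => doc)).unwrap();
    writer.commit().unwrap();

    // Segment 2: the same text value plus a numeric value under "id",
    // a path that sorts after "field".
    let doc = json!({"field": "a", "id": 1});
    writer.add_document(doc!(json => doc.clone())).unwrap();
    writer.commit().unwrap();

    // Force a merge: drop the first writer (waiting for background merges),
    // then explicitly merge every searchable segment with a fresh writer.
    writer.wait_merging_threads().unwrap();
    let mut index_writer: IndexWriter = index.writer_for_tests().unwrap();
    let segment_ids = index
        .searchable_segment_ids()
        .expect("Searchable segments failed.");
    index_writer.merge(&segment_ids).wait().unwrap();
    assert!(index_writer.wait_merging_threads().is_ok());
}
```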
If the term without term frequencies (here the numeric `id` term) is ordered before the text field, the test does not panic:

```rust
// "aid" instead of "id", so that it gets ordered before "field"
let doc = json!({"field": "a", "aid": 1});
```
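To illustrate why renaming `id` to `aid` changes the outcome, here is a simplified model. It assumes that the terms of one JSON field are keyed by the path bytes, a 0 end-of-path byte, and then the encoded value, and that keys are sorted byte-wise. The helper names are hypothetical, and this is not tantivy's exact term layout; it only models "ordered by path first".

```rust
/// Hypothetical, simplified key for a JSON term: path bytes, a 0 end-of-path
/// byte, then the encoded value. NOT tantivy's exact layout; it only models
/// that terms of one JSON field are ordered by their path first.
fn json_term_key(path: &str, value: &[u8]) -> Vec<u8> {
    let mut key = path.as_bytes().to_vec();
    key.push(0); // end-of-path marker
    key.extend_from_slice(value);
    key
}

/// Returns the paths in the order their terms appear when keys are sorted byte-wise.
fn sorted_paths(paths: &[&str]) -> Vec<String> {
    let mut keys: Vec<(Vec<u8>, String)> = paths
        .iter()
        .map(|p| (json_term_key(p, b"v"), p.to_string()))
        .collect();
    keys.sort(); // sorts by key bytes first
    keys.into_iter().map(|(_, p)| p).collect()
}

fn main() {
    // Original test: the numeric "id" term comes after the text "field" term.
    println!("{:?}", sorted_paths(&["id", "field"]));
    // Workaround: "aid" comes before "field", and the merge no longer panics.
    println!("{:?}", sorted_paths(&["field", "aid"]));
}
```

The first line shows `id` landing after `field`, the second shows `aid` landing before it, which matches the two cases described above.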
With some printf debugging, I found the document causing the crash to be

and, more specifically, its term `payload.comment.created_at:2022-05-01T00:00:01Z`, which is actually stored as ad (a date is stored as a signed integer; you have to flip the first, most significant, bit to decode it): `[150, 234, 210, 20, 254, 193, 202, 0]`
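The byte sequence above can be decoded with a short sketch. It assumes, consistent with the note above, a big-endian `u64` whose most significant bit was flipped to map `i64` onto `u64` while preserving order. The nanosecond unit is an inference from the arithmetic: the decoded value is exactly 2022-05-01T00:00:01Z expressed in nanoseconds since the Unix epoch. The date conversion uses Howard Hinnant's `civil_from_days` algorithm; all helper names are hypothetical.

```rust
/// Decode the 8 value bytes of a date term: a big-endian u64 whose most
/// significant bit was flipped so that i64 order survives byte-wise sorting.
fn decode_i64_term_bytes(bytes: [u8; 8]) -> i64 {
    (u64::from_be_bytes(bytes) ^ (1u64 << 63)) as i64
}

/// Convert days since 1970-01-01 to a (year, month, day) civil date
/// (Howard Hinnant's `civil_from_days`, proleptic Gregorian calendar).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468;
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year, [0, 365]
    let mp = (5 * doy + 2) / 153; // month index with March = 0, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the next civil year in the March-based count.
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Format nanoseconds since the Unix epoch as an RFC 3339 UTC timestamp (whole seconds).
fn format_nanos(nanos: i64) -> String {
    let secs = nanos.div_euclid(1_000_000_000);
    let (days, sod) = (secs.div_euclid(86_400), secs.rem_euclid(86_400));
    let (y, m, d) = civil_from_days(days);
    format!(
        "{y:04}-{m:02}-{d:02}T{:02}:{:02}:{:02}Z",
        sod / 3_600,
        sod % 3_600 / 60,
        sod % 60
    )
}

fn main() {
    let bytes = [150, 234, 210, 20, 254, 193, 202, 0];
    let nanos = decode_i64_term_bytes(bytes);
    // prints "1651363201000000000 -> 2022-05-01T00:00:01Z"
    println!("{nanos} -> {}", format_nanos(nanos));
}
```

Flipping the sign bit is what makes negative timestamps sort before positive ones when the raw bytes are compared, which is why the decoding step is needed before interpreting the value.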
The doc mapper is:

Built with Quickwit 163ed7ef50051dd5cc1709675421a64102047cda and tantivy bff7c58497964f947dc94e2e45dfe9962e1d10c3.
To reproduce: ingest enough of the GitHub Archive to trigger a merge; the first document of type `IssueCommentEvent` will cause a panic (or possibly some other kind of document before it).

This looks a lot like a variant of https://github.com/quickwit-oss/tantivy/issues/2251, which is supposed to be fixed by https://github.com/quickwit-oss/tantivy/pull/2253.