Open fulmicoton opened 3 years ago
We can encode up to 3 docs in the postings positions directly, instead of jumping into the posting list.
let postings_start_offset = u64::deserialize(reader)? as usize;
let postings_num_bytes = u32::deserialize(reader)? as usize;
let postings_end_offset = postings_start_offset + postings_num_bytes;
postings_start_offset would hold [docid1_u32, docid2_u32], and the u32 field behind postings_end_offset (postings_num_bytes) would hold docid3_u32.
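To make the proposal concrete, here is a minimal sketch of packing up to three doc ids into the 12 bytes those two fields occupy. It assumes the reader can tell from doc_freq how many ids are valid, and it ignores how terms are actually laid out in blocks. The names InlinePostings, pack_inline and unpack_inline are hypothetical and not part of tantivy.

```rust
/// Hypothetical view of the 12 bytes normally read as
/// (postings_start_offset: u64, postings_num_bytes: u32).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InlinePostings {
    pub offset_field: u64,    // would hold docid1 (low 32 bits) and docid2 (high 32 bits)
    pub num_bytes_field: u32, // would hold docid3
}

/// Packs 1 to 3 doc ids. Returns None when the ids do not fit inline.
pub fn pack_inline(doc_ids: &[u32]) -> Option<InlinePostings> {
    if doc_ids.is_empty() || doc_ids.len() > 3 {
        return None;
    }
    let d1 = doc_ids[0] as u64;
    // Unused slots are zero-filled; doc_freq tells the reader which slots are valid.
    let d2 = doc_ids.get(1).copied().unwrap_or(0) as u64;
    let d3 = doc_ids.get(2).copied().unwrap_or(0);
    Some(InlinePostings {
        offset_field: d1 | (d2 << 32),
        num_bytes_field: d3,
    })
}

/// Recovers the first `doc_freq` doc ids (at most 3) from the inline fields.
pub fn unpack_inline(inline: InlinePostings, doc_freq: u32) -> Vec<u32> {
    let all = [
        inline.offset_field as u32,
        (inline.offset_field >> 32) as u32,
        inline.num_bytes_field,
    ];
    all[..doc_freq.min(3) as usize].to_vec()
}
```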
@PSeitz It does not actually work like that.
Terms are stored in blocks. The first term is serialized using the scheme you copy-pasted, which is quite wasteful. The other terms are expressed as deltas against this first term and bitpacked.
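A rough sketch of that block layout, reduced to a single field (the postings offset) for brevity. It assumes offsets within a block never fall below the first term's offset. The names OffsetBlock, encode_block and get_offset are illustrative only and do not mirror tantivy's actual types.

```rust
/// Illustrative block: the first offset is stored in full, the others as
/// fixed-width deltas against it, packed bit by bit.
pub struct OffsetBlock {
    pub first: u64,
    pub num_bits: u8,    // width of each packed delta
    pub packed: Vec<u8>, // bitpacked deltas for terms 1..len
    pub len: usize,      // number of terms in the block
}

fn bits_needed(v: u64) -> u8 {
    (64 - v.leading_zeros()) as u8
}

/// Returns None for an empty block or if an offset is smaller than the first one.
pub fn encode_block(offsets: &[u64]) -> Option<OffsetBlock> {
    let (&first, rest) = offsets.split_first()?;
    let deltas = rest
        .iter()
        .map(|&o| o.checked_sub(first))
        .collect::<Option<Vec<u64>>>()?;
    // Every delta uses the width required by the largest one.
    let num_bits = deltas.iter().copied().map(bits_needed).max().unwrap_or(0);
    let width = num_bits as usize;
    let mut packed = vec![0u8; (deltas.len() * width + 7) / 8];
    for (i, &d) in deltas.iter().enumerate() {
        for b in 0..width {
            if (d >> b) & 1 == 1 {
                let pos = i * width + b;
                packed[pos / 8] |= 1 << (pos % 8);
            }
        }
    }
    Some(OffsetBlock { first, num_bits, packed, len: offsets.len() })
}

/// Random access to the offset of the term at `idx` within the block.
pub fn get_offset(block: &OffsetBlock, idx: usize) -> Option<u64> {
    if idx >= block.len {
        return None;
    }
    if idx == 0 {
        return Some(block.first);
    }
    let width = block.num_bits as usize;
    let mut delta = 0u64;
    for b in 0..width {
        let pos = (idx - 1) * width + b;
        if (block.packed[pos / 8] >> (pos % 8)) & 1 == 1 {
            delta |= 1u64 << b;
        }
    }
    Some(block.first + delta)
}
```

With offsets 1000, 1012, 1030 and 1031, the deltas 12, 30 and 31 each fit in 5 bits, so the three non-first terms take 2 bytes in total, where a full (u64, u32) pair per term would take 12 bytes each.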
It is fairly common to use a term as a primary id.
When only indexing docids, redirecting to the posting list seems overkill. We could simply store the docid for terms that have doc_freq=1 right after the TermInfoBlock.
The index should end up being a tad smaller, and we would remove one seek.
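One possible shape for this, as a sketch only: each block carries a side array holding one doc id per term with doc_freq=1, in term order, and a lookup finds its slot by counting the singleton terms that precede it. The names BlockWithInlineDocs and inline_doc_id are hypothetical, and a real implementation would keep the data serialized after the TermInfoBlock instead of in Vecs.

```rust
/// Hypothetical block: per-term doc frequencies plus a side array that
/// stores one doc id for every term with doc_freq == 1, in term order.
pub struct BlockWithInlineDocs {
    pub doc_freqs: Vec<u32>,
    pub inline_doc_ids: Vec<u32>,
}

/// Returns the inlined doc id for the term at `term_ord` within the block,
/// or None if the term has doc_freq != 1 and needs its posting list.
pub fn inline_doc_id(block: &BlockWithInlineDocs, term_ord: usize) -> Option<u32> {
    if *block.doc_freqs.get(term_ord)? != 1 {
        return None;
    }
    // The slot in the side array is the number of singleton terms before this one.
    let rank = block.doc_freqs[..term_ord]
        .iter()
        .filter(|&&df| df == 1)
        .count();
    block.inline_doc_ids.get(rank).copied()
}
```

Terms used as a primary id always take the inline path, so resolving them never touches the posting list.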