Open lintool opened 4 years ago
sounds good
Perfect!
Reading header...
=== Header ===
version: 1
num_postings_lists: 9
num_doc_records: 3
total_postings_lists: 9
total_docs: 3
total_terms_in_collection: 16
average_doclength: 5.333333
description: Export of toy 3-document collection from Anserini's io.anserini.integration.TrecEndToEndTest test case
Expecting 9 postings lists and 3 doc records in this export.
term: '01', df=1, cf=1 (0, 1)
term: '03', df=1, cf=1 (0, 1)
term: '30', df=1, cf=1 (0, 1)
term: 'content', df=1, cf=1 (0, 1)
term: 'enough', df=1, cf=1 (2, 1)
term: 'head', df=3, cf=3 (0, 1) (1, 1) (1, 1)
term: 'simpl', df=2, cf=2 (1, 1) (1, 1)
term: 'text', df=3, cf=5 (0, 1) (1, 1) (1, 3)
term: 'veri', df=1, cf=1 (1, 1)
0 WSJ_1 6
1 TREC_DOC_1 4
2 DOC222 6
TODO: encode above as a test case.
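One possible way to encode the dump above as a test case is sketched below. This is a minimal sketch and not the project's actual test. It assumes that the first number in each `(x, y)` pair is a docid gap (the difference from the previous docid in the list) and the second is the term frequency. The dump itself does not state this, but under that reading the per-document term counts add up to the listed document lengths. The names `POSTINGS`, `EXPECTED_DF_CF`, `DOCS`, `decode_gaps`, and `check_export` are hypothetical.

```python
# Values copied from the dump above.
# ASSUMPTION: each posting is (docid_gap, tf), not (absolute_docid, tf).
POSTINGS = {
    "01": [(0, 1)],
    "03": [(0, 1)],
    "30": [(0, 1)],
    "content": [(0, 1)],
    "enough": [(2, 1)],
    "head": [(0, 1), (1, 1), (1, 1)],
    "simpl": [(1, 1), (1, 1)],
    "text": [(0, 1), (1, 1), (1, 3)],
    "veri": [(1, 1)],
}
# term -> (df, cf), as printed in the dump.
EXPECTED_DF_CF = {
    "01": (1, 1), "03": (1, 1), "30": (1, 1), "content": (1, 1),
    "enough": (1, 1), "head": (3, 3), "simpl": (2, 2), "text": (3, 5),
    "veri": (1, 1),
}
# (docid, collection docid, doclength), as printed in the dump.
DOCS = [(0, "WSJ_1", 6), (1, "TREC_DOC_1", 4), (2, "DOC222", 6)]


def decode_gaps(postings):
    """Turn (gap, tf) pairs into (absolute docid, tf) pairs."""
    docid = 0
    decoded = []
    for gap, tf in postings:
        docid += gap
        decoded.append((docid, tf))
    return decoded


def check_export():
    """Check df, cf, doc lengths, and header totals against each other."""
    assert len(POSTINGS) == 9  # num_postings_lists
    assert len(DOCS) == 3      # num_doc_records
    lengths = {docid: 0 for docid, _, _ in DOCS}
    for term, plist in POSTINGS.items():
        decoded = decode_gaps(plist)
        df, cf = EXPECTED_DF_CF[term]
        assert len(decoded) == df, term
        assert sum(tf for _, tf in decoded) == cf, term
        for docid, tf in decoded:
            lengths[docid] += tf
    for docid, _, doclength in DOCS:
        assert lengths[docid] == doclength, docid
    total_terms = sum(lengths.values())
    assert total_terms == 16                                # total_terms_in_collection
    assert abs(total_terms / len(DOCS) - 5.333333) < 1e-6   # average_doclength
    return lengths


if __name__ == "__main__":
    print(check_export())
```

A real test would read these values from the export file rather than hard-coding them. The cross-checks (df against list length, cf against summed tf, doc lengths against summed postings) would stay the same.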
Might be nice to have another file that demonstrates the "Query terms only" case, i.e. num_postings_lists < total_postings_lists, and other relevant statistics.
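To illustrate what the header of such a "query terms only" file might look like, here is a small sketch. The query terms used in the usage lines are made up for illustration, and the function name `query_terms_only_header` is hypothetical. It also assumes that the collection-level statistics (`total_postings_lists`, `total_docs`, `total_terms_in_collection`, `average_doclength`) keep describing the full collection, and that only `num_postings_lists` shrinks.

```python
# Header values and term list copied from the full export of the toy collection.
FULL_HEADER = {
    "num_postings_lists": 9,
    "num_doc_records": 3,
    "total_postings_lists": 9,
    "total_docs": 3,
    "total_terms_in_collection": 16,
    "average_doclength": 5.333333,
}
ALL_TERMS = ["01", "03", "30", "content", "enough", "head", "simpl", "text", "veri"]


def query_terms_only_header(header, all_terms, query_terms):
    """Return (header, kept_terms) for an export restricted to query_terms.

    ASSUMPTION: only num_postings_lists changes; the collection-level
    statistics stay as they are in the full export.
    """
    wanted = set(query_terms)
    kept = [term for term in all_terms if term in wanted]
    restricted = dict(header)
    restricted["num_postings_lists"] = len(kept)
    return restricted, kept


if __name__ == "__main__":
    # Hypothetical query terms; "missing" is not in the index and is ignored.
    header, kept = query_terms_only_header(
        FULL_HEADER, ALL_TERMS, ["head", "text", "missing"]
    )
    print(kept)
    print(header["num_postings_lists"], header["total_postings_lists"])
```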
@JMMackenzie and @Chriskamphuis have requested a sample export for testing purposes.
I propose exporting the index from this Anserini test case: https://github.com/castorini/anserini/blob/master/src/test/java/io/anserini/integration/TrecEndToEndTest.java
which indexes this 3 document toy collection: https://github.com/castorini/anserini/tree/master/src/test/resources/sample_docs/trec/collection2
sg?