Closed by cmacdonald 7 months ago
Sounds great! It's elsewhere, too, e.g., FlexIndex.get_corpus_iter() in pyterrier_dr.
So option 3?
I'd like the default to be to return everything that's available. If there's a case where that isn't needed, we can add options later.
pyterrier_pisa has a get_corpus_iter(), courtesy of @seanmacavaney
It would be useful to have such a function for a Terrier index.
A few options that I can foresee:
1. Terrier's Index class can expose a single get_corpus_iter() method that has a pretokenized option. If pretokenized is true, it returns a version of the direct index. If pretokenized is false, it returns a version of the meta index.
2. Expose these in the relevant classes:
   a. the direct index exposes a get_corpus_iter() whose output is pretokenized
   b. the meta index exposes a get_corpus_iter() whose output is metadata
3. Both!
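
To make the "both" option concrete, here is a minimal sketch of an iterator that merges metadata and pretokenized term frequencies into one record per document and returns everything available by default. It is an illustration only, not the Terrier API: ToyIndex, its meta and direct fields, and the include_meta / include_toks flags are all hypothetical names, and the 'toks' key holding a term-to-frequency dict is an assumption about what a pretokenized record would look like.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class ToyIndex:
    """Hypothetical in-memory stand-in for an index (not the Terrier API)."""
    # One metadata record per document, e.g. {"docno": "d1", "text": "..."}.
    meta: List[Dict[str, str]] = field(default_factory=list)
    # One term -> frequency mapping per document, in the same order as meta.
    direct: List[Dict[str, int]] = field(default_factory=list)

    def get_corpus_iter(self, include_meta: bool = True,
                        include_toks: bool = True) -> Iterator[dict]:
        """Yield one dict per document; by default return everything available."""
        for docid, meta_record in enumerate(self.meta):
            # Always keep docno so each record can be identified.
            record = {"docno": meta_record["docno"]}
            if include_meta:
                record.update(meta_record)
            if include_toks:
                # Copy so callers cannot mutate the index's own data.
                record["toks"] = dict(self.direct[docid])
            yield record


index = ToyIndex(
    meta=[{"docno": "d1", "text": "hello world"},
          {"docno": "d2", "text": "hello hello"}],
    direct=[{"hello": 1, "world": 1},
            {"hello": 2}],
)

for doc in index.get_corpus_iter():
    print(doc)
```

Defaulting both flags to true matches the preference for returning everything available; the flags are there only to show where narrower options could be added later.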