Closed albertoueda closed 2 months ago
So you are right, in that it doesn't currently, but it could be extended to do so...
But I think I would ask: is there an easier way to implement it for now?
Can you write a function that tokenises queries and several fields and looks up the relevant stats in a Terrier lexicon, i.e. to calculate BM25F manually in Python?
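To make the suggestion above concrete, here is a minimal sketch of a manual BM25F computation in plain Python. It is not PyTerrier code: it assumes the collection statistics (document frequencies, average field lengths) are computed from an in-memory list of documents rather than looked up in a Terrier lexicon. The function names, the field weights, and the non-negative idf variant log(1 + (N - df + 0.5) / (df + 0.5)) are all illustrative assumptions, and the tokeniser does no stemming or stopword removal.

```python
import math
import re
from collections import Counter


def tokenise(text):
    # Hypothetical tokeniser: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25f_scores(query, docs, weights, b=0.75, k1=1.2):
    """Score each doc (a dict of field name -> text) against the query.

    weights maps field name -> field weight (assumed values, to be tuned).
    """
    if not docs:
        return []
    fields = list(weights)
    # Per-document, per-field term frequencies and field lengths.
    tfs = [{f: Counter(tokenise(d.get(f, ""))) for f in fields} for d in docs]
    lens = [{f: sum(tf[f].values()) for f in fields} for tf in tfs]
    n_docs = len(docs)
    avg_len = {f: sum(ln[f] for ln in lens) / n_docs for f in fields}

    q_terms = set(tokenise(query))
    # Document frequency: docs containing the term in any field.
    df = {t: sum(any(tf[f][t] for f in fields) for tf in tfs) for t in q_terms}

    scores = []
    for tf, ln in zip(tfs, lens):
        score = 0.0
        for t in q_terms:
            # Weighted, length-normalised frequency summed over fields.
            pseudo_tf = 0.0
            for f in fields:
                if tf[f][t] == 0:
                    continue
                norm = 1.0 - b + b * ln[f] / avg_len[f]
                pseudo_tf += weights[f] * tf[f][t] / norm
            idf = math.log(1.0 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            score += idf * pseudo_tf / (k1 + pseudo_tf)
        scores.append(score)
    return scores


docs = [
    {"title": "terrier indexing", "body": "how to build an index with terrier"},
    {"title": "cooking pasta", "body": "boil water and add salt"},
    {"title": "retrieval models", "body": "bm25 and terrier retrieval"},
]
scores = bm25f_scores("terrier index", docs, {"title": 2.0, "body": 1.0})
```

In this toy collection the first document matches both query terms (one of them in the more heavily weighted title), so it should outrank the third, and the second document scores zero. To follow the suggestion literally, the in-memory statistics would be replaced by lookups in the Terrier index.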
I'm afraid I'm not much of an expert in handling Terrier lexicons. I had another option in mind: indexing the new documents (and their fields) together with the initial documents (they are not really "new" ones, just processed versions of the indexed ones).
In this direction, is there a way to index new documents with PyTerrier after an initial index is built? I've noticed there is incremental indexing in Terrier, but is it possible with PyTerrier indexers such as IterDictIndexer, for instance?
Should I close this issue?
You can use the + operator on two Terrier indices and retrieve from the combined "virtual index". See this example:
https://github.com/terrier-org/pyterrier/blob/master/tests/test_index_op.py#L128
One index could be your original documents, and the other index would contain your new documents.
If you are happy @albertoueda perhaps we can close this?
Is it possible to use multiple fields in text.scorer?
Context:
body_attr to one of the columns, but how should I proceed in the case of multiple fields?
If it is not possible today, maybe body_attr could accept a list, and/or be renamed to text_cols or text_attrs.
Also, maybe, if the new document we want to rank has fields corresponding to metadata/fields available in the background index, they could be matched automatically.
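Until something like that exists, one possible stopgap (an assumption on my part, not an existing PyTerrier feature) is to join the fields into a single text column and point body_attr at it. The sketch below is plain Python over a list of dicts; the helper name merge_fields and the target column name "text" are hypothetical. Note that this discards any per-field weighting, so it is not equivalent to BM25F.

```python
def merge_fields(rows, fields, target="text", sep=" "):
    """Return copies of rows with the given fields joined into one new column.

    Missing or empty fields are skipped; the input rows are not modified.
    """
    merged = []
    for row in rows:
        parts = [str(row[f]) for f in fields if row.get(f)]
        merged.append({**row, target: sep.join(parts)})
    return merged


rows = [
    {"docno": "d1", "title": "Terrier", "body": "An IR platform"},
    {"docno": "d2", "title": "", "body": "Only a body here"},
]
merged = merge_fields(rows, ["title", "body"])
```

The merged rows keep their original columns, so the separate fields remain available if per-field scoring is added later.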