Open xhluca opened 2 years ago
Hi @xhluca, thanks for the suggestion. One reason we keep the encoding process separate is to stay flexible with respect to tasks (e.g. NQ/MSMARCO) and GPU/RAM resources. I agree that the DPR evaluation process could be simpler; maybe we can have a simpler DPR evaluation in Pyserini. I'll take a look.
Xueguang
Right now, it's possible to train DPR in a single command, via the `tevatron.driver.train` module. However, to evaluate, a more complex series of commands (involving lower-level for loops) is needed, e.g. for DPR on NQ:

I think it would be nicer if all this could be reduced to 1 or 2 commands:
Note the usage of `tevatron.driver.evaluate` in order to keep `driver.encode` at a lower level and backward compatible, while `evaluate` would be for higher-level usage like reproducing results. Moreover, `tevatron.driver.evaluate` could throw an error if pyserini is not available, e.g.:
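For illustration, a check along these lines might be enough. This is only a sketch: `require_package` and `main` are hypothetical names, not existing Tevatron functions, and the error wording is an assumption. It uses the standard-library `importlib.util.find_spec` to test whether pyserini can be imported without actually importing it.

```python
import importlib.util


def require_package(name: str, purpose: str) -> None:
    """Raise ImportError with an install hint if `name` cannot be imported.

    Hypothetical helper: `name` is a top-level package name, `purpose`
    is the feature that needs it (used only in the error message).
    """
    # find_spec returns None when a top-level package is not installed.
    if importlib.util.find_spec(name) is None:
        raise ImportError(
            f"{purpose} requires the '{name}' package, which is not installed. "
            f"Install it with: pip install {name}"
        )


def main() -> None:
    # Hypothetical entry point for the proposed tevatron.driver.evaluate module.
    # Fail early, before any expensive encoding work starts.
    require_package("pyserini", "tevatron.driver.evaluate")
    # Encoding, retrieval and scoring would follow here.
```

Checking up front keeps pyserini an optional dependency: training and `driver.encode` would still work without it, and only the higher-level evaluation command would ask for it.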