It looks like when we use English (original) queries, patapsco checks whether English is among the languages that have qrels. I don't think that is the intended behavior, since the language of the query is not necessarily one of the languages that has qrels.
With a topic like this,
{"topic_id": "5", "languages_with_qrels": ["zho"], "topics": [{"lang": "eng", "source": "original", ...
Currently, patapsco with the following setting will not pick up topic 5, even though this is the essential config for PSQ.
"topics": {
  "input": {
    "format": "json",
    "lang": "eng",
    "source": "original",
    "encoding": "utf8",
    "path": "../data/dev.topics.v1-0.jsonl"
  },
  "fields": "title"
},
If the topic is changed to this, it will pass
{"topic_id": "5", "languages_with_qrels": ["zho", "eng"], "topics": [{"lang": "eng", "source": "original", ...
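To make the difference concrete, here is a minimal sketch of the two checks. It is not patapsco's actual code: `keep_by_query_lang` and `keep_by_doc_lang` are hypothetical helpers, and the topic is a trimmed-down version of topic 5 above with only the fields needed for the check.

```python
import json

# Hypothetical helpers for illustration only; not patapsco's implementation.

def keep_by_query_lang(topic: dict, query_lang: str) -> bool:
    """Behavior described in this issue: the *query* language must have qrels."""
    return query_lang in topic["languages_with_qrels"]


def keep_by_doc_lang(topic: dict, doc_lang: str) -> bool:
    """Alternative: the *document* language must have qrels."""
    return doc_lang in topic["languages_with_qrels"]


# Trimmed-down topic 5: English query, qrels only for Chinese documents.
topic = json.loads(
    '{"topic_id": "5", "languages_with_qrels": ["zho"], '
    '"topics": [{"lang": "eng", "source": "original"}]}'
)

print(keep_by_query_lang(topic, "eng"))  # False: topic 5 is skipped
print(keep_by_doc_lang(topic, "zho"))    # True: topic 5 is kept
```

With the second check, an English query against a Chinese collection keeps topic 5 without having to add "eng" to languages_with_qrels.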
It looks like when using a partial run for retrieval, it checks whether the language matches the one in .lang. But I don't think the .lang file is stored when a run is finished.
@eugene-yang I'll check on the qrels. It should depend on the documents and not the topics.
The .lang file is written to the lucene directory.
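For reference, a rough sketch of what such a check could look like. This is an assumption-heavy illustration, not patapsco's code: `index_lang_matches` is a hypothetical helper, and it assumes the .lang file holds a single language code as plain text, which the thread does not confirm.

```python
import tempfile
from pathlib import Path


def index_lang_matches(index_dir: str, expected_lang: str) -> bool:
    """Return True if <index_dir>/.lang exists and names expected_lang.

    Assumption: .lang contains one language code as plain text.
    """
    lang_file = Path(index_dir) / ".lang"
    if not lang_file.is_file():
        # No .lang file, so there is nothing to compare against.
        return False
    return lang_file.read_text(encoding="utf8").strip() == expected_lang


# Usage with a temporary directory standing in for the index directory.
with tempfile.TemporaryDirectory() as index_dir:
    missing = index_lang_matches(index_dir, "zho")
    (Path(index_dir) / ".lang").write_text("zho\n", encoding="utf8")
    match = index_lang_matches(index_dir, "zho")
    mismatch = index_lang_matches(index_dir, "eng")

print(missing, match, mismatch)  # False True False
```

A partial run that only copies the run output and not the index directory would hit the "missing" case here.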
Oops got it. :)
@eugene-yang I hacked the master branch to turn off the qrel check. This is going to require a larger refactor to fix correctly.
I used filter_lang to bypass that in the demo notebook, but I don't think that is the right way to do things.
I think that is the best way for now, but I broke that when I removed the filtering. Let me know if you need that put back in master.
First breaking commit as we work toward the 1.0.0 release