Closed cash closed 2 years ago
@eugene-yang Here is the updated code that works with this branch:
import copy
import random

import patapsco

config = {
    "run": {
        "name": "Cash's reranker"
    },
    "documents": {
        "input": {
            "format": "json",
            "lang": "eng",
            "encoding": "utf8",
            "path": "samples/data/eng_mini_docs.jsonl",
        },
        "process": {
            "normalize": {
                "lowercase": True,
            },
            "tokenize": "whitespace",
            "stem": "porter"
        },
        "comment": "Mini English dataset",
    },
    "database": {
        "name": "sqlite"
    },
    "index": {
        "name": "lucene"
    },
    "topics": {
        "input": {
            "format": "json",
            "lang": "eng",
            "source": "original",
            "encoding": "utf8",
            "path": "samples/data/eng_mini_topics.jsonl"
        },
        "fields": "title"
    },
    "queries": {
        "process": {
            "normalize": {
                "lowercase": True,
            },
            "tokenize": "whitespace",
            "stem": "porter"
        }
    },
    "retrieve": {
        "name": "bm25",
        "number": 5
    },
    "rerank": {
        "name": "cash"
    },
    "score": {
        "input": {
            "path": "samples/data/eng_mini_qrels"
        }
    }
}


class CashReranker(patapsco.Reranker):
    LOGGER = patapsco.get_logger("cash")

    def process(self, results):
        self.LOGGER.info("Cash reranker!")
        new_results = copy.deepcopy(results.results)
        random.shuffle(new_results)
        return patapsco.Results(results.query, results.doc_lang, 'CashReranker', new_results)


patapsco.RerankFactory.register('cash', CashReranker)

runner = patapsco.Runner(config)
runner.run()
I'm going to merge this in. We can continue building out the public API this fall.
Looks like the logger is preventing patapsco from running multiple times in one python session.
In run.py, the initialization always adds new handlers to the logger.
Each run adds one more StreamHandler to the logger, and after the first run the second handler in the list becomes a FileHandler.
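For reference, one common way to avoid stacking handlers on repeated runs is to check what is already attached before adding anything. This is only a sketch with the standard `logging` module, not the code in run.py; the function name `configure_logger` and its parameters are made up for illustration.

```python
import logging


def configure_logger(name="patapsco", log_file=None):
    """Attach handlers once, however many times this is called (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # FileHandler subclasses StreamHandler, so compare exact types here
    # to find a plain console handler left over from an earlier run.
    has_stream = any(type(h) is logging.StreamHandler for h in logger.handlers)
    if not has_stream:
        logger.addHandler(logging.StreamHandler())

    if log_file is not None:
        # Drop file handlers from a previous run before adding the new one.
        stale = [h for h in logger.handlers if isinstance(h, logging.FileHandler)]
        for handler in stale:
            logger.removeHandler(handler)
            handler.close()
        logger.addHandler(logging.FileHandler(log_file))

    return logger
```

Calling it twice in the same Python session leaves one console handler and at most one file handler on the logger.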
@eugene-yang okay, looking into that.
@eugene-yang fixed this in master with #21. Let me know if you run into any other issues when using it as a library.
@eugene-yang The first commit lets users define a reranker outside of Patapsco and use it in an experiment:
Logging does not work as expected since the namespace is not under patapsco. I probably need to add a utility method to get a logger from Patapsco.
This also doesn't support customized text processing yet.
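On the logging point, a utility like that could be as small as a wrapper that puts any requested name under the package namespace, so handlers configured on the parent logger also apply to external code. A rough sketch using only the standard `logging` module; `ROOT_NAMESPACE` and this `get_logger` body are assumptions for illustration, not Patapsco's actual implementation.

```python
import logging

# Assumed name of the package-level logger that handlers are attached to.
ROOT_NAMESPACE = "patapsco"


def get_logger(name):
    """Return a logger nested under the package namespace (illustrative sketch)."""
    if name == ROOT_NAMESPACE or name.startswith(ROOT_NAMESPACE + "."):
        # Already inside the namespace, so use the name unchanged.
        return logging.getLogger(name)
    # Prefix outside names so records propagate to the package logger.
    return logging.getLogger(f"{ROOT_NAMESPACE}.{name}")


# A reranker defined outside the package would then ask for its logger like this.
logger = get_logger("cash")
```

Because child loggers propagate records to their parent by default, the reranker's messages would go through whatever handlers are set on the package logger.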