To better integrate Python reranking with patapsco, I think we need the following:
A Python API for accessing everything the reranking script can use, including the post-processed documents (via sqlite), the inverted index (through our interface), the topic object, and the retrieval scores. These have been passed to the reranking shell script as files or command-line arguments, but passing them as Python variables should be much more efficient.
A way to register the reranking function or class with the pipeline.
The ability to run the pipeline in an interactive Python shell. My vision is something like this (in a Jupyter notebook): import patapsco > define or import your reranking class > define a patapsco pipeline object > add the reranking class to that object > run the pipeline.
Not critical, but nice to have: the ability to execute a partial pipeline, or otherwise give users access to intermediate results, so they can inspect the process for both debugging and learning.
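To make the proposal concrete, here is a minimal, self-contained sketch of the workflow described above. It is NOT patapsco's actual API: `Pipeline`, `RerankInput`, `add_reranker`, `stop_after`, `artifacts`, and `LengthNormReranker` are all hypothetical names chosen for illustration. An in-memory sqlite table stands in for the post-processed documents, and a plain dict stands in for the inverted index interface. The point is that the reranker receives the topic, retrieval scores, documents, and index as Python objects rather than via files or shell arguments, and that intermediate stage outputs are kept for inspection.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Protocol, Tuple


@dataclass
class Topic:
    id: str
    query: str


@dataclass
class RerankInput:
    """Hypothetical bundle of everything a reranker may need, passed in memory."""
    topic: Topic
    scores: Dict[str, float]        # doc_id -> first-stage retrieval score
    docs: sqlite3.Connection        # post-processed documents (stand-in)
    index: Dict[str, List[str]]     # toy inverted index: term -> doc_ids

    def text(self, doc_id: str) -> str:
        row = self.docs.execute("SELECT text FROM docs WHERE id = ?", (doc_id,)).fetchone()
        return row[0]


class Reranker(Protocol):
    def rerank(self, data: RerankInput) -> Dict[str, float]: ...


class Pipeline:
    """Hypothetical pipeline: named stages run in order; outputs are kept."""

    def __init__(self) -> None:
        self.stages: List[Tuple[str, Callable]] = []
        self.artifacts: Dict[str, object] = {}

    def add(self, name: str, stage: Callable) -> "Pipeline":
        self.stages.append((name, stage))
        return self

    def add_reranker(self, reranker: Reranker) -> "Pipeline":
        # Registering a user class: only its rerank method is required.
        return self.add("rerank", reranker.rerank)

    def run(self, data, stop_after: Optional[str] = None):
        for name, stage in self.stages:
            data = stage(data)
            self.artifacts[name] = data  # keep intermediate output for debugging
            if name == stop_after:       # partial execution
                break
        return data


def make_retriever(conn: sqlite3.Connection,
                   index: Dict[str, List[str]]) -> Callable[[Topic], RerankInput]:
    def retrieve(topic: Topic) -> RerankInput:
        # Toy first stage: one point per matching query term.
        scores: Dict[str, float] = {}
        for term in topic.query.lower().split():
            for doc_id in index.get(term, []):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0
        return RerankInput(topic, scores, conn, index)
    return retrieve


class LengthNormReranker:
    """Example user-defined reranker: divide each score by document length."""

    def rerank(self, data: RerankInput) -> Dict[str, float]:
        return {d: s / len(data.text(d).split()) for d, s in data.scores.items()}


# Build a tiny in-memory collection and inverted index.
docs = {"d1": "cheap flights to paris",
        "d2": "paris hotels and cheap flights in europe",
        "d3": "weather in tokyo"}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, text TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)", docs.items())
index: Dict[str, List[str]] = {}
for doc_id, text in docs.items():
    for term in set(text.split()):
        index.setdefault(term, []).append(doc_id)

# Notebook-style flow: define pipeline, add reranker, run.
pipeline = Pipeline().add("retrieve", make_retriever(conn, index))
pipeline.add_reranker(LengthNormReranker())
final = pipeline.run(Topic("t1", "cheap flights paris"))
partial = pipeline.run(Topic("t1", "cheap flights paris"), stop_after="retrieve")
print(sorted(final.items(), key=lambda kv: -kv[1]))
```

Here `final` maps `d1` to 0.75 and `d2` to 3/7, while `partial` is the `RerankInput` produced by the retrieval stage, which a user could inspect directly. Whether patapsco should expose stages this way, or through its own configuration objects, is an open design question.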
@cash If these look reasonable to you, I can start looking into how to implement them.