Use matchmaker as a library

cmacdonald commented 4 years ago

Hi,

I'm interested in using matchmaker as a library. Some related questions/comments:

1. Being able to pip install and import would help. Could you add a setup.py and an __init__.py, so that I could use "pip install git+"?
2. What is the minimum data I need to supply? Text of the query, text of the documents, and labels? And what is the format of idf_embedder?
3. Could the training loop be separated into a method that can easily be called with the above?

Craig

sebastian-hofstaetter commented 4 years ago

Hi Craig,

1. I'll look into that. I am currently doing a bit of cleanup (the current repo version is based on allennlp 0.9, and the new 1.0 has a couple of breaking changes), and then I can also add the init and setup.
2. That is one of the current drawbacks: for the library to work, we need to split the training and validation data so that they can be pre-processed faster in separate Python processes. In theory, that could also be done in the train-loop process to allow for the "method-call" format.
3. Yes, I think I could do that. Currently train.py is so large because it accommodates a wide range of configurations that are probably not needed for most use cases. Could you elaborate a bit more on how you would want to use the library? For training, just for inference, or both? Thank you.

Best, Sebastian

cmacdonald commented 4 years ago

1: Here is how it was done elsewhere: https://github.com/Georgetown-IR-Lab/cedr/pull/27/files

2 & 3: I'm on a crusade against the proliferation of command-line interfaces to deep neural toolkits. I'm trying to make everything work through a Python API where:

for training, you pass it two dataframes of input queries & documents, one for training and one for validation.
for testing, you pass it a single dataframe of the same format.
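
To make the shape of such an API concrete, here is a minimal sketch using pandas. Everything in it is an assumption for illustration: the class name `OverlapRanker`, the `fit`/`transform` method names, and the column names (`qid`, `query`, `docno`, `text`, `label`) are hypothetical and not part of matchmaker. The term-overlap scorer is only a placeholder for a neural model.

```python
import pandas as pd

# Hypothetical column names; not taken from matchmaker.
TRAIN_COLUMNS = {"qid", "query", "docno", "text", "label"}


class OverlapRanker:
    """Hypothetical stand-in for a neural re-ranker with a dataframe API."""

    def __init__(self):
        self.fitted = False

    @staticmethod
    def _score(query, text):
        # Fraction of query terms that also occur in the document text.
        q_terms = set(query.lower().split())
        d_terms = set(text.lower().split())
        return len(q_terms & d_terms) / max(len(q_terms), 1)

    def fit(self, train, valid):
        # A real model would run its training loop on `train` and monitor
        # `valid`; this placeholder only checks that both frames are usable.
        for name, frame in (("train", train), ("valid", valid)):
            missing = TRAIN_COLUMNS - set(frame.columns)
            if missing:
                raise ValueError(f"{name} is missing columns: {sorted(missing)}")
        self.fitted = True
        return self

    def transform(self, test):
        # Same format as the training frames, except labels are not required.
        out = test.copy()
        out["score"] = [
            self._score(q, t) for q, t in zip(out["query"], out["text"])
        ]
        out = out.sort_values(["qid", "score"], ascending=[True, False])
        return out.reset_index(drop=True)


train = pd.DataFrame({
    "qid": ["q1", "q1"],
    "query": ["neural ranking", "neural ranking"],
    "docno": ["d1", "d2"],
    "text": ["cooking recipes", "neural ranking models"],
    "label": [0, 1],
})
valid = train.copy()
test = train.drop(columns=["label"])

ranker = OverlapRanker().fit(train, valid)
ranked = ranker.transform(test)
```

With this shape, training and inference are plain method calls on dataframes, and no command-line entry point is involved.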

See example usages at https://github.com/cmacdonald/pyterrier_bert

I understand you might need (I)DF values? Perhaps a simple API could provide these?
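
As an illustration of what such a helper might look like, the following sketch computes document frequencies and an unsmoothed IDF, log(N / df), from raw document texts. The function names and the whitespace tokenisation are assumptions made for this example. The format that matchmaker's idf_embedder actually expects is not specified in this thread.

```python
import math
from collections import Counter


def document_frequencies(texts):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for text in texts:
        # Use a set so a term is counted at most once per document.
        df.update(set(text.lower().split()))
    return df


def idf_values(texts):
    """Map each term to log(N / df), one common unsmoothed IDF variant."""
    texts = list(texts)
    n_docs = len(texts)
    df = document_frequencies(texts)
    return {term: math.log(n_docs / count) for term, count in df.items()}


docs = ["neural ranking models", "neural networks", "cooking recipes"]
idf = idf_values(docs)
```

Terms that occur in fewer documents receive larger values, so here "cooking" gets a higher IDF than "neural".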

Craig

sebastian-hofstaetter / matchmaker

Use matchmaker as a library #6