koursaros-ai / nboost

NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on different platforms (e.g. Elasticsearch).
Apache License 2.0
674 stars, 69 forks

More documentation please #49

Open jusjosgra opened 4 years ago

jusjosgra commented 4 years ago

This library is excellent and I would really like to use it, so first of all thank you for making it available!

Unfortunately, there isn't enough documentation to make it usable beyond a default deployment. For example, it is not clear how I could:

1) send a query directly to ES through the nboost proxy without reranking (this is useful for evaluating performance)
2) use my own model; you provide CLI access to choose from a list of hosted models, but I would like to load a custom model

I am digging through the codebase to solve these issues, but it would be great if you could document the API, as I am sure these things are straightforward.

Thanks again for the great library.

Sharathmk99 commented 4 years ago

@jusjosgra were you able to train a custom model and use it with nboost? I'd be interested to know how you did it.

AAbbhishekk commented 4 years ago

@jusjosgra Do you have the required data for benchmarking??

pertschuk commented 4 years ago

@jusjosgra Thank you for your positive feedback - I will try to improve the documentation.

Essentially, the way it works is that there is a set of config variables (which can be viewed by running nboost --help). These config variables are resolved in the following order: args = {**cli_args, **json_args, **query_args}. That is, JSON args (in the body of the request) overwrite the CLI config args, and query_args (set as query params) overwrite both.
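To illustrate the precedence described above, here is a minimal sketch of the merge. It is not NBoost's actual code: the values are made up, and `some_option` is a hypothetical key used only to show the override order.

```python
# Hypothetical values; only the merge order mirrors the description above.
cli_args = {"no_rerank": False, "some_option": 10}  # set when the proxy is started
json_args = {"no_rerank": True}                     # sent in the request's JSON body
query_args = {"some_option": 5}                     # sent as URL query params

# Later dicts win: query params override the JSON body, which overrides the CLI.
args = {**cli_args, **json_args, **query_args}
print(args)  # {'no_rerank': True, 'some_option': 5}
```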

As for these two specific questions:

1) As per above, you can add { 'nboost' : { 'no_rerank': True } } to the JSON body (or add it as a query param).
2) You can point --model_dir at the relative directory of a PyTorch transformers model (or a TensorFlow .ckpt model). If loading a custom model, you also need to specify a --model_cls with which to load it, which in the case of PyTorch transformers would be PtBertRerankModelPlugin.

NBoost expects a binary classifier model trained with labels 0 = not relevant to the search and 1 = relevant to the search.
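As a rough sketch of how the two flags mentioned above might be combined on the command line, the snippet below only assembles and prints the command. The path `./my_model` is a hypothetical placeholder, and any other flags a real deployment needs are not shown.

```python
import shlex

# Hypothetical relative path to a fine-tuned PyTorch transformers model.
model_dir = "./my_model"

# Only --model_dir and --model_cls come from the discussion above.
argv = [
    "nboost",
    "--model_dir", model_dir,
    "--model_cls", "PtBertRerankModelPlugin",
]

command = shlex.join(argv)
print(command)  # nboost --model_dir ./my_model --model_cls PtBertRerankModelPlugin
```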

jusjosgra commented 4 years ago

Thanks so much for your thorough response!

For no_rerank, it appears that this is specified at proxy deploy time. Is there a way to pass it at query time to skip the reranking step dynamically?

pertschuk commented 4 years ago

@jusjosgra all of these flags are both deploy-time and dynamic: you can include them in the JSON body or query params of the request to override the deployed value.

For example, if the request is a POST, use the following JSON body:

{
   "query" : "This is a search",
   "nboost" : { 
        "no_rerank" : true
   }
}

Or, if it is a GET request, simply append it to the URL: http://nboost_host:8000?q=query&no_rerank=True
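For anyone scripting this, here is a minimal sketch that builds both request forms with the Python standard library. It only constructs the payload and URL and does not send anything; the host and port are taken from the example URL above and would need to match your own deployment.

```python
import json
from urllib.parse import urlencode

# Assumed proxy address, copied from the example URL above.
base_url = "http://nboost_host:8000"

# POST variant: the override goes under the "nboost" key of the JSON body.
post_body = json.dumps({
    "query": "This is a search",
    "nboost": {"no_rerank": True},
})

# GET variant: the override is appended as a query parameter.
get_url = base_url + "?" + urlencode({"q": "query", "no_rerank": "True"})

print(post_body)
print(get_url)  # http://nboost_host:8000?q=query&no_rerank=True
```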

pidugusundeep commented 4 years ago

Hey @pertschuk, yes, I was able to do that, but it still returns the same results both with no_rerank=True and without. I am also curious to know why it returns the nboost scores if no_rerank is set to true, such as below:

"nboost": {
        "scores":{
               "........................."
        }
}