koursaros-ai / nboost

NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on different platforms (e.g., Elasticsearch).
Apache License 2.0
674 stars, 69 forks

nboost Results-Not giving correct number of output passages #44

Open jishapjoseph opened 4 years ago

jishapjoseph commented 4 years ago

Hi,

I used the travel.csv file to produce search results with nboost, passing the parameters below along with the other required ones. The API returned only 5 passages, but I expected 10 because size was set to 10. Could you please fix this or let me know what the issue might be? (I followed the steps in the documentation.)

'default_topk': 20, 'topn': 100, 'q': 'passage:How long will it take from airport to hotel', 'size': 10

Thanks & Regards, Jisha Joseph.

jishapjoseph commented 4 years ago

Hi,

It would be great if you could let me know how to get n results from NBoost. Currently, I get at most 5 similar results. Could this be because no other passages have a good similarity, so the similarity is 0 for the rest? Please find below the parameters that were passed. This query also returned only three passages.

'uhost': 'localhost', 'uport': 9200, 'query_path': 'url.query.q', 'topk_path': 'url.query.size', 'default_topk': 100, 'topn': 1000, 'choices_path': 'body.hits.hits', 'cvalues_path': '_source.passage', 'q': 'passage:what are the safety requirements for package units', 'size': 10

Thank you in Advance!

AAbbhishekk commented 4 years ago

@jishajoseph Could you please share the travel.csv file? I am working on a similar problem. Thanks.

jishapjoseph commented 4 years ago

@AAbbhishekk The travel.csv file is in the following folder on GitHub: nboost/nboost/resources/travel.csv

AAbbhishekk commented 4 years ago

@jishajoseph Thanks!!!!

pertschuk commented 4 years ago

If you run curl localhost:8000/nboost/status, you should see output like this:

{"avg_model_mrr":0.2475732343580853,"avg_num_choices":49.31506849315068,"avg_rerank_time":1.0792898919725527,"avg_response_time":0.6788859660548153,"avg_server_mrr":0.23740026696046881,"avg_topk":10.0}

avg_num_choices is how many choices nboost gets back to rerank, and avg_topk is how many it returns. You can use the --default_topk and --topn flags to adjust the number you want returned and the number you want reranked, respectively.
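For anyone who would rather script this check than read the JSON by eye, here is a minimal Python sketch. It assumes the proxy is reachable at localhost:8000 as in the curl command above; the helper names `fetch_status` and `explain_status` are made up for illustration and are not part of nboost.

```python
import json
from urllib.request import urlopen


def fetch_status(host="localhost", port=8000):
    """Fetch the status JSON from a running nboost proxy.

    The host and port defaults are assumptions taken from the curl
    command in this thread; adjust them to your deployment.
    """
    with urlopen(f"http://{host}:{port}/nboost/status") as resp:
        return json.load(resp)


def explain_status(status, requested_size):
    """Compare the candidates nboost received with the number requested."""
    choices = status["avg_num_choices"]  # candidates received for reranking
    topk = status["avg_topk"]            # results returned to the client
    if choices < requested_size:
        verdict = "upstream returned fewer candidates than requested"
    else:
        verdict = "enough candidates were available"
    return {"avg_num_choices": choices, "avg_topk": topk, "verdict": verdict}


if __name__ == "__main__":
    # Requires a running nboost proxy; 10 mirrors 'size': 10 from the question.
    print(explain_status(fetch_status(), requested_size=10))
```

Both values are averages over all queries seen so far, so the comparison is only a rough indicator for any single query.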

jishapjoseph commented 4 years ago

Thank you for the reply! Currently, I am setting both --default_topk and --topn to 100, but it still returns fewer than 5 results (no other answers are related to the question). Is it because there are no other good answers for the question?

'uhost': 'localhost', 'uport': 9200, 'query_path': 'url.query.q', 'topk_path': 'url.query.size', 'default_topk': 100, 'topn': 100, 'choices_path': 'body.hits.hits', 'cvalues_path': '_source.passage', 'q': question, 'size': 10

pertschuk commented 4 years ago

@jishajoseph curl the nboost status endpoint as I described.

The avg_num_choices value from this represents the number of results returned by ES. So if it is only 5, then there are no other good answers for the question. Otherwise there may be a bug; if you provide the output of the status endpoint (after attempting a query) as well as some sample input, I can try to fix it.

Make sure nboost is up to date (I just updated it yesterday).
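One way to gather that sample input and output is to send the same query to Elasticsearch directly and through the proxy, then count the hits in each response. The sketch below is only an illustration under several assumptions: the index name `travel` is hypothetical, port 9200 comes from the 'uport' setting above, port 8000 comes from the status curl command, and the helper names are invented here. It reads the hit list from `hits.hits` and the passage text from `_source.passage`, matching the 'choices_path' and 'cvalues_path' settings in the question.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(host, port, index, query, size):
    """Build an Elasticsearch URI-search URL with the q and size parameters."""
    params = urlencode({"q": query, "size": size})
    return f"http://{host}:{port}/{index}/_search?{params}"


def count_hits(response_body):
    """Number of entries under hits.hits in a search response body."""
    return len(response_body.get("hits", {}).get("hits", []))


def passages(response_body):
    """Passage text of each hit, read from _source.passage."""
    hits = response_body.get("hits", {}).get("hits", [])
    return [hit["_source"]["passage"] for hit in hits]


def search(host, port, index, query, size):
    """Run the query and return the parsed JSON response."""
    with urlopen(build_search_url(host, port, index, query, size)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires running services; "travel" is an assumed index name.
    query = "passage:How long will it take from airport to hotel"
    direct = search("localhost", 9200, "travel", query, 10)   # Elasticsearch
    proxied = search("localhost", 8000, "travel", query, 10)  # via nboost
    print("direct hits:", count_hits(direct))
    print("proxied hits:", count_hits(proxied))
```

If the direct query also returns only a handful of hits, the limit comes from Elasticsearch matching few passages rather than from the reranking step.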