Index multiple fields. - Githubissues

koursaros-ai / nboost

NBoost is a scalable, search-api-boosting platform for deploying transformer models to improve the relevance of search results on different platforms (e.g. Elasticsearch)

Apache License 2.0


Index multiple fields. #31

Closed tahirahmad2030 closed 4 years ago

tahirahmad2030 commented 4 years ago

I couldn't index multiple fields into an Elasticsearch index using the nboost-index command. My CSV file contains 5 columns, and I want to index all of the fields and search on one of them. How can I achieve that in NBoost?

pertschuk commented 4 years ago

nboost-index is a helper to index documents with a single text column that can be ranked on with BERT.

So you can either use the --id_col and --body_col switches with nboost-index to specify a single text column to index,

or index the whole CSV in Elasticsearch as you normally would (e.g. this link) and then specify which field you want nboost to rerank on, using --cvalues_path and the jsonpath of the choice.
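
To make the second option concrete, here is a minimal sketch of what a jsonpath of the form hits.hits[*]._source.<field> selects from an Elasticsearch search response. It is plain Python, not nboost code: the function name choice_values, the field names, and the sample response are all hypothetical, and the exact path syntax nboost expects should be checked against its documentation.

```python
from typing import Any, Dict, List


def choice_values(response: Dict[str, Any], field: str) -> List[str]:
    """Collect one field from every hit, like hits.hits[*]._source.<field>."""
    hits = response.get("hits", {}).get("hits", [])
    # Skip hits that lack the field instead of raising KeyError.
    return [hit["_source"][field] for hit in hits if field in hit.get("_source", {})]


# Hypothetical search response for an index built from a multi-column CSV.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"title": "A", "description": "first passage", "price": "3"}},
            {"_id": "2", "_source": {"title": "B", "description": "second passage", "price": "5"}},
        ]
    }
}

# Only the "description" column is pulled out as the text to rerank on;
# the other indexed fields stay in the response untouched.
passages = choice_values(response, "description")
```

All columns remain indexed and searchable in Elasticsearch; the path only decides which field's text is handed to the reranking model.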

colethienes commented 4 years ago

I just added new documentation for the dsl. You can find it here.

aurora1625 commented 4 years ago

@colethienes I also encountered the same problem, could you be more specific on how to index multiple columns? Thanks!

BTW, great work!

colethienes commented 4 years ago

I just updated the nboost-index tool to automatically index multiple columns (using the column headers as the field names). Use --id_col to specify whether the first column is an id. You can also check nboost-index --help for more options.

Let me know if you have any further issues.

aurora1625 commented 4 years ago

It works with 0.2.1. I noticed some differences between 0.2.0 and 0.2.1:

The default value of id_col is False.
In the CLI, there is no col_name setting anymore.

Does this mean there is no need to set col_name, and nboost will index all the columns in the CSV file?

Thanks!

colethienes commented 4 years ago

That's correct. There is no need to set column names anywhere except in the CSV header row. If set to True, --id_col will assume the first column of the CSV contains ids (_id in Elasticsearch).
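
For anyone who wants to see the mapping spelled out, the behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual nboost-index implementation: csv_to_actions, the index name, and the sample data are hypothetical. The dicts it yields use the _index / _id / _source action shape that the Elasticsearch bulk helper (elasticsearch.helpers.bulk) accepts.

```python
import csv
import io
from typing import Any, Dict, Iterator


def csv_to_actions(csv_text: str, index: str, id_col: bool = False) -> Iterator[Dict[str, Any]]:
    """Turn CSV rows into bulk-style actions, using the header row as field names."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)  # first row holds the column names
    for row in reader:
        fields = dict(zip(headers, row))
        action: Dict[str, Any] = {"_index": index}
        if id_col:
            # Treat the first column as the document id rather than a field.
            action["_id"] = fields.pop(headers[0])
        action["_source"] = fields
        yield action


# Hypothetical three-column CSV.
sample = "id,title,description\n1,A,first passage\n2,B,second passage\n"

with_ids = list(csv_to_actions(sample, "products", id_col=True))
without_ids = list(csv_to_actions(sample, "products"))
```

With id_col=True the first column becomes _id and is left out of the document body; with the default False it is indexed as an ordinary field and Elasticsearch assigns its own ids.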