I am using the ColBERT v2 model to index a large set of products and their specifications, and I am trying to build a search engine that retrieves products for a given query. With approximately 20 million products in the index, the results are not accurate. The model often fails to recognize the brand of the product and returns products of a completely different brand, and sometimes it completely ignores the numeric values mentioned in the query.
Please suggest which model settings, at indexing or search time, directly affect accuracy in product retrieval.
Also, is there a reranking technique that could help me produce more relevant results?
Is there any other service that I can put on top of this model to improve the overall search?
What you're working on looks interesting, but a couple of points:
First, ColBERT is a semantic retrieval model. That means the system will sometimes return results that do not exactly match what you are looking for but are semantically close, which leads to the second point.
Second, you are searching for brands and numeric values, and it feels to me that this is probably not a good use case for ColBERT or any purely semantic information retrieval system. You'd be better off with a keyword-based search engine here, since it gives you exhaustive and exact matches. Also, I don't know how BERT embeddings behave when it comes to numerical values.
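If you want to keep the semantic retriever, one way to combine it with exact matching is to rerank its top results using the brand and the numbers found in the query. The following is only a minimal sketch of that idea, not a ColBERT feature: `extract_constraints`, `rerank`, the `known_brands` list, and the boost values are all hypothetical names and numbers chosen for illustration, and the candidates are assumed to arrive as `(product_text, semantic_score)` pairs from whatever retriever you use. It also assumes single-word brand names.

```python
import re

# Matches integers and decimals such as "16" or "15.6".
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")
TOKEN_RE = re.compile(r"[a-z0-9]+")


def extract_constraints(query, known_brands):
    """Return the (brands, numbers) that appear literally in the query."""
    tokens = set(TOKEN_RE.findall(query.lower()))
    # Assumption: brands are single words; multi-word brands need phrase matching.
    brands = {b.lower() for b in known_brands if b.lower() in tokens}
    numbers = set(NUMBER_RE.findall(query))
    return brands, numbers


def rerank(query, candidates, known_brands, brand_boost=1.0, number_boost=0.5):
    """Re-sort (text, semantic_score) pairs, rewarding exact brand/number hits.

    The boost values are arbitrary placeholders and would need tuning
    against the score range of the actual retriever.
    """
    brands, numbers = extract_constraints(query, known_brands)
    rescored = []
    for text, score in candidates:
        lowered = text.lower()
        doc_tokens = set(TOKEN_RE.findall(lowered))
        doc_numbers = set(NUMBER_RE.findall(lowered))
        bonus = 0.0
        if brands & doc_tokens:
            bonus += brand_boost
        # One increment per query number that also appears in the product text.
        bonus += number_boost * len(numbers & doc_numbers)
        rescored.append((text, score + bonus))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = [
        ("Dell Inspiron 15 laptop 8GB RAM", 0.9),
        ("HP Pavilion 15 laptop 16GB RAM", 0.8),
        ("Dell Inspiron 15 laptop 16GB RAM", 0.7),
    ]
    for text, score in rerank("dell laptop 16gb ram", candidates, ["Dell", "HP"]):
        print(f"{score:.2f}  {text}")
```

If a wrong brand should never appear at all, the same extracted constraints could be used as a hard filter (drop candidates that miss the brand) instead of a score bonus.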
Thanks : )