Index prod data updates to make it faster

o19s / opensearch-hybrid-search-optimization

This repository is meant to optimize hybrid search settings for OpenSearch. It covers a grid search approach to identify a good parameter set and a model-based approach that dynamically identifies good settings for a query.


Index prod data updates to make it faster #6

Closed vishal-git closed 1 week ago

vishal-git commented 1 week ago

In notebooks/1_Prepare_OpenSearch.ipynb, added the missing increment for the attempts counter.
In notebooks/2_Index_Product_Data.ipynb, replaced the pandas iterrows() block with to_dict(), since iterating over plain dicts is significantly faster than iterating over the rows of a pandas DataFrame.
In notebooks/2_Index_Product_Data.ipynb, added a function called split_into_batches() so that documents can be indexed in batches.
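
For reviewers, here is a minimal sketch of the two patterns described above. It is not the notebook code: the signature of split_into_batches(), the helper name wait_until_ready(), and the sample records are all assumptions made for illustration. The records list stands in for what df.to_dict("records") would return, so the sketch runs without pandas or an OpenSearch cluster.

```python
from typing import Callable, Iterator


def wait_until_ready(is_ready: Callable[[], bool], max_attempts: int = 5) -> bool:
    """Hypothetical retry loop: poll is_ready() at most max_attempts times."""
    attempts = 0
    while attempts < max_attempts:
        if is_ready():
            return True
        # Without this increment the loop would never end when the check keeps failing.
        attempts += 1
        # A real notebook would likely sleep here before the next attempt.
    return False


def split_into_batches(records: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Assumed signature: yield consecutive slices of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Stand-in for df.to_dict("records"): one plain dict per DataFrame row.
records = [{"id": i, "title": f"product {i}"} for i in range(5)]

batches = list(split_into_batches(records, batch_size=2))
for batch in batches:
    # Each batch could be passed to a bulk indexing call instead of printing.
    print([doc["id"] for doc in batch])
```

Each batch is a plain list of dicts, so it can be handed to whatever bulk indexing call the notebook uses without touching the DataFrame again.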