Pipeline should work as follows:
1. Process each incoming document: create sentence-vector indices (see the indexing sketch after this list).
2. Store the indices so that they can be re-created if the process dies.
3. For each query: compute the query vector, find the k nearest matches irrespective of any threshold, and return the ranked result, which is a list of document ids with similarity scores (see the query sketch below).
4. Fetch the documents from ES and return them to the DIG UI.
5. If the user chooses a facet, add it as a filter on the list of documents for the query, re-rank the results, and return them to the DIG UI. So, if we originally had k documents, adding a facet will always return <= k documents; the facets act as a filter (see the facet sketch below).
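A minimal sketch of steps 1 and 2, assuming sentence-transformers for the sentence vectors and FAISS for the index; neither library, the model name, nor the file paths are specified above, so treat them as placeholders:

```python
# Sketch of steps 1-2: embed incoming documents and persist the index.
# sentence-transformers and FAISS are assumptions; the pipeline above does
# not name the embedding model or the index implementation.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice

def build_index(doc_ids, doc_texts, index_path="dig_sentence.index"):
    """Embed one text per document and add it to a flat inner-product index."""
    vectors = model.encode(doc_texts, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vectors.shape[1])   # cosine similarity via normalized dot product
    index.add(np.asarray(vectors, dtype="float32"))
    # Persist the index and the row -> document id mapping so the process
    # can be restarted without re-embedding the corpus (step 2).
    faiss.write_index(index, index_path)
    np.save(index_path + ".ids.npy", np.asarray(doc_ids))
    return index
```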
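Step 3 could then look like the sketch below, under the same assumptions: the persisted index is reloaded, the query is embedded with the same model, and the top k hits are returned as (document id, similarity score) pairs with no score threshold applied.

```python
# Sketch of step 3: reload the persisted index, embed the query, and return
# the k nearest documents regardless of any score threshold.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # must match the indexing model

def search(query, k=20, index_path="dig_sentence.index"):
    index = faiss.read_index(index_path)          # re-create the index after a restart
    doc_ids = np.load(index_path + ".ids.npy")
    q = model.encode([query], normalize_embeddings=True).astype("float32")
    scores, rows = index.search(q, k)             # always top-k, no threshold
    # Ranked result: list of (document id, similarity score) pairs.
    return [(str(doc_ids[r]), float(s)) for r, s in zip(rows[0], scores[0]) if r != -1]
```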
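Steps 4 and 5 might be handled roughly as follows with the Elasticsearch Python client; the index name dig_documents, the connection details, and the single-valued facet field are illustrative assumptions, not part of the pipeline description:

```python
# Sketch of steps 4-5: fetch the ranked documents from ES, then filter and
# re-rank the same list when the user picks a facet.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")       # placeholder connection details

def fetch_documents(ranked, es_index="dig_documents"):
    """ranked: list of (doc_id, score) pairs from the vector search (step 3)."""
    resp = es.mget(index=es_index, ids=[doc_id for doc_id, _ in ranked])
    found = {d["_id"]: d["_source"] for d in resp["docs"] if d.get("found")}
    return [(doc_id, score, found[doc_id]) for doc_id, score in ranked if doc_id in found]

def apply_facet(results, field, value):
    """Facets only narrow the original k results, so the output is always <= k documents."""
    kept = [r for r in results if r[2].get(field) == value]
    return sorted(kept, key=lambda r: r[1], reverse=True)  # re-rank by similarity score
```

Because apply_facet only ever filters the list already returned by fetch_documents, the <= k guarantee in step 5 falls out directly: facets never add new documents, they only narrow and re-rank the original k.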