opensearch-project / neural-search

Plugin that adds dense neural retrieval into the OpenSearch ecosystem
Apache License 2.0
64 stars, 66 forks

[FEATURE] Sentence Highlighter #145

Open asfoorial opened 1 year ago

asfoorial commented 1 year ago

Is your feature request related to a problem?

No

What solution would you like?

I would like to have a highlighter that supports the neural search capability. It should highlight the most relevant sentences in the documents returned by a neural search.

What alternatives have you considered?

There are no available alternatives at the moment. So the only choice is to develop one.

Do you have any additional context?

I tried to implement it myself but faced the following challenges:

  1. I had to implement my own neural-search plugin, since this one relies on KNNQuery, which does not store the query text. For example, in the snippet below, fieldContext.context.query() returns an instance of KNNQuery. I suggest that the neural-search plugin have its own NeuralQuery that extends KNNQuery and keeps neural-search-related attributes such as the query text. I hope there are other ways to get the query text at highlight time.

    @Override
    public HighlightField highlight(FieldHighlightContext fieldContext) {
        // Prints a KNNQuery, which carries no query text
        System.out.println("Query: " + fieldContext.context.query());
    }

  2. The inferenceSentences method is asynchronous and notifies an ActionListener once the result is retrieved. If I call it inside the highlight method above, the highlight method returns before the ActionListener is notified, so it cannot get the embeddings needed to compute sentence similarity and pick the sentence to highlight. I had to implement my own synchronous inferSentences. Below is pseudocode of what I am trying to do.

    @Override
    public HighlightField highlight(FieldHighlightContext fieldContext) {
        System.out.println("highlighting..");
        List<Text> responses = new ArrayList<>();
        String queryText = get query text from fieldContext.context.query()

        List<Float[]> embeddings = new ArrayList<>();

        List<String> sentences = get sentences from search hit
        sentences = queryText + sentences   // query goes first, so its embedding ends up at index 0

        // inferSentences is my own synchronous variant of inferenceSentences
        List<List<Float>> vectors = clientAccessor.inferSentences("U3R9CYcBOk2JRjrls0nH", sentences);

        for (List<Float> v : vectors) {
            embeddings.add(v.stream().toArray(Float[]::new));
        }

        System.out.println("Computing similarity");
        double maxSim = 0;
        String maxSentence = null;
        if (embeddings.size() > 0) {
            Float[] queryEmbedding = embeddings.get(0);
            for (int i = 1; i < embeddings.size(); i++) {
                float sim = cosineSim(queryEmbedding, embeddings.get(i));
                set maxSim and maxSentence
            }
        }
        // wrap the sentence in Text so that toArray(new Text[] {}) below is valid
        responses.add(new Text(maxSentence));

        return new HighlightField(fieldContext.fieldName, responses.toArray(new Text[] {}));
    }
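
The "set maxSim and maxSentence" step can be sketched in plain Java as follows. This is only an illustration, not plugin code: SentenceSelector and its method names are hypothetical, it uses primitive float[] vectors instead of Float[], and it assumes the layout from the pseudocode above, where the first embedding belongs to the query and the rest follow the sentence order.

```java
import java.util.List;

/** Hypothetical helper (not part of neural-search): picks the sentence closest to the query. */
final class SentenceSelector {

    private SentenceSelector() {}

    /** Cosine similarity of two equal-length vectors; returns 0 if either vector has zero norm. */
    static double cosineSim(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector lengths differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /**
     * Assumed layout: embeddings.get(0) is the query embedding and
     * embeddings.get(i + 1) belongs to sentences.get(i).
     * Returns null when there are no sentences.
     */
    static String mostSimilarSentence(List<String> sentences, List<float[]> embeddings) {
        if (sentences.isEmpty()) {
            return null;
        }
        if (embeddings.size() != sentences.size() + 1) {
            throw new IllegalArgumentException("expected one embedding per sentence plus the query");
        }
        float[] query = embeddings.get(0);
        double maxSim = Double.NEGATIVE_INFINITY;
        String best = null;
        for (int i = 0; i < sentences.size(); i++) {
            double sim = cosineSim(query, embeddings.get(i + 1));
            if (sim > maxSim) {
                maxSim = sim;
                best = sentences.get(i);
            }
        }
        return best;
    }
}
```

Starting maxSim at negative infinity rather than 0 means a sentence is still returned when every similarity is negative; whether that is desirable for highlighting is a design choice.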

Having said the above, I hope you can tell me which route to take here. Is this feature going to be available in the plugin any time soon?
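
For the second challenge, one possible way to wait for a callback-style result inside a synchronous method is to bridge it through a CompletableFuture. The sketch below is self-contained and makes assumptions: ResultListener is a hypothetical stand-in with the same onResponse/onFailure shape as ActionListener, and BlockingBridge.await is an illustrative name, not an existing API. Blocking a search thread this way may not be acceptable inside the plugin, so the timeout is there to bound the wait.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;

/** Hypothetical stand-in for a callback listener with ActionListener's shape. */
interface ResultListener<T> {
    void onResponse(T response);

    void onFailure(Exception e);
}

/** Hypothetical helper that turns a listener-based call into a blocking one. */
final class BlockingBridge {

    private BlockingBridge() {}

    /**
     * Starts the asynchronous call with a listener that completes a future,
     * then blocks until the listener fires or the timeout elapses.
     * A failure reported through onFailure surfaces as an ExecutionException.
     */
    static <T> T await(Consumer<ResultListener<T>> asyncCall, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        CompletableFuture<T> future = new CompletableFuture<>();
        asyncCall.accept(new ResultListener<T>() {
            @Override
            public void onResponse(T response) {
                future.complete(response);
            }

            @Override
            public void onFailure(Exception e) {
                future.completeExceptionally(e);
            }
        });
        return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

With the real classes, the lambda passed to await would start the asynchronous inference call and hand it a listener that forwards to the future in the same way.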

Thanks

navneet1v commented 1 year ago

@asfoorial This is an interesting feature, and I remember the same request for the highlight feature at the time the RFC was created for this plugin.

Is this feature going to be available in the plugin any time soon?

The highlight feature was not on our roadmap, as the team was busy taking the plugin to GA, but we would really like this feature to be present in the plugin.

@asfoorial I need to take a deeper look at the suggested approaches to see whether they are feasible. In the meantime, can you describe the use case you are trying to solve with the highlight feature?

vamshin commented 1 year ago

Please +1 if you are looking for this feature, to help us prioritize.

dswitzer commented 11 months ago

+1 for highlight over neural searches and hybrid searches.

I think this could be helpful when building RAG-based workflows, where you want to extract from a larger document just the portion of the text that was matched.