opensearch-project / demos

Apache License 2.0
2 stars, 13 forks

[FEATURE][DocBot] Generate responses with our RAG pipeline #83

Open dtaivpp opened 8 months ago

dtaivpp commented 8 months ago

Is your feature request related to a problem?

At the moment we have several pieces of the RAG pipeline built, but now we need to pull it all together.

What solution would you like?

Our DocBot class will call docbot.language_model:generate_response (see #80). generate_response will need to query our cohere-index and collect the response to return to the user.

https://opensearch.org/docs/latest/ml-commons-plugin/conversational-search/#using-the-pipeline

Note: here interaction_size is the number of previous chat interactions to send as context, and context_size is the number of results from our search that we will send through.

The most challenging part of this PR is that we will need to use a neural search in our query section in order to find the most relevant documents: https://opensearch.org/docs/latest/search-plugins/neural-text-search/#step-4-search-the-index-using-neural-search. The model_id we reference here is the MODEL_ID that is being used by our ingestion pipeline.
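A minimal sketch of how the hybrid (lexical match + neural) query body could be built in Python. The field names (content, content_embedding) and the function name are assumptions based on this repo's ingestion setup, not a fixed API:

```python
def build_hybrid_query(question: str, model_id: str, k: int = 5) -> dict:
    """Build a hybrid search body combining a lexical match with a
    neural (vector) query.

    model_id must be the same MODEL_ID used by the ingestion pipeline,
    so the query text is embedded with the same model as the documents.
    The field names here are assumptions from this repo's index mapping.
    """
    return {
        # Keep the (large) embedding vector out of the returned hits.
        "_source": {"excludes": ["content_embedding"]},
        "query": {
            "hybrid": {
                "queries": [
                    # Lexical keyword match on the raw text field.
                    {"match": {"content": {"query": question}}},
                    # Neural search over the embedding field; k is the
                    # number of nearest neighbors to retrieve.
                    {
                        "neural": {
                            "content_embedding": {
                                "query_text": question,
                                "model_id": model_id,
                                "k": k,
                            }
                        }
                    },
                ]
            }
        },
    }
```

generate_response could then pass this body to the OpenSearch client's search call against the cohere-index.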

LucasWang750 commented 8 months ago

I would like to take this issue

dtaivpp commented 6 months ago

Here is an example of what the generate language pipeline looks like:

GET /docbot/_search
{
  "_source": {
    "exclude": [
      "content_embedding"
    ]
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "content": {
              "query": "How do I enable segment replication"
            }
          }
        },
        {
          "neural": {
            "content_embedding": {
              "query_text": "How do I enable segment replication",
              "model_id": "Z8VpCYwBKF5Jo_eo10QE",
              "k": 5
            }
          }
        }
      ]
    }
  },
  "ext": {
        "generative_qa_parameters": {
          "llm_model": "gpt-3.5-turbo",
            "llm_question": "How do I enable segment replication",
            "conversation_id": "JcVbCYwBKF5Jo_eoe0TD",
                         "context_size": 3,
                         "interaction_size": 3,
                         "timeout": 45
        }
    }
}
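The per-request pieces of the ext block (the question and the conversation ID) could be assembled with a small helper like this; the function name and default values are assumptions mirroring the example above:

```python
def build_rag_ext(llm_question: str,
                  conversation_id: str,
                  llm_model: str = "gpt-3.5-turbo",
                  context_size: int = 3,
                  interaction_size: int = 3,
                  timeout: int = 45) -> dict:
    """Build the "ext" block for a conversational (RAG) search request.

    context_size: number of search results forwarded to the LLM.
    interaction_size: number of previous chat turns sent as context.
    """
    return {
        "generative_qa_parameters": {
            "llm_model": llm_model,
            "llm_question": llm_question,
            "conversation_id": conversation_id,
            "context_size": context_size,
            "interaction_size": interaction_size,
            "timeout": timeout,
        }
    }
```

This keeps the model and sizing defaults in one place while letting generate_response supply the question and conversation ID per call.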

We will need to pass in the model ID, conversation ID, and the question. Then, when we are processing the answers, the generated response is found in the search response at ["ext"]["retrieval_augmented_generation"]["answer"].
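Pulling the answer out of the response could look like the sketch below; the sample response body is trimmed to the relevant path and its answer text is illustrative only:

```python
def extract_answer(response: dict) -> str:
    """Extract the generated answer from a conversational-search
    response. The RAG output lives under
    ext.retrieval_augmented_generation.answer."""
    return response["ext"]["retrieval_augmented_generation"]["answer"]

# Trimmed example response (illustrative values, not real cluster output):
sample_response = {
    "ext": {
        "retrieval_augmented_generation": {
            "answer": "Set replication.type to SEGMENT when creating the index."
        }
    }
}

print(extract_answer(sample_response))
```

A real response also carries the usual hits section with the retrieved documents, which we could surface alongside the answer as citations.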