Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
High
Please provide a clear description of the problem this feature solves
As part of the Sherlock work, we need an example showing how to use Morpheus to execute multiple LLM queries that utilize RAG inside of a pipeline.
Describe your ideal solution
Purpose
The purpose of this example is to illustrate how a user could build a pipeline that integrates an LLM service into Morpheus. This example builds on the previous example, #1305, by adding the ability to augment LLM queries with context information from a knowledge base. Appending this context improves the responses from the LLM by providing additional background and factual information which the LLM can pull from for its response.
In order for this pipeline to function correctly, a Vector Database must already have been populated with information that can be retrieved. An example of populating a database is illustrated in #1298. This example assumes that pipeline has already been run to completion.
Scenario
This example will show two different implementations of a RAG pipeline, but the pipeline and components could be used in many scenarios with different requirements. At a high level, the following illustrates different customization points for this pipeline and the specific choices made for this example:
LLM Service
This pipeline could support any type of LLM service which is compatible with our LLMService interface.
Such services include OpenAI, NeMo, or even models run locally using llama-cpp-python.
For this example, we will focus on using OpenAI as the LLM service. This service was chosen because it differs from the previous example, which uses NeMo. Additionally, OpenAI's models have been trained to make good use of the additional context information that RAG provides.
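Purely as an illustration of the shape such a service abstraction could take, here is a minimal stand-in backed by the openai package. The class and method names here are illustrative and are not Morpheus's actual LLMService interface; the model name is a placeholder.

```python
# Illustrative sketch only: the real Morpheus LLMService interface may differ.
from abc import ABC, abstractmethod

from openai import OpenAI  # assumes openai>=1.0 and OPENAI_API_KEY is set


class LLMService(ABC):
    """Minimal stand-in for an LLM service abstraction."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...


class OpenAIService(LLMService):

    def __init__(self, model: str = "gpt-3.5-turbo"):
        self._client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self._model = model

    def generate(self, prompt: str) -> str:
        response = self._client.chat.completions.create(
            model=self._model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
```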
Embedding model
This pipeline can support any type of embedding model that can convert text into a vector of floats.
For this example we will use all-MiniLM-L6-v2 since it is small and is the default model specified in #1298.
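For reference, producing embeddings with this model via sentence-transformers looks like the sketch below; in the example pipeline the model would typically be served from Triton rather than loaded in-process.

```python
# Quick sketch of embedding text with all-MiniLM-L6-v2 via sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# encode() returns a (num_texts, 384) numpy array of float32 embeddings
embeddings = model.encode(["What is Morpheus?", "How does RAG work?"])
print(embeddings.shape)  # (2, 384)
```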
Vector database
Any vector database can be used to store the resulting embeddings and their corresponding metadata.
It would be trivial to update the example to use Chroma or FAISS if needed.
For the example, we will be using Milvus since it is the VDB chosen for #1298.
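A hedged sketch of a similarity search against Milvus with pymilvus follows; the collection name ("rag_collection") and field names ("embedding", "text") are assumptions for illustration, not the names used by the #1298 pipeline.

```python
# Assumes a Milvus instance on the default port with a populated collection.
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")

collection = Collection("rag_collection")  # hypothetical collection name
collection.load()

query_embedding = [[0.0] * 384]  # placeholder 384-dim vector from the embedding model

results = collection.search(
    data=query_embedding,
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=3,
    output_fields=["text"],
)

for hit in results[0]:
    print(hit.entity.get("text"))
```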
Prompt generation type
There are many different ways to build up the final prompt that gets sent to the model. Any convention can be used, and even custom prompt templates are supported.
For this example, we will be using a custom prompt template similar to the "stuff" retrievers in LangChain. Using a simple custom prompt keeps the implementation easy to understand.
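A minimal "stuff"-style template, similar in spirit to LangChain's stuff chain, simply concatenates all retrieved context into a single prompt. The template text below is illustrative, not the one used by the example.

```python
# All retrieved chunks are "stuffed" verbatim into the context section.
PROMPT_TEMPLATE = """Answer the question using only the context provided.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, context_chunks: list[str]) -> str:
    # Join the retrieved chunks and substitute them into the template
    context = "\n\n".join(context_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```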
Downstream tasks
After the LLM has been run, the output of the model could be used in any number of tasks such as training a model, running analysis, or even simulating an attack.
For this example, we will not have any downstream tasks, to keep the implementation simple and the focus on the LLMEngine.
Implementation
This example will be composed of two different click commands.
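As a rough sketch, the two commands could be exposed through a single click group; the command names here are illustrative, and the actual names and options live in the example's entry point.

```python
# Illustrative only: a click group exposing the two example pipelines.
import click


@click.group()
def cli():
    """RAG example pipelines."""


@cli.command()
def standalone():
    """Run the standalone RAG pipeline (VDB must be populated already)."""


@cli.command()
def persistent():
    """Run the persistent pipeline that uploads and retrieves in one graph."""


if __name__ == "__main__":
    cli()
```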
Standalone Morpheus pipeline
The standalone Morpheus pipeline is built using the following components (a rough sketch of the assembled pipeline follows the note below):
An InMemorySourceStage to hold the LLM queries in a DataFrame
A DeserializeStage to convert the MessageMeta objects into ControlMessages needed by the LLMEngine
New functionality was added to the DeserializeStage to support ControlMessages and add a default task to each message.
An LLMEngineStage then wraps the core LLMEngine functionality
An ExtracterNode pulls the questions out of the DataFrame
A RAGNode performs the retrieval, adds the context to the query using the supplied template, and executes the LLM
Finally, the responses are put back into the ControlMessage using a SimpleTaskHandler
The pipeline concludes with an InMemorySinkStage to store the results.
Note: For this to function correctly, the VDB upload pipeline must have been run previously.
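To make the flow concrete, below is a rough sketch of how these stages could be assembled. Module paths and signatures are approximations of the Morpheus API, the "questions" column name is an assumption, and the LLMEngineStage wiring (ExtracterNode -> RAGNode -> SimpleTaskHandler) is elided since its exact construction is what the example itself demonstrates. Treat this as an outline, not the final code.

```python
# Sketch of the standalone pipeline skeleton (Morpheus API approximated).
import cudf

from morpheus.config import Config
from morpheus.pipeline.linear_pipeline import LinearPipeline
from morpheus.stages.input.in_memory_source_stage import InMemorySourceStage
from morpheus.stages.output.in_memory_sink_stage import InMemorySinkStage
from morpheus.stages.preprocess.deserialize_stage import DeserializeStage

config = Config()

# The LLM queries start out as rows of an in-memory DataFrame
queries = cudf.DataFrame({"questions": ["What is Morpheus?"]})

pipeline = LinearPipeline(config)
pipeline.set_source(InMemorySourceStage(config, dataframes=[queries]))

# Converts each MessageMeta into a ControlMessage and attaches the default
# task consumed by the LLMEngine (per the new DeserializeStage functionality)
pipeline.add_stage(DeserializeStage(config))

# ... the LLMEngineStage wrapping the LLMEngine would be added here ...

pipeline.add_stage(InMemorySinkStage(config))
pipeline.run()
```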
Persistent Morpheus pipeline
The persistent Morpheus pipeline is functionally similar to the standalone pipeline; however, it uses multiple sources and multiple sinks to perform both the upload and retrieval portions in the same pipeline. The benefit of this pipeline over the standalone pipeline is that no VDB upload process needs to be run beforehand; everything runs in a single pipeline.
The implementation for this pipeline is illustrated by the following diagram:
The major differences between the diagram and the example pipeline are:
The sources for the upload and retrieval are both KafkaSourceStage instances, making it easy for the user to control when messages are processed by the example pipeline
To enable this, there are two topics on the Kafka cluster: upload and retrieve_input. Pushing messages to one or the other will have a different effect on the final message, but both will perform the same tasks until the final part of the pipeline (see the producer sketch after this list).
There is a SplitStage added after the embedding portion of the pipeline which determines which sink to send each message to.
The SplitStage determines where to send each message by looking at the task attached to each ControlMessage (see the routing sketch after this list)
The final sink for the retrieval task writes to another Kafka topic, retrieve_output
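As an illustration of driving the two input topics, a kafka-python producer might look like the following. The broker address and the payload field names are assumptions for this sketch, not the example's actual schema.

```python
# Feed the persistent pipeline by producing to the two Kafka topics.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda payload: json.dumps(payload).encode("utf-8"),
)

# Documents pushed to "upload" are embedded and written to the VDB ...
producer.send("upload", {"content": "Morpheus is a cybersecurity SDK."})

# ... while questions pushed to "retrieve_input" flow through the RAG path.
producer.send("retrieve_input", {"question": "What is Morpheus?"})
producer.flush()
```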
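The routing decision itself is small. A hedged sketch of the idea follows, assuming the ControlMessage exposes a has_task-style check; the task name, accessor, and output port names are assumptions for illustration.

```python
# Illustrative SplitStage routing logic: upload tasks go to the VDB sink,
# everything else continues down the retrieval/LLM path.
def select_output(message) -> str:
    if message.has_task("upload"):       # accessor name is an assumption
        return "vdb_writer"              # hypothetical port feeding the Milvus sink
    return "retrieval"                   # hypothetical port feeding the LLMEngine
```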
Completion Criteria
The following items need to be satisfied to consider this issue complete:
[x] A README.md containing information on the following:
[x] Background information about the problem at hand
[x] Information about the specific implementation and reasoning behind design decisions (use content from this issue)
[x] Step-by-step instructions for running the following:
[x] How to run the Standalone Morpheus pipeline
[x] Including a link to instructions on running the VDB upload pipeline
[x] Information on how to get an OpenAI API key and what environment variable to set.
[x] The command to run the Morpheus pipeline
[x] How to run the Persistent Morpheus pipeline
[x] Including instructions on starting the Milvus vector database
[x] Including instructions on starting the Triton inference server
[x] The command to run the Morpheus pipeline
[x] The README.md should be linked in the developer docs
[x] A functioning Standalone Morpheus pipeline command which satisfies the following:
[x] Should run without error using all default arguments
[x] Correctly run the queries through the specified OpenAI model
[x] Provide information about the success or failure of the pipeline, including the number of queries run, throughput, and total runtime.
[ ] #1416
[ ] Should run without error using all default arguments
[ ] Correctly upload the upload task payloads to the VDB
[ ] Correctly run the retrieval task payloads through the specified OpenAI model
[ ] Provide information about the success or failure of the pipeline, including the number of queries run, throughput, and total runtime.
[x] Tests should be added which include the following:
[x] Test successfully running the Standalone Morpheus pipeline
[ ] Test successfully running the Persistent Morpheus pipeline
Dependent Issues
The following issues should be resolved before this can be completed:
Additional context
No response