opensearch-project / OpenSearch-Dashboards

đź“Š Open source visualization dashboards for OpenSearch.
https://opensearch.org/docs/latest/dashboards/index/
Apache License 2.0

[RFC] No-code designer for AI-augmented workflows #4755

Open ohltyler opened 10 months ago

ohltyler commented 10 months ago

Note: this RFC is for the no-code frontend associated with the proposed AI workflow framework RFC.

Proposal

In the proposed AI workflow framework RFC, we detail how we can simplify the configuration and creation of complex ML use cases by interfacing them in the form of use-case templates. The frontend no-code designer can simplify the creation of these use-case templates via a drag-and-drop interface, similar to existing solutions built on top of LangChain, such as Flowise or LangFlow. This will provide users with an intuitive no-code way to create and edit common AI-augmented workflows, and enable rapid prototyping and validation of a created flow.

We aim to keep this designer very lightweight. It can be viewed simply as a means of interacting with use case templates in a visual manner. It is a thin interface built on top of the use-case templates, such that the backend plugin can be fully sufficient for users, and the frontend plugin can be ignored entirely if users exclusively want to interact via APIs.

Additionally, we will provide a set of pre-defined templates for common use cases, such as semantic search or RAG. This will give users a starting point for quickly developing their applications and getting familiar with the ML offerings OpenSearch provides.

Goals and benefits

The goals are very similar to those outlined in the proposed AI workflow framework RFC: we want to simplify the complex setup around creating, configuring, and modifying the AI-augmented application workflows available in OpenSearch. While the backend plugin is focused on providing a framework to automate complex use cases, the frontend plugin is focused on providing an easy-to-use interface for creating, configuring, editing, and importing such use cases via a drag-and-drop experience.

This will greatly benefit users by providing a visual method of interacting with the templates and a further abstraction over the low-level workflow details. It will let users (including those with little to no expertise) leverage the complex and growing ecosystem of AI/ML offerings within OpenSearch via out-of-the-box workflows that they can use as a starting point for their own.

Background

Over the last several OpenSearch releases, there have been a growing number of building blocks and tools added to enable AI-augmented workflows, such as semantic search. This is accomplished by configuring and piecing together components from many different plugins, such as the k-NN plugin, ML-commons plugin, neural search plugin, and others planned for the future. For more details, see the “Background” section in the proposed AI workflow framework.

At a high level, the backend framework will simplify the creation of common AI/ML use cases by automating the steps needed to execute such use cases. For example, the current semantic search use case requires configuring a text embedding processor, ingest pipeline, embedding model, k-NN index, and constructing neural queries linking together all of these components to execute the workflow. We can simplify this through a single JSON template that the backend framework can process to construct a workflow, orchestrating the sequential flow of API calls from different plugins. A workflow ID will be returned, which can be used to interface with the workflow by passing the specified input (e.g., plain text for a semantic search workflow). Even though these templates are already an abstraction, they can still become quite complex depending on the use case. This is where a UI can provide a simple way for users to configure, create, import, and edit such templates, helping to bridge that gap.
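
Purely for illustration, a semantic search template might look roughly like the sketch below. Every field name here is hypothetical, since the exact schema is owned by the backend framework RFC and still under discussion:

// Hypothetical use-case template for semantic search; all field names are illustrative only.
const semanticSearchTemplate = {
  name: 'semantic_search',
  description: 'Plaintext in, relevant documents out',
  components: [
    { id: 'embedding_model', type: 'ml_model', config: { model_id: '<deployed-model-id>' } },
    {
      id: 'ingest_pipeline',
      type: 'ingest_pipeline',
      config: { text_field: 'passage_text', vector_field: 'passage_embedding' },
    },
    { id: 'knn_index', type: 'index', config: { name: 'my-knn-index' } },
  ],
  connections: [
    { source: 'embedding_model', target: 'ingest_pipeline' },
    { source: 'ingest_pipeline', target: 'knn_index' },
  ],
};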

High level design

As stated previously, we aim to simplify the design as much as possible, and treat this plugin as a lightweight way of viewing, configuring, and creating use cases in this templated format that the backend plugin can parse and execute. We can do this through a drag-and-drop style interface allowing users to pick and choose different building blocks and construct an end-to-end workflow. We can take inspiration from popular existing solutions for building large language model (LLM)-based applications, such as Flowise or LangFlow. These applications have five major components:

  1. A browsable catalog of preset workflows that can be imported to the workspace and used as a starting point
  2. A browsable catalog of available individual components that each have an input and output
  3. An interactive workspace where users can drag and drop components, piecing them together to create an end-to-end workflow
  4. A browsable catalog of existing created workflows where users can manage them with CRUD operations
  5. A widget/section for testing out the workflow, letting users try different inputs and see the outputs. Note that in our case, we will need to support many different input and output types. We may need a set of different test widgets/sections depending on the use case and specified input/output. This could be selected from a dropdown list, for example (one way to model these widgets is sketched after this list).
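
One hypothetical way to model these per-use-case test widgets in TypeScript is sketched below; none of these type names exist today:

// Hypothetical modeling of per-use-case test widgets; all names here are illustrative only.
type PlaintextInput = { kind: 'plaintext'; text: string };
type SearchHitsOutput = { kind: 'search_hits'; hits: unknown[] };
type GeneratedTextOutput = { kind: 'generated_text'; text: string };

// A test widget is chosen (e.g., from a dropdown) based on the workflow's declared input/output kinds.
interface TestWidget<I extends { kind: string }, O extends { kind: string }> {
  label: string;                                      // shown in the widget selector
  run: (workflowId: string, input: I) => Promise<O>;  // invokes the workflow and returns its output
}

// Example: a semantic search test widget takes plaintext in and shows search hits out.
type SemanticSearchTestWidget = TestWidget<PlaintextInput, SearchHitsOutput>;
// Example: a RAG test widget takes plaintext in and shows generated text out.
type RagTestWidget = TestWidget<PlaintextInput, GeneratedTextOutput>;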

We can follow a similar design for our plugin, where the individual nodes/components are a set of available resources within OpenSearch, such as deployed ML models/connectors, search pipelines, ingest pipelines, processors, indices, etc. These components can have different levels of abstraction, such that users with little to no experience may still be able to wire together a complex workflow, while also allowing advanced users to drill down and see the underlying interconnected subcomponents. This is explained in more detail below.

Proposed implementation

We can break down the implementation into two main portions:

1. Drag-and-drop workspace

There are two main options for this portion:

  1. ReactFlow: Create from the ground up using the ReactFlow library
  2. Existing applications: Fork existing applications built on top of ReactFlow (Flowise, LangFlow) and reuse parts of their implemented interfaces, catalog components, workspace design, etc. into our own plugin

We prefer the ReactFlow option for several reasons (a minimal workspace sketch follows this list):

  1. We can build it around both LLM and non-LLM use cases
  2. We can focus the UX on OpenSearch capabilities, rather than inheriting the extremely tight integration with LangChain
  3. We avoid awkwardly forking application source code and pulling out just the parts we need (e.g., ignoring all server-side code)
  4. The layouts and styling can correspond seamlessly with OpenSearch UI (OUI) components and styling
  5. We provide a differentiated experience targeting a broader audience than just those exclusively using Flowise/LangFlow.
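
To make option 1 concrete, below is a minimal sketch of a ReactFlow-based workspace, assuming the reactflow v11 package; the initial nodes and styling here are placeholders, not a finalized design:

// Minimal sketch of a drag-and-drop workspace built directly on ReactFlow (v11 'reactflow' package).
// Custom node components and OUI layout are omitted; the node contents are placeholders.
import React, { useCallback } from 'react';
import ReactFlow, {
  Background,
  Controls,
  addEdge,
  useNodesState,
  useEdgesState,
  Connection,
  Edge,
} from 'reactflow';
import 'reactflow/dist/style.css';

const initialNodes = [
  { id: 'node-1', type: 'default', data: { label: 'Embedding Model' }, position: { x: 100, y: 100 } },
  { id: 'node-2', type: 'default', data: { label: 'Neural Index' }, position: { x: 100, y: 200 } },
];
const initialEdges: Edge[] = [];

export function WorkflowWorkspace() {
  const [nodes, , onNodesChange] = useNodesState(initialNodes);
  const [edges, setEdges, onEdgesChange] = useEdgesState(initialEdges);

  // Connect two components by drawing an edge between them in the workspace.
  const onConnect = useCallback(
    (connection: Connection) => setEdges((eds) => addEdge(connection, eds)),
    [setEdges]
  );

  return (
    <ReactFlow
      nodes={nodes}
      edges={edges}
      onNodesChange={onNodesChange}
      onEdgesChange={onEdgesChange}
      onConnect={onConnect}
      fitView
    >
      <Background />
      <Controls />
    </ReactFlow>
  );
}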

2. Components & workflows

We want to provide an experience as simple and abstracted as possible, while still allowing users to drill down and customize individual details within their application. We can accomplish this using concepts of components and workflows:

Component: an OpenSearch-specific resource, such as an ML model, a search pipeline, an index, or an ML-Commons tool or agent, that has specified inputs and outputs. Components can be nested inside of other components.

Workflow: an end-to-end application consisting of a set of components stitched together to represent a single use case, such as semantic search.

We can use the semantic search use case as an example. Suppose a user wants to leverage OpenSearch’s neural search plugin to create a plaintext search application on their website using OpenSearch as the vector store. For a user to configure this manually, it requires many individual steps, and low-level resource creation using several different APIs (for more details, see the documentation). A breakdown with each resource’s dependencies is shown below:

[Image 1: breakdown of the low-level resources and their dependencies required for manual semantic search setup]

Using this new plugin, users can quickly configure and create the same low-level resources using components. At the highest level, we can have a “Neural search” component that has a single input and output, both of which are plaintext. It has a set of required and optional fields the user can fill out, allowing the framework to handle all of the underlying creation and configuration:

[Image 2: a single high-level “Neural search” component with plaintext input and output and configurable fields]

We can give users a mechanism to drill down into this component and see its lower-level subcomponents. This may look something like the following:

[Image 3: drill-down view showing the lower-level subcomponents of the “Neural search” component]

Breaking down further, we can see each individual component:

[Image 4: fully expanded view showing each individual component]

Component types

We can persist a set of different component types, each having their own set of required & optional fields, input types, output types, styling, and in-component creation logic (see “In-component creation” below for details). These will map 1:1 to the categories shown in the component catalog, from which users can drag and drop components into the workspace. Using the example above, we can show what a Neural Query component may look like:

type NeuralQueryComponent = BaseComponent & {
   inputs: [ EmbeddingModelComponent, NeuralIndexComponent ];
   outputs: [ OpenSearchResponse ];
   style: { ... };
   // ...
}
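
For reference, a minimal sketch of what the assumed BaseComponent shape could contain is below; none of these fields are finalized, and all names are illustrative only:

// Hypothetical shared base for all component types; every field here is illustrative only.
interface BaseComponent {
  id: string;                // unique ID within the workspace
  category: string;          // catalog category (e.g., 'ml_model', 'index', 'ingest_pipeline')
  label: string;             // display name shown on the node in the workspace
  fields: ComponentField[];  // required & optional fields the user fills out
  allowsCreation?: boolean;  // whether in-component creation is supported (see below)
  parentId?: string;         // set when this component is nested inside another component
}

// Hypothetical description of a single user-configurable field.
interface ComponentField {
  name: string;
  type: 'string' | 'number' | 'select' | 'json';
  required: boolean;
  defaultValue?: unknown;
}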

In-component creation

Eventually, we will want to expand functionality such that all resources for a particular component can be created entirely within the component itself. For example, having a “create new” option on an embedding model component would mean supporting the creation of the underlying low-level OpenSearch resources (AI connectors, third-party service infrastructure configuration, model group permissions, etc.) entirely within the component. This reduces the total number of steps and the amount of configuration the user needs to perform in order to use the plugin.

Some of these complex creation flows are not planned to be supported initially. Some components, such as individual ingest pipelines or k-NN indices, may be supported implicitly by filling out creation-related input fields within the component and executing the workflow. The scope of which components should initially support this, and how, is an open question.

Serialization / deserialization

An important aspect of this plugin is the deconstruction and reconstruction of these flows into a readable JSON format that can be passed via API to the backend plugin. By default, we can get a ReactFlow JSON object by calling toObject() on a ReactFlow instance:

type ReactFlowJsonObject<NodeData = any, EdgeData = any> = {
  nodes: Node<NodeData>[];
  edges: Edge<EdgeData>[];
  viewport: Viewport;
};

NodeData and EdgeData have many optional parameters, but commonly include the following. Note that we include some of the subflow-related fields, which allow us to support nested nodes. We also show the Viewport type, which indicates the current position of the workspace:

type NodeData = {
   id: string;
   type: string;
   data: { label: string };
   position: { x: number; y: number };
   parentNode: string;   // ID of a parent node
   extent: 'parent';     // prevents moving this node outside of its parent
}

type EdgeData = {
   id: string;
   type: string;
   source: string;
   target: string;
   label: string;
}

type Viewport = {
   x: number;     // horizontal offset
   y: number;     // vertical offset
   zoom: number;  // zoom level (default: 1.00)
}

Suppose we have a workflow that connects an embedding model to a neural index to use for semantic search. We could output something roughly like the following:

workflow = {
   nodes: [
        {
            id: 'node-1',
            type: 'embedding_model',
            data: {
                // including the user-configured fields (ex: 3P model)
                label: 'Embedding Model',
                name: 'my-embedding-model',
                apiKey: 'my-api-key'
            },
            position: { x: 100, y: 100 },
        },
        {
            id: 'node-2',
            type: 'neural_index',
            data: {
                // including the user-configured fields (ex: k-NN index)
                label: 'Neural Index',
                name: 'my-knn-index',
                inputField: 'passage_text',
                outputField: 'passage_embedding'
            },
            position: { x: 100, y: 200 }
        }
   ],
   edges: [
        {
            id: 'edge-1',
            type: 'regular',
            source: 'node-1',
            target: 'node-2',
        }
   ],
   // the current position of the workspace
   viewport: {
        x: 0,
        y: 0,
        zoom: 1.00
   }
}

ReactFlow provides common examples of saving/restoring flows using the toObject() function inside callback helpers. We can do something similar, include any other metadata needed, and format the result into what the backend API expects.
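
As a rough sketch following that documented toObject() pattern (saveWorkflowToBackend is a hypothetical placeholder for the eventual backend API call, and the hook assumes it runs inside a ReactFlowProvider):

// Sketch of save/restore callbacks based on ReactFlow's documented toObject() pattern.
// saveWorkflowToBackend() is a hypothetical placeholder, not an existing API.
import { useCallback, useState } from 'react';
import { Edge, Node, ReactFlowInstance, ReactFlowJsonObject, useReactFlow } from 'reactflow';

declare function saveWorkflowToBackend(flow: ReactFlowJsonObject): Promise<void>; // hypothetical

export function useWorkflowPersistence(
  setNodes: (nodes: Node[]) => void,
  setEdges: (edges: Edge[]) => void
) {
  const [rfInstance, setRfInstance] = useState<ReactFlowInstance | null>(null);
  const { setViewport } = useReactFlow(); // requires a surrounding <ReactFlowProvider>

  // Serialize the current workspace and hand it (plus any extra metadata) to the backend.
  const onSave = useCallback(async () => {
    if (rfInstance) {
      await saveWorkflowToBackend(rfInstance.toObject());
    }
  }, [rfInstance]);

  // Rehydrate a previously saved workspace.
  const onRestore = useCallback(
    (flow: ReactFlowJsonObject) => {
      const { x = 0, y = 0, zoom = 1 } = flow.viewport;
      setNodes(flow.nodes);
      setEdges(flow.edges);
      setViewport({ x, y, zoom });
    },
    [setNodes, setEdges, setViewport]
  );

  // setRfInstance should be passed to <ReactFlow onInit={setRfInstance}>.
  return { onSave, onRestore, setRfInstance };
}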

Note that all of the custom nodes and styling can be persisted exclusively on the frontend. In the documentation, you can see that nodeTypes is passed as a standalone prop when constructing the ReactFlow instance. So, when serializing or deserializing, we can strip away or add nodeTypes, respectively, and persist each node’s type via the type field within each node.
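
For example, the frontend-only node registry could look like the following; EmbeddingModelNode and NeuralIndexNode are hypothetical custom node components, stand-ins for the real OUI-styled ones:

// Frontend-only mapping from each node's persisted `type` string to its React component.
import React from 'react';
import { NodeProps, NodeTypes } from 'reactflow';

// Hypothetical custom node components; the real ones would be styled with OUI.
const EmbeddingModelNode = (_props: NodeProps) => <div>Embedding Model</div>;
const NeuralIndexNode = (_props: NodeProps) => <div>Neural Index</div>;

// Passed as a standalone prop, e.g. <ReactFlow nodeTypes={nodeTypes} ... />,
// and never included in the serialized workflow JSON.
export const nodeTypes: NodeTypes = {
  embedding_model: EmbeddingModelNode,
  neural_index: NeuralIndexNode,
};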

Additional logic will be needed for converting a ReactFlow JSON representation into the interfaced use-case template that will be understood by the backend framework. The exact format of that template and what it will contain is still being discussed.
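
Purely as a placeholder for that discussion, the conversion could be a thin mapping that strips workspace-only details and keeps the logical graph; the UseCaseTemplate shape below is entirely hypothetical and matches the earlier illustrative sketch, not any agreed-upon schema:

import { ReactFlowJsonObject } from 'reactflow';

// Hypothetical use-case template shape; the real schema is still being discussed in the backend RFC.
interface UseCaseTemplate {
  name: string;
  components: Array<{ id: string; type: string; config: Record<string, unknown> }>;
  connections: Array<{ source: string; target: string }>;
}

// Drop workspace-only details (positions, viewport, styling) and keep the logical graph.
export function toUseCaseTemplate(name: string, flow: ReactFlowJsonObject): UseCaseTemplate {
  return {
    name,
    components: flow.nodes.map((node) => ({
      id: node.id,
      type: node.type ?? 'unknown',
      config: { ...node.data },
    })),
    connections: flow.edges.map((edge) => ({ source: edge.source, target: edge.target })),
  };
}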

Open questions

  1. How should we handle intermediate saving? Users may want to save when (1) they are partially complete and will return later, (2) they are performing rapid prototyping and testing out different small configuration changes, or (3) they are ultimately satisfied with the outputs and want to save this particular workflow for production use. How do we clean up the unused created workflows on the backend? How should we save partially completed workflows?
  2. How should validation be handled? Between each node state change? When the user saves? When they test via a testing portal? This may have backend implications, such as providing a validation API. One idea is having all basic validation done on the frontend, while any nuanced errors can be caught in the backend validation (e.g., the model dimensions don’t align with the text processor in a semantic search configuration).
  3. Where should a browsable catalog of preset use-cases be persisted? At the least, we will want access to them within this plugin. Is this something we want to make available for backend-only users as well?
  4. How can we clearly show where the workflow-level input & output are defined? For example, Flowise does not make it clear what the start/end is within a given flow. Given the different inputs and outputs we will want to support, this will be extra important.
  5. How do we support drilldown capabilities? We can have many different levels of abstraction (see “Components and workflows” section above)
  6. How should we handle downstream plugin logic that will need to be executed in this plugin? For example, suppose we populate a dropdown menu of available embedding models deployed with the ML-Commons plugin. How/where will the API calls, post-processing, filtering, etc. happen? Who will own this part? This is somewhat related to in-component creation regarding injecting plugin-specific logic in these components.
  7. The backend will persist the low-level workflow details, like the set of sequential steps & API calls to make. How will we persist the higher-level data models, like the use-case template and/or the ReactFlow JSON? Can we convert one to the other to not have to maintain multiple representations?
  8. The use case templates themselves will not contain the detailed list of particular API calls that need to be made to each plugin, which is how the backend plugin will ultimately build and execute these workflows. Where should this mapping & conversion logic live? Frontend / backend / static file?
jonfritz commented 10 months ago

Thanks for sharing this. My main comment is less on implementation and more about why users would like this in OpenSearch and the use cases for it in this platform. I'm having trouble identifying what "common AI/ML use cases" we are trying to address here, given we already have ways to do semantic search and conversational search/question answering that are easy for customers to configure and use.

Although projects like LangChain exist to build LLM applications, it's not clear to me that users are looking to run that type of component in an OpenSearch cluster for arbitrary AI apps. I would recommend adding some specific examples and use cases from users explaining why they want this functionality specifically in OpenSearch (a search platform). Otherwise, if we cannot find these use cases, it may be worth creating this as an experimental plugin and seeing if it gets traction first (e.g., do customers want to run these components in OpenSearch, or use LangChain outside of the cluster in their application stack?).

ohltyler commented 10 months ago

@jonfritz thanks for taking a look!

I'm having trouble identifying what "common AI/ML use cases" we are trying to address here, given we already have ways to do semantic search and conversational search/question answering that are easy for customers to configure and use.

The perspective here is that these use cases (e.g., semantic search, RAG) are not easy to configure and use. The primary purpose of the backend framework (see RFC) is to automate some of this infrastructure setup, while the primary purpose of a drag-and-drop UI is to help visualize and construct the infrastructure needed.

Although projects like LangChain exist to build LLM applications, it's not clear to me that users are looking to run that type of component in an OpenSearch cluster for arbitrary AI apps

A big motivation for having this framework, along with in-cluster AI/ML components, is to free application builders from needing middleware infrastructure to run them; entire workflows can happen within OpenSearch. Feel free to leave comments on that RFC, which may be a better place for discussion, since this issue is more about the drag-and-drop UI for the backend framework components.