terraform-ibm-modules / stack-retrieval-augmented-generation

Apache License 2.0

Retrieval Augmented Generation Pattern for Watsonx on IBM Cloud

The following deployable architecture automates the deployment of a sample GenAI Pattern on IBM Cloud, including all underlying IBM Cloud infrastructure. This architecture implements the best practices for Watsonx GenAI Pattern deployment on IBM Cloud, as described in the reference architecture.

This deployable architecture provides a comprehensive foundation for trust, observability, security, and regulatory compliance by configuring the IBM Cloud account to align with compliance settings, deploying key and secret management services, and deploying the infrastructure to support CI/CD/CC pipelines for secure application lifecycle management. These pipelines facilitate the deployment of the application, vulnerability checks, and auditability, ensuring a secure and trustworthy deployment of Generative AI applications on IBM Cloud.

Objective and Benefits

This deployable architecture is designed to showcase a fully automated deployment of a retrieval augmented generation application through IBM Cloud Project, providing a flexible and customizable foundation for your own Watson-based application deployments on IBM Cloud. This architecture deploys the following sample application by default.

By leveraging this architecture, you can accelerate your deployment and tailor it to meet your unique business needs and enterprise goals.

By using this architecture, you can:

Deployment Details

To deploy this architecture, follow these steps.

1. Prerequisites

Before deploying the deployable architecture, ensure you have:

Also review the "Important Deployment Considerations" section (Step 6) later in this document before you start.

2. Deploy the Stack in a New Project from Catalog

3. Set the Input Configuration for the Stack

After completing Step 2 - Deploy the Stack in a New Project from Catalog, you are directed to a page allowing you to enter the configuration for your deployment:

You may also review the other available inputs, such as the region and resource group name (under the Optional tab), and either leave them as is or modify them as needed.

Once ready, click the "Save" button at the top of the screen.

4. Deploy the Architecture

Navigate to the project deployment view by clicking the project name in the breadcrumb menu.

You should be directed to the project's configuration view, where the first member of the stack is marked as "Ready to validate".

Note: In rare cases, the first member of the stack may not be marked as "Ready to validate". Refreshing the page in your browser should solve this problem.

There are two approaches to deploying the architecture:

  1. Fully Automated End-to-End. Recommended for demo or non-critical environments. This approach allows Project to validate, approve, and deploy all stack members automatically.
  2. Member-by-Member. Recommended for critical environments, such as production. This approach enables a detailed review of changes from each stack member before automation is executed, ensuring precise control over the deployment process.

Approach 1: Fully Automated End-to-End

To enable auto-deployment:

  1. Go to Manage > Settings > Auto-deploy and toggle On.
  2. Return to the Configurations tab and click Validate under stack configuration.

The project will then validate, approve, and deploy each stack member, taking approximately one hour to complete.

Approach 2: Member-by-Member

  1. Click Validate.
  2. Wait for the validation to complete.
  3. Approve the changes and click Deploy.
  4. Wait for the deployment to complete.
  5. Repeat from step 1 for the next configuration in the architecture. Note that once the initial base configuration is deployed, you will be given the option to validate and deploy multiple configurations in parallel.

5. Post-Deployment Steps

At this point, the infrastructure has been successfully deployed in the target account, and the initial build of the sample application has started in the newly-provisioned DevOps service.

Monitoring the Build and Deployment

To monitor the build and deployment of the application, follow these steps:

  1. Access the DevOps Toolchains View: Navigate to the DevOps / Toolchains view in the target account.
  2. Select the Resource Group and Region: Choose the resource group and region where the infrastructure was deployed. The resource group name is based on the prefix and resource_group_name inputs of the deployable architecture.
  3. Select the Toolchain: Select the "RAG Sample App-CI-Toolchain" toolchain.
  4. Access the Delivery Pipeline: In the toolchain view, select ci-pipeline under the Delivery pipeline section.
  5. View the CI Pipeline Status: The current status of the CI pipeline execution can be found under the "rag-webhook-trigger" section.

Verifying the Application Deployment

Once the initial run of the CI pipeline completes, you should be able to view the application running in the created Code Engine project.
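
The same check can be sketched from the command line. This is a minimal sketch, assuming the IBM Cloud CLI and its Code Engine plugin are installed and that you are already logged in. The function name check_rag_app and the resource group, region, and Code Engine project values in the usage comment are hypothetical placeholders; use the names from your own deployment.

```shell
# Hypothetical helper: target the deployment's resource group and region,
# select the Code Engine project, and list the applications running in it.
check_rag_app() {
  local resource_group="$1"
  local region="$2"
  local ce_project="$3"

  if [ -z "$resource_group" ] || [ -z "$region" ] || [ -z "$ce_project" ]; then
    echo "usage: check_rag_app <resource-group> <region> <code-engine-project>" >&2
    return 1
  fi

  # Each step runs only if the previous one succeeded.
  ibmcloud target -g "$resource_group" -r "$region" &&
    ibmcloud ce project select --name "$ce_project" &&
    ibmcloud ce application list
}

# Usage (placeholder values, replace with your own):
#   check_rag_app my-resource-group us-south my-code-engine-project
```

The sample application should appear in the listed applications once the CI pipeline has finished its first run.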

Enabling Watson Assistant

After the application has been built and is running in Code Engine, there are additional steps specific to the sample app that need to be completed to fully enable Watson Assistant in the app. To complete the installation, follow the steps outlined in the application README.md file.

6. Important Deployment Considerations

API Key Requirements

The deployable architecture can only be deployed with an API key associated with a user. It is not compatible with API keys associated with a service ID. Additionally, it cannot be deployed using the Project trusted profile support.
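
As an illustration, a user API key can be created with the IBM Cloud CLI. This is a minimal sketch, assuming you are logged in as the user who will own the key. The function name create_user_api_key, the key name, the description text, and the output file path are hypothetical examples, not values required by the architecture.

```shell
# Hypothetical helper: create an API key tied to the logged-in user.
# `ibmcloud iam api-key-create` creates a user key; keys created for a
# service ID are not compatible with this deployable architecture.
create_user_api_key() {
  local key_name="$1"
  local key_file="$2"

  if [ -z "$key_name" ] || [ -z "$key_file" ]; then
    echo "usage: create_user_api_key <key-name> <output-file>" >&2
    return 1
  fi

  ibmcloud iam api-key-create "$key_name" \
    -d "User API key for deploying the RAG stack" \
    --file "$key_file"
}

# Usage (placeholder values, replace with your own):
#   create_user_api_key rag-stack-deploy-key ./rag-stack-key.json
```

The key value is written to the output file, so store that file securely and remove it once the key has been entered in the stack configuration.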

Notification of New Configuration Versions ("Needs Attention")

You may see notifications in IBM Cloud Project indicating that one or more configurations in the stack have new versions available. You can safely ignore these messages at this point, as they will not prevent you from deploying the stack. No specific action is required from you.

Please note that these notifications are expected, as we are rapidly iterating on the development of the underlying components. As new stack versions become available, the versions of the underlying components will also be updated accordingly.

Limitations with the Trial Secrets Manager Offering

The automation is configured to deploy a Trial version of Secrets Manager by default to minimize costs. However, the Trial version has some limitations. If you want to avoid these limitations, you can opt to deploy a standard (paid) instance of Secrets Manager under the Optional settings of the stack.

Here are the limitations of the Trial version:

What are reclamations? In IBM Cloud, when you delete a resource, it doesn't immediately disappear. Instead, it enters a "reclamation" state, where it remains for a short period of time (usually 7 days) before being permanently deleted. During this time, you can still recover the resource if needed.

To resolve the re-deployment failure, you need to delete the Secrets Manager service from the reclamation state by running the following commands:

# List all resources in the reclamation state and note the reclamation ID of the Secrets Manager service
ibmcloud resource reclamations
# Permanently delete the reclaimed Secrets Manager service
ibmcloud resource reclamation-delete <reclamation-id>
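
The two commands above can be wrapped in small shell functions so that the delete step cannot run with an empty ID or with the literal placeholder. This is only a sketch: the function names list_reclamations and delete_reclamation are hypothetical, and the reclamation ID must still be copied by hand from the listing.

```shell
# Hypothetical wrappers around the two reclamation commands.

# Step 1: list every resource currently in the reclamation state.
list_reclamations() {
  ibmcloud resource reclamations
}

# Step 2: permanently delete one reclamation, identified by its ID.
delete_reclamation() {
  local reclamation_id="${1:-}"

  # Refuse to run with no ID or with the unedited placeholder text.
  case "$reclamation_id" in
    ""|"<reclamation-id>")
      echo "usage: delete_reclamation <reclamation-id>" >&2
      return 1
      ;;
  esac

  ibmcloud resource reclamation-delete "$reclamation_id"
}

# Usage:
#   list_reclamations
#   delete_reclamation <ID copied from the listing>
```

Deleting a reclamation is permanent, so confirm that the ID belongs to the Secrets Manager service before running the second step.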

Customization Options

There are numerous customization possibilities available out of the box. This section explores some common scenarios, but is not exhaustive.

Editing Individual Configurations

Each configuration in the deployed stack surfaces a large number of input parameters. You can directly edit each parameter to tailor your deployment by selecting the Edit option in the menu for the corresponding configuration on the right-hand side.

This approach enables you to:

Removing Configurations from the Stack

You can remove any configuration from the stack, provided there is no direct dependency in later configurations, by selecting the Remove from Stack option in the right-hand side menu for the corresponding configuration.

This applies to the following configurations:

Managing Stack-Level Inputs and Outputs

You can add or remove inputs and outputs surfaced at the stack level by following these steps:

  1. Select the stack configuration.
  2. On the screen that opens, promote any of the configuration inputs or outputs to the stack level.

Sharing Modified Stacks through a Private IBM Cloud Catalog

Once you have made modifications to your stack in Project, you can share it with others through a private IBM Cloud Catalog. To do so, follow these steps:

  1. Deploy the stack at least once: You need to deploy the stack first to allow importing the stack definition to a private catalog.
  2. Select the "Add to private catalog" option in the menu located on the stack configuration.

This will allow you to share your modified stack with others through a private IBM Cloud Catalog.

Customizing for Your Application

As you deploy your own application, you may want to remove the last configuration (Sample RAG app configuration), which is specific to the sample app provided out of the box. You can use the code of this sample automation as a guide to implement your own, depending on your application needs. The code is available at https://github.com/terraform-ibm-modules/terraform-ibm-rag-sample-da.

Undeploying/Deleting the Stack and All Associated Infrastructure Resources

Clean Up the Configuration

This step is optional if you are planning to fully destroy all Watson resources. The artifacts created by the application will be deleted as part of undeploying the Watson resources.

Follow the steps outlined in the cleanup.md file to remove the configuration specific to the sample app.

Undeploying Infrastructure

To undeploy the infrastructure created by the automation, complete the following steps:

1. Delete Resources Created by the CI toolchain

These resources are not destroyed automatically when the stack is undeployed in Project:

2. Undeploy Configurations in the Project

Select the "Undeploy" option in the menu associated with the stack in the project.

3. Delete Project

Once all configurations are undeployed, you may delete the project.