This project is part of the MLOps Best Practices series. In it, we build a Bedrock Agent that queries an Athena database. The project is built using AWS CDK and Python.
Most organizations have a data lake where they store their data in different formats and query it with different tools. One of the most popular tools for querying a data lake is Amazon Athena, an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. With Athena, there is no need for complex ETL jobs to prepare the data for analysis, so anyone with SQL knowledge can query the data in the data lake.
However, many people in an organization do not have SQL knowledge, which makes it hard for them to benefit from the wealth of data stored in the data lake. To solve this problem, we can build a Bedrock Agent that understands natural language questions, queries the data on the user's behalf, and replies with the results.
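To make that flow concrete, here is a minimal sketch (not the project's actual handler) of how an agent action group could run a SQL query against Athena with boto3; the database name and output location below are placeholders:

```python
# Minimal sketch: an agent action group running a SQL query against Athena.
# DATABASE and OUTPUT_LOCATION are placeholders, not values from this project.
import time
import boto3

athena = boto3.client("athena")

DATABASE = "my_glue_database"                        # placeholder: database created by the Glue crawler
OUTPUT_LOCATION = "s3://my-bucket/athena-results/"   # placeholder: query result location


def run_query(sql: str) -> list[dict]:
    """Run a SQL query in Athena and return the result rows as dictionaries."""
    execution = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": OUTPUT_LOCATION},
    )
    query_id = execution["QueryExecutionId"]

    # Athena is asynchronous, so poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    results = athena.get_query_results(QueryExecutionId=query_id)
    rows = results["ResultSet"]["Rows"]
    header = [col["VarCharValue"] for col in rows[0]["Data"]]
    return [
        dict(zip(header, [col.get("VarCharValue") for col in row["Data"]]))
        for row in rows[1:]
    ]
```

In this design, the agent turns the user's question into SQL, the action group executes it, and the returned rows are used to compose the reply.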
This project is designed not just as a demo; it is a real-world project that can be used in production and is built following best practices.
The solution is built using the following services:

- Amazon Bedrock (Agents)
- Amazon Athena
- AWS Glue (Data Catalog and Crawler)
- Amazon S3
- AWS App Runner

The infrastructure is defined with the AWS CDK. To deploy the solution:
Clone the repository
git clone https://github.com/guyernest/bedrock-agent.git
Install the dependencies
cd bedrock-agent
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Deploy the stack (wait for the deployment to finish, about 5 minutes, and note the output values for the next steps)
cdk deploy
Upload the data to S3
aws s3 cp sample-data/ s3://<bucket-name>/data --recursive
Trigger Glue Crawler (wait for the crawler to finish, about 2 minutes)
aws glue start-crawler --name <crawler-name>
Open the App Runner URL in the browser (appears in the output of the CDK deployment)
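The web application is the easiest way to interact with the agent, but it can also be invoked programmatically. A minimal sketch using boto3 is shown below; the agent ID and alias ID are placeholders, and the real values come from the deployment outputs or the Bedrock console:

```python
# Minimal sketch: invoking the deployed Bedrock Agent from Python.
# AGENT_ID and AGENT_ALIAS_ID are placeholders for the deployed agent's identifiers.
import uuid
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.invoke_agent(
    agentId="AGENT_ID",
    agentAliasId="AGENT_ALIAS_ID",
    sessionId=str(uuid.uuid4()),
    inputText="How many orders were placed last month?",
)

# The completion is returned as an event stream of chunks.
answer = ""
for event in response["completion"]:
    if "chunk" in event:
        answer += event["chunk"]["bytes"].decode("utf-8")
print(answer)
```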
The project includes a sample dataset in the sample-data directory, but other datasets can be used as well. The accompanying Jupyter notebook gives an example of how to analyze the data to enrich the AI agent's prompt, so the agent better understands the data and can answer natural language questions about it.
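As a rough illustration of that kind of analysis, the sketch below profiles one table and turns the result into a schema description that can be folded into the agent's instructions; the file name is a placeholder and the actual notebook may work differently:

```python
# Minimal sketch: profile a sample table and build a schema description for the prompt.
# The file name is a placeholder; adapt it to whatever lives in sample-data/.
import pandas as pd

df = pd.read_csv("sample-data/orders.csv")

lines = []
for column in df.columns:
    examples = df[column].dropna().unique()[:3]
    lines.append(f"- {column} ({df[column].dtype}): e.g. {', '.join(map(str, examples))}")

schema_description = "The table has the following columns:\n" + "\n".join(lines)
print(schema_description)
```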
This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the .venv directory. To create the virtualenv it assumes that there is a python3 (or python for Windows) executable in your path with access to the venv package. If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv manually.
To manually create a virtualenv on macOS and Linux:
python3 -m venv .venv
After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.
source .venv/bin/activate
If you are on a Windows platform, you would activate the virtualenv like this:
.venv\Scripts\activate.bat
Once the virtualenv is activated, you can install the required dependencies.
pip install -r requirements.txt
At this point you can synthesize the CloudFormation template for this code.
cdk synth
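For orientation, this is roughly what a CDK entry point (app.py) looks like in a Python project like this one; the stack class name below is illustrative, not necessarily the stack defined in this repository:

```python
# Minimal sketch of a CDK v2 entry point (app.py). The stack class name is
# illustrative; the real stack in this repository may differ.
import aws_cdk as cdk
from constructs import Construct


class BedrockAgentStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # In the real stack, resources such as the S3 bucket, Glue crawler,
        # Bedrock agent, and App Runner service are defined here.


app = cdk.App()
BedrockAgentStack(app, "BedrockAgentStack")
app.synth()
```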
To add additional dependencies, for example other CDK libraries, just add them to your setup.py file and rerun the pip install -r requirements.txt command.
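For example, a setup.py excerpt pulling in an extra CDK construct library might look like this (the aws-cdk.aws-glue-alpha package is only an illustrative choice, not a dependency this project necessarily needs):

```python
# Excerpt of a hypothetical setup.py: the first two entries mirror a standard
# CDK v2 Python project; the alpha construct library is only an example.
from setuptools import setup

setup(
    name="bedrock_agent",
    install_requires=[
        "aws-cdk-lib>=2.0.0",
        "constructs>=10.0.0",
        "aws-cdk.aws-glue-alpha",  # example extra dependency
    ],
)
```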
Useful commands:

- cdk ls          list all stacks in the app
- cdk synth       emits the synthesized CloudFormation template
- cdk deploy      deploy this stack to your default AWS account/region
- cdk diff        compare deployed stack with current state
- cdk docs        open CDK documentation

Enjoy!