# Document Bot Documentation

## 1. Introduction

Welcome to Knipper's Document Bot. This project provides a comprehensive solution for managing documents and interacting with their contents through a chat-based interface. We use Retrieval Augmented Generation (RAG) to generate responses that are understandable within the context of the conversation and grounded in details from the provided documents.

(RAG architecture diagram)

Key features of the Document Bot include:

- A chat interface that answers questions using RAG, with citations to source documents and suggested follow-up questions
- Document upload, categorization, and deletion through the Documents page or the bulk PrepareDocs tool
- Category-based filtering of the documents searched when generating responses
- Microsoft account sign-in, restricting access to authorized Knipper employees
- Deployment to Azure with the Azure Developer CLI (azd), plus an optional .NET MAUI client

This documentation will guide you through the deployment, API usage, and general usage of the Document Bot. We hope you find this tool valuable and easy to use.


## 2. Deployment

### 2.1. Prerequisites

In order to deploy and run this project, you'll need:

- An Azure account and subscription with access to the Azure OpenAI service
- The Azure Developer CLI (azd)
- The .NET SDK
- Docker
- Git

### 2.2. Deployment Options

The Document Bot can be deployed using the following methods:

#### GitHub Codespaces / VS Code Remote Containers

You can run the project virtually in GitHub Codespaces or in VS Code Remote Containers (Dev Containers), both of which come with the required tooling preinstalled.

#### Local environment

Install the prerequisites listed in section 2.1 (azd, the .NET SDK, Docker, and Git).

Then, run the following commands to get the project on your local environment:

  1. Run azd auth login
  2. Clone the repository or run azd init -t azure-search-openai-demo-csharp
  3. Run azd env new azure-search-openai-demo-csharp

### 2.3. Deployment Steps

For a detailed, step-by-step walkthrough, see the Deployment Guide.

#### Use existing resources

This method is recommended for use at Knipper. Please request to be added to the rg-ai-knipper-docbot-csharp resource group, or an identically configured one.

If you have existing resources in Azure that you wish to use, you can configure azd to use those by setting the following azd environment variables:

  1. Run azd env set AZURE_OPENAI_SERVICE {Name of existing OpenAI service}
  2. Run azd env set AZURE_OPENAI_RESOURCE_GROUP {Name of existing resource group that OpenAI service is provisioned to}
  3. Run azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT {Name of existing ChatGPT deployment}. Only needed if your ChatGPT deployment is not the default 'chat'.
  4. Run azd env set AZURE_OPENAI_EMBEDDING_DEPLOYMENT {Name of existing embedding model deployment}. Only needed if your embedding model deployment is not the default embedding.
  5. Run azd up

[!NOTE]
You can also use existing Search and Storage Accounts. See ./infra/main.parameters.json for a list of environment variables to pass to azd env set to configure those existing resources.

#### Deploying from scratch

[!IMPORTANT]
Ensure Docker is running before running any azd provisioning / deployment commands.

Execute the following command if you don't have any pre-existing Azure services and want to start from a fresh deployment.

  1. Run azd up - This will provision Azure resources and deploy this sample to those resources, including building the search index based on the files found in the ./data folder.

    • For the target location, the regions that currently support the model used in this sample are East US 2, East US, or South Central US. For an up-to-date list of regions and models, check here
    • If you have access to multiple Azure subscriptions, you will be prompted to select the subscription you want to use. If you only have access to one subscription, it will be selected automatically.

    [!NOTE]
    This application uses the gpt-4o model. When choosing which region to deploy to, make sure the model is available in that region (e.g. East US). For more information, see the Azure OpenAI Service documentation.

  2. After the application has been successfully deployed you will see a URL printed to the console. Click that URL to interact with the application in your browser.

[!NOTE]
It may take a few minutes for the application to be fully deployed.

#### Deploying or re-deploying a local clone of the repo

[!IMPORTANT]
Ensure Docker is running before running any azd provisioning / deployment commands.

Run azd up to re-provision and re-deploy your local changes.

#### Deploying your repo using App Spaces

[!NOTE]
Make sure you have AZD-supported bicep files in your repository and add an initial GitHub Actions workflow file which can either be triggered manually (for initial deployment) or on code change (automatically re-deploying with the latest changes). To make your repository compatible with App Spaces, you need to make changes to your main bicep and main parameters file to allow AZD to deploy to an existing resource group with the appropriate tags.

  1. Add AZURE_RESOURCE_GROUP to main parameters file to read the value from environment variable set in GitHub Actions workflow file by App Spaces.
    "resourceGroupName": {
      "value": "${AZURE_RESOURCE_GROUP}"
    }
  2. Add AZURE_TAGS to main parameters file to read the value from environment variable set in GitHub Actions workflow file by App Spaces.
    "tags": {
      "value": "${AZURE_TAGS}"
    }
  3. Add support for resource group and tags in your main bicep file to read the value being set by App Spaces.
    param resourceGroupName string = ''
    param tags string = ''
  4. Combine the default tags set by Azd with those being set by App Spaces. Replace tags initialization in your main bicep file with the following -
    var baseTags = { 'azd-env-name': environmentName }
    var updatedTags = union(empty(tags) ? {} : base64ToJson(tags), baseTags)
    Make sure to use "updatedTags" when assigning "tags" to the resource group created in your bicep file, and update the other resources to use "baseTags" instead of "tags". For example -
    ```bicep
    resource rg 'Microsoft.Resources/resourceGroups@2021-04-01' = {
      name: !empty(resourceGroupName) ? resourceGroupName : '${abbrs.resourcesResourceGroups}${environmentName}'
      location: location
      tags: updatedTags
    }
    ```

#### Running locally

[!IMPORTANT]
Ensure Docker is running before running any azd provisioning / deployment commands. If you do not have Docker, run the app with the .NET MAUI client instead.

  1. Run azd auth login
  2. After the application deploys, set the environment variable AZURE_KEY_VAULT_ENDPOINT. You can find the value in the .azure/YOUR-ENVIRONMENT-NAME/.env file or the Azure portal.
  3. Run the following .NET CLI command to start the ASP.NET Core Minimal API server (client host):

    dotnet run --project ./app/backend/MinimalApi.csproj --urls=http://localhost:7181/

Navigate to http://localhost:7181, and test out the app.

#### Running locally with the .NET MAUI client

This sample includes a .NET MAUI client, packaging the experience as an app that can run on a Windows/macOS desktop or on Android and iOS devices. The MAUI client here is implemented using Blazor hybrid, letting it share most code with the website frontend.

  1. Open app/app-maui.sln to open the solution that includes the MAUI client

  2. Edit app/maui-blazor/MauiProgram.cs, updating client.BaseAddress with the URL for the backend.

    If it's running in Azure, use the URL for the service backend from the steps above. If running locally, use http://localhost:7181. A sketch of this change is shown after these steps.

  3. Set MauiBlazor as the startup project and run the app
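
The change described in step 2 might look roughly like the following sketch. The exact contents of MauiProgram.cs will differ, and the HttpClient registration shown here is an assumption; the only part that matters is the value assigned to client.BaseAddress.

```csharp
// Illustrative sketch only -- the actual code in app/maui-blazor/MauiProgram.cs may differ.
// Inside CreateMauiApp, point the backend HttpClient at your deployed or local backend.
builder.Services.AddScoped(_ => new HttpClient
{
    // Use the backend URL from your Azure deployment, or the local backend while developing.
    BaseAddress = new Uri("http://localhost:7181/")
});
```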

#### Sharing Environments

Run the following commands if you want to give someone else access to the deployed environment.

  1. Install the Azure CLI
  2. Run azd init -t azure-search-openai-demo-csharp
  3. Run azd env refresh -e {environment name} - Note that they will need the azd environment name, subscription ID, and location to run this command - you can find those values in your .azure/{env name}/.env file. This will populate their azd environment's .env file with all the settings needed to run the app locally.
  4. Run pwsh ./scripts/roles.ps1 - This will assign all of the necessary roles to the user so they can run the app locally. If they do not have the necessary permissions to create roles in the subscription, then you may need to run this script for them. Just be sure to set the AZURE_PRINCIPAL_ID environment variable in the azd .env file or in the active shell to their Azure ID, which they can get with az account show.

#### Clean up resources

Run azd down to delete all the Azure resources created by this deployment.

## 3. Preparing Documents

### 3.1. Document Format

The Document Bot is designed to work with PDF documents. When uploading documents, ensure they are in PDF format. The Document Bot will extract text from these documents to generate responses to user queries. No other document formats are currently supported.

### 3.2. Document Upload

To upload documents to the Document Bot in bulk, use the PrepareDocs startup item. This allows you to upload a large quantity of documents at once, with varying categories. The startup item runs a script that uploads all documents in the data folder to the database. Simply move all desired documents to the data folder and run the PrepareDocs startup item. If a document is already in the database, it will be skipped.

### 3.2.1. Database Creation

When running the PrepareDocs startup item, the script will create a new knowledge base index if one does not already exist. However, we do not recommend relying on PrepareDocs to generate the knowledge base, because some index settings cannot be changed once documents have been uploaded; create the index with the desired settings first. Below are examples of our knowledge base fields and semantic configuration.

(Screenshots: Index Fields, Index Semantic Configurations)

### 3.2.2. Document Categories

When uploading documents, you can assign one or more categories to each document. These categories are used to filter the documents that the chatbot searches through when generating responses. To assign categories when uploading through the PrepareDocs startup item, sort the documents into folders named after the category you wish to assign to them. The script will automatically assign the category based on the folder name. To add multiple categories to a document, you may use a nested folder structure. For example:

data
|-- Business Rules
|   |-- Document1.pdf
|   |-- Document2.pdf
|
|-- Knipper
    |-- HR
    |   |-- Document3.pdf
    |   |-- Document4.pdf
    |
    |-- Document5.pdf
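
In the tree above, Document1.pdf and Document2.pdf are assigned the category Business Rules, Document5.pdf is assigned Knipper, and Document3.pdf and Document4.pdf are assigned both Knipper and HR. The snippet below is a simplified sketch of that folder-to-category mapping; it is illustrative only and not the actual PrepareDocs implementation.

```csharp
// Simplified sketch of the folder-to-category convention described above.
// Illustrative only -- the actual PrepareDocs script may implement this differently.
using System;
using System.IO;

class CategorySketch
{
    static string[] GetCategories(string dataRoot, string filePath)
    {
        // Every folder between the data root and the file becomes a category.
        var relativeDir = Path.GetDirectoryName(Path.GetRelativePath(dataRoot, filePath)) ?? string.Empty;
        return relativeDir.Split(Path.DirectorySeparatorChar, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // data/Knipper/HR/Document3.pdf -> categories "Knipper" and "HR"
        var categories = GetCategories("data", Path.Combine("data", "Knipper", "HR", "Document3.pdf"));
        Console.WriteLine(string.Join(", ", categories));
    }
}
```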

### 3.3. Document Deletion

To delete documents from the database using the PrepareDocs startup item, open the launchSettings.json file in the Properties folder of the PrepareDocs project and add the --removeall flag to the commandLineArgs string. The --removeall flag deletes all documents from the database, so use it with caution. To delete specific documents instead, use the --remove flag and move the documents you wish to delete into the data folder. This functionality has NOT been tested and may not work as intended.
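
As a rough illustration, the relevant entry in launchSettings.json might look like the snippet below. The profile name and the existing arguments are placeholders and will differ in your file; the point of interest is appending the --removeall (or --remove) flag.

```json
{
  "profiles": {
    "PrepareDocs": {
      "commandName": "Project",
      "commandLineArgs": "<existing arguments> --removeall"
    }
  }
}
```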

### 3.4. Prepdocs Troubleshooting

We have specific details in Prepdocs.md, as well as some general information in Troubleshooting.md.

## 4. API

### 4.1. Frontend API

[url]:

### 4.2. Backend API

All Backend API endpoints are prefixed with /api. They are called by the frontend through ApiClient.cs and are used to interact with the backend services. They are defined in WebApplicationExtensions.cs.

#### [url]/api/username:
- **GET**: Retrieves the username of the currently signed-in user.

#### [url]/api/chat:
- **POST**: Sends the chat history and settings to the backend and returns a generated answer with citations and suggested follow-up questions.
    - Example Request:

    {
      "messages": [
        { "role": "user", "content": "How often does the Quality Council meet?", "isUser": true },
        { "role": "assistant", "content": "The Quality Council meets at least monthly.... [QA-021-1.pdf]", "isUser": false },
        { "role": "user", "content": "Who are the members of the Quality Council?", "isUser": true }
      ],
      "overrides": {
        "semantic_ranker": true,
        "retrieval_mode": "Hybrid",
        "semantic_captions": false,
        "exclude_category": [],
        "top": 5,
        "temperature": null,
        "prompt_template": null,
        "prompt_template_prefix": null,
        "prompt_template_suffix": null,
        "suggest_followup_questions": true,
        "use_gpt4v": false,
        "use_oid_security_filter": false,
        "use_groups_security_filter": false,
        "vector_fields": false
      },
      "lastUserQuestion": "Who are the members of the Quality Council?",
      "approach": 0
    }

    - Example Response:

    {
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "The Quality Council is a cross-functional team ... [QA-045-15.pdf]. Departments include ... [QA-021-2.pdf]. <<What are the main responsibilities of the Quality Council?>> <<...>>"
          },
          "context": {
            "dataPointsContent": [
              { "title": "QA-021-0.pdf", "content": "of this procedure is ..." },
              { ... }
            ],
            "dataPointsImages": null,
            "followup_questions": [
              "What are the main responsibilities of the Quality Council?",
              "..."
            ],
            "thoughts": [
              { "title": "Thoughts", "description": "I utilized multiple sources ...", "props": null }
            ],
            "data_points": {
              "text": [
                "QA-021-0.pdf: of this procedure is ...",
                "QA-021-1.pdf: Those in attendance ...",
                "foo.pdf: ..."
              ]
            },
            "thoughtsString": "Thoughts: I utilized multiple sources ..."
          },
          "citationBaseUrl": "https://{storage_container_name}.blob.core.windows.net/content",
          "content_filter_results": null
        }
      ]
    }


#### [url]/api/categories:
- **GET**: Retrieves a list of all categories given to documents.
- Example Response:

[ "Business Rules", "Client", "Knipper", ... ]

#### [url]/api/delete/blobs:
- **POST**: Deletes a document from the blob storage.
    - Example Request:

{ "file":"2024 Holiday Schedule.pdf" }

#### [url]/api/delete/embeddings:
- **POST**: Deletes a document's embeddings from the knowledge base.
    - Example Request:

{ "file":"2024 Holiday Schedule.pdf" }

#### [url]/api/sourcefiles:
- **POST**: When given a list of document blob names, retrieves the respective links.
    - Example Request:

{ "FileNames": [ "Current Handbook as of 3-12-2014-0.pdf", "Current Handbook as of 3-12-2014-1.pdf" ] }

    - Example Response:

[ "https://{storage_container_name}.blob.core.windows.net/content/Current%20Handbook%20as%20of%203-12-2014-0.pdf", "https://{storage_container_name}.blob.core.windows.net/content/Current%20Handbook%20as%20of%203-12-2014-1.pdf" ]



## 5. Usage
### 5.1. Logging In
To ensure only authorized Knipper employees can use Document Bot, you must first log in with your Microsoft account. This is done by clicking the "Log In" button on the homepage and following the prompts to sign in with your Microsoft account.
### 5.2. Documents
On this page, you can view all documents in the database, search for specific documents, upload new documents, and delete existing documents. Documents are shown not as wholes, but as individual pages.

***Both Uploading and Deletion occur within the scope of the session. You may switch to other pages within the site, but do not close the tab until uploading or deletion is complete in order to prevent errors in the database.***

### 5.2.1. Uploading Documents
You can upload documents through the Documents page. Simply select the files you wish to upload, label them with one or more categories, and click "Upload". 
Up to 10 documents can be uploaded at once. All the uploaded documents in the same batch will be labeled with the same categories. 
This process may take a few minutes based on the number and size of documents being uploaded.
When labeling documents, you can search for existing categories through the Multi-Select Autocomplete. To create new categories, you can type in the category name and press enter.
### 5.2.2. Deleting Documents
To delete a document, click the trash icon next to the document you wish to delete. Although documents are displayed as individual pages, this action does not delete only that page; it deletes all pages of the selected document.
### 5.3. Chat
To chat, press the "New Chat" button to create a new instance of a chat. You can ask it questions about the documents and it will search the database for an answer.
If no relevant documents are found, the chat will search the internet for an answer.
Each chat instance maintains its own chat history and can be accessed by clicking on the chat instance in the chat list. 
Chats will only take into account the chat history of the specific instance they are part of when generating responses.
You can exclude categories of documents from your chat responses by searching for them in the Multi-Select Autocomplete.
### 5.4. Profile
The profile page displays your Microsoft account information. This page is not currently used for any functionality, but may be used in the future for account management.
You can find your auth token for use with the API here, as well as other information for debugging purposes.