Welcome to Knipper's Document Bot. This project aims to provide a comprehensive solution for managing documents and interacting with the data they contain through a chat-based interface. We use Retrieval Augmented Generation (RAG) to generate responses that make sense within the context of the conversation and are accurate to the details in the provided documents.
Key features of the Document Bot include:
This documentation will guide you through the deployment, API usage, and general usage of the Document Bot. We hope you find this tool valuable and easy to use.
To deploy and run this project, you'll need `Microsoft.Authorization/roleAssignments/write` permissions, such as User Access Administrator or Owner.
The Document Bot can be deployed using the following methods:
azd up
Install the following prerequisites:
PowerShell 7+ (pwsh) - For Windows users only.
[!IMPORTANT]
Ensure you can run `pwsh.exe` from a PowerShell command. If this fails, you likely need to upgrade PowerShell.
[!IMPORTANT]
Ensure Docker is running before running any `azd` provisioning / deployment commands.
Then, run the following commands to get the project on your local environment:
azd auth login
azd init -t azure-search-openai-demo-csharp
azd env new azure-search-openai-demo-csharp
For a detailed Deployment guide, see the Deployment Guide.
This method is recommended for use at Knipper. Please request to be added to the `rg-ai-knipper-docbot-csharp` resource group, or an identical one.
If you have existing resources in Azure that you wish to use, you can configure `azd` to use those by setting the following `azd` environment variables:
`azd env set AZURE_OPENAI_SERVICE {Name of existing OpenAI service}`
`azd env set AZURE_OPENAI_RESOURCE_GROUP {Name of existing resource group that OpenAI service is provisioned to}`
`azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT {Name of existing ChatGPT deployment}`. Only needed if your ChatGPT deployment is not the default 'chat'.
`azd env set AZURE_OPENAI_EMBEDDING_DEPLOYMENT {Name of existing embedding model deployment}`. Only needed if your embedding model deployment is not the default 'embedding'.
`azd up`
[!NOTE]
You can also use existing Search and Storage Accounts. See `./infra/main.parameters.json` for the list of environment variables to pass to `azd env set` to configure those existing resources.
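As a rough illustration of the rule above (the two deployment variables are only needed when the name differs from the default), the following Python sketch prints the `azd env set` commands for a given set of values. The helper name `azd_env_commands` and the resource names are hypothetical and chosen only for this example; the variable names and defaults are the ones listed above.

```python
import shlex

# Defaults stated in the text above: these variables only need to be set
# when the deployment name differs from the default.
DEFAULT_DEPLOYMENTS = {
    "AZURE_OPENAI_CHATGPT_DEPLOYMENT": "chat",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT": "embedding",
}


def azd_env_commands(settings: dict) -> list:
    """Return the `azd env set` command lines needed for the given settings."""
    commands = []
    for name, value in settings.items():
        if DEFAULT_DEPLOYMENTS.get(name) == value:
            continue  # value matches the default, so no override is needed
        commands.append(f"azd env set {name} {shlex.quote(value)}")
    return commands


# Hypothetical resource names, for illustration only.
example = azd_env_commands({
    "AZURE_OPENAI_SERVICE": "my-openai",
    "AZURE_OPENAI_RESOURCE_GROUP": "my-rg",
    "AZURE_OPENAI_CHATGPT_DEPLOYMENT": "chat",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT": "text-embedding",
})
print("\n".join(example))
```

The sketch only builds the command strings; running them and the final `azd up` is left to the operator.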
[!IMPORTANT]
Ensure Docker is running before running any `azd` provisioning / deployment commands.
Execute the following command if you don't have any pre-existing Azure services and want to start from a fresh deployment.
Run `azd up` - This will provision Azure resources and deploy this sample to those resources, including building the search index based on the files found in the `./data` folder.
[!NOTE]
This application uses the `gpt4o` model. When choosing which region to deploy to, make sure the model is available in that region (e.g. EastUS). For more information, see the Azure OpenAI Service documentation.
After the application has been successfully deployed you will see a URL printed to the console. Click that URL to interact with the application in your browser.
[!NOTE]
It may take a few minutes for the application to be fully deployed.
[!IMPORTANT]
Ensure Docker is running before running any `azd` provisioning / deployment commands.
azd up
[!NOTE]
Make sure you have AZD-supported Bicep files in your repository, and add an initial GitHub Actions workflow file that can be triggered either manually (for the initial deployment) or on code change (automatically redeploying with the latest changes). To make your repository compatible with App Spaces, you need to change your main Bicep file and main parameters file so that AZD can deploy to an existing resource group with the appropriate tags.
"resourceGroupName": {
"value": "${AZURE_RESOURCE_GROUP}"
}
"tags": {
"value": "${AZURE_TAGS}"
}
param resourceGroupName string = ''
param tags string = ''
var baseTags = { 'azd-env-name': environmentName }
var updatedTags = union(empty(tags) ? {} : base64ToJson(tags), baseTags)
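To make the `updatedTags` expression above easier to follow, here is a small Python sketch of the same computation: decode the base64-encoded JSON in `tags` (if any) and merge it with the base tags. It is an illustration only, not part of the deployment. The function name `updated_tags` and the sample tag `owner` are hypothetical.

```python
import base64
import json


def updated_tags(tags_b64: str, environment_name: str) -> dict:
    """Python equivalent of the Bicep `updatedTags` expression."""
    base_tags = {"azd-env-name": environment_name}
    # empty(tags) ? {} : base64ToJson(tags)
    extra = json.loads(base64.b64decode(tags_b64)) if tags_b64 else {}
    # As with Bicep's union() on objects, later arguments win on key collisions.
    return {**extra, **base_tags}


# Hypothetical tag payload, encoded the way AZURE_TAGS is expected to arrive.
encoded = base64.b64encode(json.dumps({"owner": "docs-team"}).encode()).decode()
print(updated_tags(encoded, "dev"))
print(updated_tags("", "dev"))
```

Because the base tags are merged last, `azd-env-name` always reflects the current environment even if the incoming tags contain the same key.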
Make sure to use "updatedTags" when assigning "tags" to the resource group created in your Bicep file, and update the other resources to use "baseTags" instead of "tags". For example:
```bicep
resource rg 'Microsoft.Resources/resourceGroups@2021-04-01' = {
  name: !empty(resourceGroupName) ? resourceGroupName : '${abbrs.resourcesResourceGroups}${environmentName}'
  location: location
  tags: updatedTags
}
```
[!IMPORTANT]
Ensure Docker is running before running any `azd` provisioning / deployment commands. If you do not have Docker, run with the .NET MAUI client.
azd auth login
Set the `AZURE_KEY_VAULT_ENDPOINT` environment variable. You can find the value in the .azure/YOUR-ENVIRONMENT-NAME/.env file or the Azure portal.
Run the following .NET CLI command to start the ASP.NET Core Minimal API server (client host):
dotnet run --project ./app/backend/MinimalApi.csproj --urls=http://localhost:7181/
Navigate to http://localhost:7181, and test out the app.
This sample includes a .NET MAUI client, packaging the experience as an app that can run on a Windows/macOS desktop or on Android and iOS devices. The MAUI client here is implemented using Blazor hybrid, letting it share most code with the website frontend.
Open app/app-maui.sln to open the solution that includes the MAUI client
Edit app/maui-blazor/MauiProgram.cs, updating `client.BaseAddress` with the URL for the backend.
If it's running in Azure, use the URL for the service backend from the steps above. If running locally, use http://localhost:7181.
Set MauiBlazor as the startup project and run the app
Run the following if you want to give someone else access to the deployed and existing environment.
azd init -t azure-search-openai-demo-csharp
`azd env refresh -e {environment name}` - Note that they will need the azd environment name, subscription Id, and location to run this command - you can find those values in your `.azure/{env name}/.env` file. This will populate their azd environment's .env file with all the settings needed to run the app locally.
`pwsh ./scripts/roles.ps1` - This will assign all of the necessary roles to the user so they can run the app locally. If they do not have the necessary permission to create roles in the subscription, then you may need to run this script for them. Just be sure to set the `AZURE_PRINCIPAL_ID` environment variable in the azd .env file or in the active shell to their Azure Id, which they can get with `az account show`.
Run `azd down`
The Document Bot is designed to work with PDF documents. When uploading documents, ensure they are in PDF format. The Document Bot will extract text from these documents to generate responses to user queries. Currently, no other format of documents is supported.
To upload documents to the Document Bot in bulk, use the PrepareDocs startup item. This lets you upload a large number of documents at once, across different categories.
This startup item runs a script that uploads all documents in the `data` folder to the database.
Simply move all desired documents to the `data` folder and run the PrepareDocs startup item.
If a document is already in the database, it will be skipped.
When the PrepareDocs startup item runs, the script creates a new knowledge base index if one does not already exist. We do not recommend relying on the PrepareDocs startup item to generate the knowledge base if you need customized settings: once documents are uploaded to the knowledge base, some settings can no longer be changed. Below are examples of our knowledge base fields and semantic configuration.
When uploading documents, you can assign one or more categories to each document. These categories are used to filter the documents that the chatbot searches through when generating responses. To assign categories when uploading through the PrepareDocs startup item, sort the documents into folders named after the category you wish to assign them. The script will automatically assign the category based on the folder name. To add multiple categories to a document, you may use a nested folder structure. For example:
data
|-- Business Rules
|   |-- Document1.pdf
|   |-- Document2.pdf
|
|-- Knipper
    |-- HR
    |   |-- Document3.pdf
    |   |-- Document4.pdf
    |
    |-- Document5.pdf
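The folder-to-category mapping described above can be sketched in a few lines of Python. This is only an illustration of the rule (every folder between `data` and the file becomes a category); the function name `categories_for` is hypothetical, and the actual PrepareDocs script may implement it differently.

```python
from pathlib import PurePosixPath


def categories_for(path: str, root: str = "data") -> list:
    """Return the folder names between `root` and the file as categories."""
    parts = PurePosixPath(path).relative_to(root).parts
    return list(parts[:-1])  # drop the file name itself


# Document1.pdf sits directly under "Business Rules".
print(categories_for("data/Business Rules/Document1.pdf"))
# Document3.pdf is nested under Knipper/HR, so it gets both categories.
print(categories_for("data/Knipper/HR/Document3.pdf"))
```

Under this reading, a file placed directly in `data` would receive no categories at all.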
To delete documents from the database using the PrepareDocs startup item, navigate to the `launchSettings.json` file in the `Properties` folder of the `PrepareDocs` project.
In `launchSettings.json`, add the flag `--removeall` to the `commandLineArgs` string.
The `--removeall` flag will delete all documents from the database, so use it with caution.
To delete specific documents, add the flag `--remove` and move the documents you wish to delete into the `data` folder.
This functionality has NOT been tested and may not work as intended.
We have specific details in Prepdocs.md, as well as some general information in Troubleshooting.md.
All Backend API endpoints are prefixed with `/api`. They are called by the frontend through `ApiClient.cs` and are used to interact with the backend services. They are defined in `WebApplicationExtensions.cs`.
[
{
"name":"2021 Remote Work Pilot (Update Nov 2021)-0.pdf",
"contentType":"application/pdf",
"size":87601,
"lastModified":"2024-07-23T13:50:57+00:00",
"url":"https://{storage_container_name}.blob.core.windows.net:443/content/2021%20Remote%20Work%20Pilot%20(Update%20Nov%202021)-0.pdf",
"status":0,
"embeddingType":0
},
{ ... }
]
`category` is a list of categories, delimited by commas.
{
"files": [
{ ... },
{ ... }
],
"maxAllowedSize": 10,
"cookie": "example-csrf-token",
"category": "example-category",
"cancellationToken": "cancellation-token-placeholder"
}
{
"success": true,
"message": "Files uploaded successfully."
}
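Since the `category` field in the upload request above is a single comma-delimited string, a client has to join its category names before sending. The sketch below shows one way to do that in Python. The helper name `category_field` is hypothetical, and rejecting names that contain a comma is an assumption made here to keep the value unambiguous, not documented backend behavior.

```python
def category_field(categories: list) -> str:
    """Join category names into the comma-delimited `category` value."""
    cleaned = [c.strip() for c in categories if c.strip()]
    for name in cleaned:
        if "," in name:
            # Assumption: a comma inside a name would be read as a delimiter.
            raise ValueError(f"category name contains a comma: {name!r}")
    return ",".join(cleaned)


print(category_field(["Knipper", " HR "]))
```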
{
"messages":
[
{
"role":"user",
"content":"How often does the Quality Council meet?",
"isUser":true
},
{
"role":"assistant",
"content":"The Quality Council meets at least monthly.... [QA-021-1.pdf]",
"isUser":false
},
{
"role":"user",
"content":"Who are the members of the Quality Council?",
"isUser":true
}
],
"overrides":
{
"semantic_ranker":true,
"retrieval_mode":"Hybrid",
"semantic_captions":false,
"exclude_category":[],
"top":5,
"temperature":null,
"prompt_template":null,
"prompt_template_prefix":null,
"prompt_template_suffix":null,
"suggest_followup_questions":true,
"use_gpt4v":false,
"use_oid_security_filter":false,
"use_groups_security_filter":false,
"vector_fields":false},
"lastUserQuestion":"Who are the members of the Quality Council?",
"approach":0
}
- Example Response:
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The Quality Council is a cross-functional team ... [QA-045-15.pdf]. Departments include ... [QA-021-2.pdf]. <<What are the main responsibilities of the Quality Council?>> <<...>> "
      },
      "context": {
        "dataPointsContent": [
          { "title": "QA-021-0.pdf", "content": "of this procedure is ..." },
          { ... }
        ],
        "dataPointsImages": null,
        "followup_questions": [
          "What are the main responsibilities of the Quality Council?",
          "..."
        ],
        "thoughts": [
          { "title": "Thoughts", "description": "I utilized multiple sources ...", "props": null }
        ],
        "data_points": {
          "text": [
            "QA-021-0.pdf: of this procedure is ...",
            "QA-021-1.pdf: Those in attendance ...",
            "foo.pdf: ..."
          ]
        },
        "thoughtsString": "Thoughts: I utilized multiple sources ..."
      },
      "citationBaseUrl": "https://{storage_container_name}.blob.core.windows.net/content",
      "content_filter_results": null
    }
  ]
}
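In the example response, the assistant's `content` string carries citations in square brackets (`[QA-021-2.pdf]`) and follow-up questions in double angle brackets (`<<...>>`). A client that wants to show these separately could split them out as in the Python sketch below. The function name `split_answer` and the regular expressions are assumptions based only on the example above, not a description of how the frontend does it.

```python
import re

# Patterns inferred from the example response; treat them as assumptions.
FOLLOWUP = re.compile(r"<<(.+?)>>")
CITATION = re.compile(r"\[([^\[\]]+\.pdf)\]")


def split_answer(content: str):
    """Split message content into (answer text, citations, follow-up questions)."""
    followups = [q.strip() for q in FOLLOWUP.findall(content)]
    # dict.fromkeys removes duplicate citations while keeping first-seen order.
    citations = list(dict.fromkeys(CITATION.findall(content)))
    answer = FOLLOWUP.sub("", content).strip()
    return answer, citations, followups


sample = (
    "The Quality Council meets at least monthly [QA-021-1.pdf]. "
    "<<What are the main responsibilities of the Quality Council?>>"
)
answer, citations, followups = split_answer(sample)
print(answer)
print(citations)
print(followups)
```

The citation file names can then be combined with `citationBaseUrl` from the same response to build links to the source pages.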
#### [url]/api/categories:
- **GET**: Retrieves a list of all categories given to documents.
- Example Response:
[ "Business Rules", "Client", "Knipper", ... ]
#### [url]/api/delete/blobs:
- **POST**: Deletes a document from the blob storage.
- Example Request:
{ "file":"2024 Holiday Schedule.pdf" }
#### [url]/api/delete/embeddings:
- **POST**: Deletes a document's embeddings from the knowledge base.
- Example Request:
{ "file":"2024 Holiday Schedule.pdf" }
#### [url]/api/sourcefiles:
- **POST**: When given a list of document blob names, retrieves the respective links.
- Example Request:
{ "FileNames": [ "Current Handbook as of 3-12-2014-0.pdf", "Current Handbook as of 3-12-2014-1.pdf" ] }
- Example Response:
[ "https://{storage_container_name}.blob.core.windows.net/content/Current%20Handbook%20as%20of%203-12-2014-0.pdf", "https://{storage_container_name}.blob.core.windows.net/content/Current%20Handbook%20as%20of%203-12-2014-1.pdf" ]
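The links in the example response are the blob names percent-encoded and appended to the storage URL. The Python sketch below reproduces that shape for a single file name. The function name `blob_url` is hypothetical, `storage_name` stands in for the `{storage_container_name}` placeholder used above, and the exact encoding the backend applies is an assumption.

```python
from urllib.parse import quote


def blob_url(storage_name: str, file_name: str, container: str = "content") -> str:
    """Build a link of the same shape as the example response."""
    # quote() percent-encodes spaces as %20 and leaves letters, digits,
    # '-', '.', '_' and '~' unchanged.
    return f"https://{storage_name}.blob.core.windows.net/{container}/{quote(file_name)}"


print(blob_url("example", "Current Handbook as of 3-12-2014-0.pdf"))
```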
## 5. Usage
### 5.1. Logging In
To ensure only authorized Knipper employees can use Document Bot, you must first log in with your Microsoft account. This is done by clicking the "Log In" button on the homepage and following the prompts to sign in with your Microsoft account.
### 5.2. Documents
On this page, you can view all documents in the database, search for specific documents, upload new documents, and delete existing documents. Documents are shown not as wholes, but as individual pages.
***Both Uploading and Deletion occur within the scope of the session. You may switch to other pages within the site, but do not close the tab until uploading or deletion is complete in order to prevent errors in the database.***
#### 5.2.1. Uploading Documents
You can upload documents through the Documents page. Simply select the files you wish to upload, label them with one or more categories, and click "Upload".
Up to 10 documents can be uploaded at once. All the uploaded documents in the same batch will be labeled with the same categories.
This process may take a few minutes based on the number and size of documents being uploaded.
When labeling documents, you can search for existing categories through the Multi-Select Autocomplete. To create new categories, you can type in the category name and press enter.
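Because at most 10 documents can be uploaded at once, a larger set has to be split into several batches, each of which shares one set of categories. A minimal Python sketch of that split is shown below; the function name `upload_batches` and the file names are hypothetical.

```python
MAX_FILES_PER_UPLOAD = 10  # limit stated in the text above


def upload_batches(files: list, size: int = MAX_FILES_PER_UPLOAD) -> list:
    """Split a list of files into consecutive batches of at most `size` items."""
    return [files[i:i + size] for i in range(0, len(files), size)]


# 23 hypothetical file names, for illustration only.
pending = [f"report-{n}.pdf" for n in range(23)]
for batch in upload_batches(pending):
    print(len(batch), batch[0], "...", batch[-1])
```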
#### 5.2.2. Deleting Documents
To delete a document, click the trash icon next to the document you wish to delete. Although documents are displayed as individual pages, this action deletes every page of the selected document, not just the page you clicked.
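To see which pages belong to the same document, page entries can be grouped by their base name. The Python sketch below assumes the naming pattern visible in the API examples, where each page is stored as `<document name>-<page number>.pdf`. That pattern and the function name `group_pages` are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Assumed page naming: "<document name>-<page number>.pdf".
PAGE_NAME = re.compile(r"^(?P<doc>.+)-(?P<page>\d+)\.pdf$")


def group_pages(blob_names: list) -> dict:
    """Group page blob names under the document they appear to belong to."""
    documents = defaultdict(list)
    for name in blob_names:
        match = PAGE_NAME.match(name)
        # Names without a page suffix are treated as their own document.
        key = f"{match['doc']}.pdf" if match else name
        documents[key].append(name)
    return dict(documents)


pages = [
    "Current Handbook as of 3-12-2014-0.pdf",
    "Current Handbook as of 3-12-2014-1.pdf",
    "QA-021-0.pdf",
]
print(group_pages(pages))
```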
### 5.3. Chat
To chat, press the "New Chat" button to create a new instance of a chat. You can ask it questions about the documents and it will search the database for an answer.
If no relevant documents are found, the chat searches the internet for an answer.
Each chat instance maintains its own chat history and can be accessed by clicking on the chat instance in the chat list.
Chats will only take into account the chat history of the specific instance they are part of when generating responses.
You can exclude categories of documents from your chat responses by searching them in the Multi-Select Autocomplete.
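The per-instance behavior described above maps naturally onto the chat request body shown in the API section: each chat keeps its own `messages` list and its own excluded categories. The Python sketch below models that. The class name `ChatSession` is hypothetical, only the `exclude_category` override is filled in, and whether the backend accepts a request with the other override fields omitted is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ChatSession:
    """One chat instance with its own history and category exclusions."""
    messages: list = field(default_factory=list)
    excluded_categories: list = field(default_factory=list)

    def ask(self, question: str) -> dict:
        """Record the question and return a request body for this chat only."""
        self.messages.append({"role": "user", "content": question, "isUser": True})
        return {
            "messages": list(self.messages),
            # Other override fields from the API example are omitted here.
            "overrides": {"exclude_category": list(self.excluded_categories)},
            "lastUserQuestion": question,
            "approach": 0,
        }

    def record_answer(self, content: str) -> None:
        """Store the assistant's reply so later questions include it."""
        self.messages.append({"role": "assistant", "content": content, "isUser": False})


hr_chat = ChatSession(excluded_categories=["Client"])
qa_chat = ChatSession()
hr_chat.ask("How often does the Quality Council meet?")
hr_chat.record_answer("The Quality Council meets at least monthly.... [QA-021-1.pdf]")
request = hr_chat.ask("Who are the members of the Quality Council?")
print(len(request["messages"]), len(qa_chat.messages))
```

Because each session owns its lists, questions asked in one chat never appear in the history sent by another.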
### 5.4. Profile
The profile page displays your Microsoft account information. This page is not currently used for any functionality, but may be used in the future for account management.
You can find your auth token for use with the API here, along with other information for debugging purposes.