[Bug]: When using custom Azure DataSource, the citations are not returned with the response

dannygar commented 6 days ago

Language

Javascript/Typescript

Version

latest

Description

When using data_sources augmentation with Azure AI Search, the custom data source is removed from the prompt's data sources, so the chat completion API call returns no citations.

The reason is that the data source is removed from the prompt immediately after renderData() executes.

The workaround: add a second data source through the custom data_sources property of the completion section in the prompt's config. A sample config.json with this workaround:

{
    "schema": 1.1,
    "description": "Chat with Teams RAG",
    "type": "completion",
    "completion": {
      "completion_type": "chat",
      "include_history": true,
      "include_input": true,
      "max_input_tokens": 2800,
      "max_tokens": 1000,
      "temperature": 0.9,
      "top_p": 0.0,
      "presence_penalty": 0.6,
      "frequency_penalty": 0.0,
      "stop_sequences": [],
      "data_sources": [
        {
            "type": "azure_search",
            "parameters": {
              "index_name": "sample-documents-vector",
              "semantic_configuration": "default",
              "query_type": "vector_simple_hybrid",
              "fields_mapping": {
                "content_fields_separator": "\n",
                "content_fields": [
                  "content"
                ],
                "filepath_field": "filepath",
                "title_field": "title",
                "url_field": "url",
                "vector_fields": [
                  "contentVector"
                ]
              },
              "in_scope": false,
              "role_information": "You are an AI assistant that helps people find information.",
              "filter": null,
              "strictness": 5,
              "top_n_documents": 10,
              "embedding_dependency": {
                "type": "deployment_name",
                "deployment_name": "text-embedding-ada"
              },
              "authentication": {
                "type": "api_key",
                "key": "${AZURE_SEARCH_KEY}"
              }
            }
        }      
      ]      
    },
    "augmentation": {
        "augmentation_type": "none",
        "data_sources": {
          "azure-ai-search": 2500
        }        
    }
  }
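
The config above carries the search key as a `${AZURE_SEARCH_KEY}` placeholder. This thread doesn't say whether the library expands that placeholder on its own, so the following is only a minimal sketch of substituting it yourself after loading the config. The `expandPlaceholders` helper and the `Json` type are hypothetical names, not part of teams-ai.

```typescript
// Hypothetical helper (not a teams-ai API): replaces "${NAME}" placeholders
// in every string of a parsed config with values from `vars`.
// Placeholders whose name is missing from `vars` are left untouched.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function expandPlaceholders(value: Json, vars: Record<string, string | undefined>): Json {
  if (typeof value === "string") {
    return value.replace(/\$\{(\w+)\}/g, (match: string, name: string) => vars[name] ?? match);
  }
  if (Array.isArray(value)) {
    return value.map((item) => expandPlaceholders(item, vars));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = expandPlaceholders(child, vars);
    }
    return out;
  }
  return value; // numbers, booleans and null pass through unchanged
}

// Usage: expand the authentication block from the config above.
const authentication = expandPlaceholders(
  { type: "api_key", key: "${AZURE_SEARCH_KEY}" },
  { AZURE_SEARCH_KEY: "example-key" } // in a real bot this would come from the environment
);
console.log(authentication);
```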

Reproduction Steps

1. Add the `data_sources` augmentation:

    "augmentation": {
        "augmentation_type": "none",
        "data_sources": {
          "azure-ai-search": 2500
        }        
    }

2. Add a custom Azure AI Search data source to the prompt:
```typescript
    // Add the Azure OpenAI Embeddings data source to the prompt
    this.planner.prompts.addDataSource(
      new AzureAISearchDataSource({
        name: this.env.data.AZURE_SEARCH_SOURCE_NAME,
        indexName: this.env.data.AZURE_SEARCH_INDEX_NAME,
        azureAISearchApiKey: this.env.data.AZURE_SEARCH_KEY,
        azureAISearchEndpoint: this.env.data.AZURE_SEARCH_ENDPOINT,
        azureOpenAIApiKey: this.env.data.OPENAI_KEY,
        azureOpenAIEndpoint: this.env.data.OPENAI_ENDPOINT,
        azureOpenAIEmbeddingDeployment: this.env.data.OPENAI_EMBEDDING_MODEL,
      })
    );
```

3. Execute the prompt. The citations are not returned with the response. ...

corinagum commented 4 days ago

@aacebo could we discuss the fix for this in our next team meeting?

corinagum commented 4 days ago

@dannygar for reference, could you share the raw data returned from Azure AI Search, and point out any places where the data still exists before it gets removed?

dannygar commented 4 days ago

@corinagum, per your request, here is the request object sent to Azure OpenAI API after the embeddings have already been added to the user's prompt:

And the response res object looks like the following:

Note that it doesn't contain the context object, which usually holds the citations generated by the LLM.
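
For anyone checking their own responses, here is a minimal sketch of reading citations defensively. It assumes the response follows the Azure OpenAI "On Your Data" shape, where citations sit under `choices[0].message.context.citations`; older preview API versions may lay this out differently. The `ChatCompletionLike` and `Citation` interfaces and the `getCitations` helper are hypothetical and declare only the fields used here.

```typescript
// Assumed response shape (Azure OpenAI "On Your Data"); only the fields
// this sketch reads are declared. These are not teams-ai types.
interface Citation {
  title?: string | null;
  url?: string | null;
  filepath?: string | null;
  content: string;
}

interface ChatCompletionLike {
  choices: Array<{
    message?: {
      content?: string | null;
      context?: { citations?: Citation[] };
    };
  }>;
}

// Returns the citations of the first choice, or an empty array when the
// context object is missing (the situation described in this issue).
function getCitations(res: ChatCompletionLike): Citation[] {
  return res.choices[0]?.message?.context?.citations ?? [];
}

// Usage: a response without a context object yields no citations.
const withoutContext: ChatCompletionLike = {
  choices: [{ message: { content: "An answer with no sources attached." } }],
};
console.log(getCitations(withoutContext).length);
```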

Here is the resulting message sent to the user:

In comparison, below are the same objects after I added a second, custom data source following the one used to retrieve the embeddings from Azure AI Search:

request:

response:

And the resulting message sent to the user is:

corinagum commented 3 days ago

Hi @dannygar, thanks for the extra info! As it turns out, we recommend updating your sample to use server-side data sources instead of implementing them client side. That means less code, which is a plus. Using data_source in our library is not deprecated, but it adds overhead and is not needed for Azure AI Search.

The sample you likely used was this: https://github.com/microsoft/teams-ai/tree/main/js/samples/04.ai-apps/g.datasource-azureAISearch

The sample we recommend looking at to implement is: https://github.com/microsoft/teams-ai/blob/main/js/samples/04.ai-apps/h.datasource-azureOpenAI

Some notes:

- Citations won't work with client-side data_source without modifications like the ones you've described above.
- You can update the config.ts data_source (see config.ts section below).
- You will no longer need your ___DataSource.ts file.
- You do not need to set up managed identity like the On Your Data sample does; this is just a requirement we had that one of our samples uses managed identity. Code you don't need
- Powered by AI (citations) info - adding this link just in case it's helpful.
- Please note the env folder.

`config.ts`

client secret example for data_source

"data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": "",
                    "authentication": {
                        "type": "api_key",
                        "key": ""
                    }
                }
            }
        ]

Managed identity example (using secrets in ts)

(prompt.config.completion as any).data_sources = [{
    type: 'azure_search',
    parameters: {
        endpoint: process.env.AZURE_SEARCH_ENDPOINT,
        index_name: process.env.AZURE_SEARCH_INDEX,
        // other settings
        // ...
        authentication: {
            type: 'system_assigned_managed_identity' // or 'api_key'
        }
    }
}];
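
The two examples above differ only in the `authentication` block, so the choice can be made in one place. The following is a minimal sketch, assuming the same field names as the examples and the `AZURE_SEARCH_ENDPOINT`, `AZURE_SEARCH_INDEX` and `AZURE_SEARCH_KEY` variables used in this thread. The `buildAzureSearchDataSource` helper and its types are hypothetical, not part of teams-ai.

```typescript
// Hypothetical types mirroring the data_sources entries shown above.
type SearchAuthentication =
  | { type: "api_key"; key: string }
  | { type: "system_assigned_managed_identity" };

interface AzureSearchDataSource {
  type: "azure_search";
  parameters: {
    endpoint: string;
    index_name: string;
    authentication: SearchAuthentication;
  };
}

// Hypothetical helper: builds one data_sources entry from environment-style
// variables. It uses api_key authentication when AZURE_SEARCH_KEY is set and
// falls back to system-assigned managed identity otherwise.
function buildAzureSearchDataSource(env: Record<string, string | undefined>): AzureSearchDataSource {
  const endpoint = env["AZURE_SEARCH_ENDPOINT"];
  const indexName = env["AZURE_SEARCH_INDEX"];
  if (!endpoint || !indexName) {
    throw new Error("AZURE_SEARCH_ENDPOINT and AZURE_SEARCH_INDEX must be set");
  }
  const key = env["AZURE_SEARCH_KEY"];
  const authentication: SearchAuthentication = key
    ? { type: "api_key", key }
    : { type: "system_assigned_managed_identity" };
  return {
    type: "azure_search",
    parameters: { endpoint, index_name: indexName, authentication },
  };
}
```

The result could then be assigned the same way as in the managed identity example, for instance `(prompt.config.completion as any).data_sources = [buildAzureSearchDataSource(process.env)]`.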

FYI I already filed a new issue to update the AI Search sample to match the On Your Data sample implementation
