microsoft / azurechat

🤖 💼 Azure Chat Solution Accelerator powered by Azure OpenAI Service
MIT License
1.19k stars, 1.12k forks

Azure AI Search extension not working #337

Open surendransuri opened 7 months ago

surendransuri commented 7 months ago

Hi, I'm trying to set up the Azure AI Search extension to fetch content from the index, but the tool output does not contain any content from the index.

[Screenshots omitted]

When I search for the same question in Azure AI Search, I can see results there.

[Screenshot omitted]

I couldn't figure out where the issue is.

In the headers section when creating the extension, I provided only the vector field from the index as the vector field.

Also, the index was created in a different Azure resource group from the one this app runs in. Could that be causing the issue? If so, which roles do I need to assign on the search service?

tvonment commented 7 months ago

I had the same problem and solved it with a quick fix, but I think someone should take a closer look at it. The problem is that my search index stored the content in a field other than pageContent, so the content got lost in the FormatCitations function.

After I updated the FormatCitations function (citation-service.ts), it worked for me:

export const FormatCitations = (citation: any[]) => {
  // Copy only the fields the chat needs, leaving the embedding vectors out.
  const withoutEmbedding: DocumentSearchResponse[] = [];
  citation.forEach((d) => {
    withoutEmbedding.push({
      score: d.score,
      document: {
        metadata: d.document.metadata,
        // Fall back to other field names the index may use for the text content.
        pageContent: d.document.pageContent || d.document.content || d.document.chunk,
        chatThreadId: d.document.chatThreadId,
        id: "",
        user: "",
      },
    });
  });

  return withoutEmbedding;
};

The line pageContent: d.document.pageContent || d.document.content || d.document.chunk should list every field that may hold your content.

Alternatively, just make sure the content in your Azure AI Search index is stored in the pageContent field.
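
A slightly more general version of the same fallback, as a sketch: keep a list of candidate field names and take the first one that holds a non-empty string. The names CONTENT_FIELD_CANDIDATES and resolvePageContent are hypothetical (not part of azurechat), and the candidate list simply reuses the three names from the snippet above; adjust it to whatever your index uses.

```typescript
// Hypothetical helper, not part of azurechat.
// Candidate names are taken from the FormatCitations snippet; add your own index's content field.
const CONTENT_FIELD_CANDIDATES = ["pageContent", "content", "chunk"] as const;

function resolvePageContent(
  doc: Record<string, unknown>,
  candidates: readonly string[] = CONTENT_FIELD_CANDIDATES,
): string {
  for (const field of candidates) {
    const value = doc[field];
    // Skip missing fields, non-string values and empty strings, like the `||` chain does.
    if (typeof value === "string" && value.length > 0) {
      return value;
    }
  }
  // Unlike the `||` chain, return an empty string instead of undefined when nothing matches.
  return "";
}
```

Inside FormatCitations this would replace the fallback chain with pageContent: resolvePageContent(d.document). Keeping the list in one place makes it easier to add a field such as the one the portal wizard generates without touching the mapping code.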

schraepf commented 7 months ago

I'm having the same issue connecting to an index I created using integrated vectorization. The wizard and the Python samples create the same index format, and when I connect the extension I get exactly the results in the OP's screenshot.

I'm looking for a way to use integrated vectorization via indexer skillsets to maintain the indexes for this solution.

schraepf commented 7 months ago

Solved: I modified the index and the output mappings of my skillset.

vlad-tsoy commented 6 months ago

I'm also trying to use integrated vectorization. How did you modify the index?

schraepf commented 6 months ago

@vlad-tsoy - I'm creating my index with these fields:

fields = [
    SearchField(name="id", type=SearchFieldDataType.String, key=True, filterable=True, sortable=False, facetable=False, analyzer_name="keyword"),
    #SearchField(name="user", type=SearchFieldDataType.String, sortable=False, filterable=True, facetable=False), #used for filtering
    #SearchField(name="chatThreadId", type=SearchFieldDataType.String, sortable=False, filterable=True, facetable=False), #used for filtering
    SearchField(name="pageContent", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),
    SearchField(name="metadata", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),
    SearchField(name="embedding", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=1536, vector_search_profile_name="myHnswProfile"),
    SearchField(name="parent_id", type=SearchFieldDataType.String, sortable=True, filterable=True, facetable=True)
]

And configured my skillset output mappings to match:

mappings=[
    InputFieldMappingEntry(name="pageContent", source="/document/pages/*"),
    InputFieldMappingEntry(name="embedding", source="/document/pages/*/vector"),
    InputFieldMappingEntry(name="metadata", source="/document/metadata_storage_name")
],

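
These name/source mappings look like the ones used in a skillset's index projection selector. For anyone configuring the skillset through the REST API instead of the Python SDK, here is a sketch of the equivalent fragment built in TypeScript. It assumes the 2023-11-01 REST shape (indexProjections.selectors with targetIndexName, parentKeyFieldName, sourceContext and mappings, plus parameters.projectionMode); buildIndexProjections and the index name are placeholders, parent_id is the parent key field from the index above, and /document/pages/*/vector assumes an embedding skill whose output is named vector.

```typescript
// Sketch of a skillset `indexProjections` fragment (REST JSON shape, assumed 2023-11-01 API).
// Each chunk under /document/pages/* becomes its own document in the target index.
interface ProjectionMapping {
  name: string; // target field in the index
  source: string; // path in the enrichment tree
}

interface IndexProjections {
  selectors: {
    targetIndexName: string;
    parentKeyFieldName: string;
    sourceContext: string;
    mappings: ProjectionMapping[];
  }[];
  parameters: {
    projectionMode: "skipIndexingParentDocuments" | "includeIndexingParentDocuments";
  };
}

function buildIndexProjections(targetIndexName: string): IndexProjections {
  return {
    selectors: [
      {
        targetIndexName,
        parentKeyFieldName: "parent_id", // filterable string field in the index
        sourceContext: "/document/pages/*",
        mappings: [
          { name: "pageContent", source: "/document/pages/*" },
          { name: "embedding", source: "/document/pages/*/vector" },
          { name: "metadata", source: "/document/metadata_storage_name" },
        ],
      },
    ],
    // Index only the chunks, not an extra parent document per source file.
    parameters: { projectionMode: "skipIndexingParentDocuments" },
  };
}

// Placeholder index name; merge the output into your skillset definition.
console.log(JSON.stringify({ indexProjections: buildIndexProjections("my-azurechat-index") }, null, 2));
```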
vlad-tsoy commented 6 months ago

Thank you, Michael!

bwitzig-zen commented 3 months ago

Any tips or guides on how to set up the index for pulling unstructured data stored in Azure Blob Storage? I've tried to build an index, but the indexer doesn't actually pull any data when run with the suggested mappings.

Edit: I had to adjust some additional indexer settings, and also make sure the files to be indexed are in a folder rather than in the root of the blob container.

WEMcJJJ commented 2 weeks ago

@bwitzig-zen, can you share the steps you used to get Azure Blob Storage working? I've got some documents in a folder, but however I set up the index, it never seems to work correctly. Thanks!

WEMcJJJ commented 2 weeks ago

For anyone else having issues: what fixed it for me was adding some field mappings to the search indexer. The app is looking for "content", "metadata" and "id", so for my instance (using Blob Storage) I added the following field mappings to the indexer, and that seemed to fix it without modifying any code. They go under the "fieldMappings" section of the indexer; if you don't have one, you'll have to add it:

"fieldMappings": [
  {
    "sourceFieldName": "content",
    "targetFieldName": "PageContent",
    "mappingFunction": null
  },
  {
    "sourceFieldName": "metadata_storage_name",
    "targetFieldName": "metadata",
    "mappingFunction": null
  },
  {
    "sourceFieldName": "metadata_storage_path",
    "targetFieldName": "id",
    "mappingFunction": null
  }
]

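
For anyone scripting the indexer, here is a sketch of the same three mappings as a typed TypeScript array to merge into an indexer definition. The FieldMapping type and blobFieldMappings helper are hypothetical. The content target is a parameter that defaults to pageContent, the field name in the default schema and in FormatCitations. One assumption worth checking: Microsoft's blob indexing samples typically apply the base64Encode mapping function when metadata_storage_path is mapped to the key field, because the raw path contains characters such as "/" and ":" that document keys do not allow. The helper exposes that as an option rather than changing the configuration above.

```typescript
// Hypothetical helper: builds an indexer `fieldMappings` array for a Blob Storage data source.
interface MappingFunction {
  name: string; // e.g. "base64Encode"
}

interface FieldMapping {
  sourceFieldName: string;
  targetFieldName: string;
  mappingFunction: MappingFunction | null;
}

function blobFieldMappings(contentField = "pageContent", encodeKey = false): FieldMapping[] {
  return [
    // Text extracted from the blob goes into the field the app reads as page content.
    { sourceFieldName: "content", targetFieldName: contentField, mappingFunction: null },
    // File name, shown as the citation metadata.
    { sourceFieldName: "metadata_storage_name", targetFieldName: "metadata", mappingFunction: null },
    {
      sourceFieldName: "metadata_storage_path",
      targetFieldName: "id",
      // Assumption: encode the blob URL so it only contains characters valid in a document key.
      mappingFunction: encodeKey ? { name: "base64Encode" } : null,
    },
  ];
}

console.log(JSON.stringify({ fieldMappings: blobFieldMappings("pageContent", true) }, null, 2));
```

</imports>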
Theocomsoft commented 2 weeks ago

Hello,

I am creating my index with the "Import and vectorize data" button in Azure AI Search, and my content field is automatically named "text_vector". How can I manually create a vectorized index that works with Azure Chat, or how can I change the Azure Chat code so that it recognizes my vector field with the content?

bwitzig-zen commented 1 week ago

You select the vector field when adding the data source in azurechat (that one is the easiest).

As for the other fields, you will need to create the index, indexer, skillset, etc. manually, since to my knowledge the fields created by "Import and vectorize data" are not configurable.

Creating them manually to match the default index schema should get you up and running.
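
As a starting point for creating the index manually, here is a sketch of an index definition in the REST JSON shape, assuming the 2023-11-01 API (dimensions and vectorSearchProfile on the vector field, vectorSearch.algorithms and vectorSearch.profiles at the top level). The field names follow the schema schraepf posted earlier in this thread (id, pageContent, metadata, embedding, parent_id, with user and chatThreadId optional). The 1536 dimensions assume an embedding model with that output size, and the index, algorithm and profile names are placeholders. The resulting JSON would be sent with a PUT to https://{your-service}.search.windows.net/indexes/{index-name}?api-version=2023-11-01 along with an api-key header.

```typescript
// Sketch of an index definition matching the field names used in this thread.
// REST JSON shape assumed from the 2023-11-01 Azure AI Search API.
interface IndexField {
  name: string;
  type: string;
  key?: boolean;
  searchable?: boolean;
  filterable?: boolean;
  sortable?: boolean;
  facetable?: boolean;
  analyzer?: string;
  dimensions?: number;
  vectorSearchProfile?: string;
}

function buildAzureChatIndex(name: string, includeChatFields = false) {
  const fields: IndexField[] = [
    // Keyword analyzer on the key, as in the Python schema above.
    { name: "id", type: "Edm.String", key: true, searchable: true, filterable: true, analyzer: "keyword" },
    { name: "pageContent", type: "Edm.String", searchable: true },
    { name: "metadata", type: "Edm.String", searchable: true },
    {
      name: "embedding",
      type: "Collection(Edm.Single)",
      searchable: true, // vector fields must be searchable
      dimensions: 1536, // must match the embedding model's output size
      vectorSearchProfile: "myHnswProfile",
    },
    { name: "parent_id", type: "Edm.String", filterable: true, sortable: true, facetable: true },
  ];
  if (includeChatFields) {
    // Commented out in the schema above with the note "used for filtering".
    fields.push(
      { name: "user", type: "Edm.String", filterable: true },
      { name: "chatThreadId", type: "Edm.String", filterable: true },
    );
  }
  return {
    name,
    fields,
    vectorSearch: {
      algorithms: [{ name: "myHnsw", kind: "hnsw" }],
      profiles: [{ name: "myHnswProfile", algorithm: "myHnsw" }],
    },
  };
}

console.log(JSON.stringify(buildAzureChatIndex("my-azurechat-index"), null, 2));
```

Keeping the profile name in one place matters: the vector field's vectorSearchProfile has to match a profile defined under vectorSearch, or the service rejects the index.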

Theocomsoft commented 5 days ago

OK, roger that. An example or some documentation links for setting up the index, indexer and skillset would be nice.