How to delete a document from Search index? Any suggestions how to do it

microsoft / sample-app-aoai-chatGPT

Sample code for a simple web chat experience through Azure OpenAI, including Azure OpenAI On Your Data.

MIT License

How to delete a document from Search index? Any suggestions how to do it #795

Closed: rehat22 closed this issue 1 week ago

rehat22 commented 2 months ago

I used the following script to delete a document from Azure AI Search:

import requests

service_name = ""
index_name = ""
api_version = "2023-07-01-preview"
admin_api_key = ""
document_key = ""

url = f"https://{service_name}.search.windows.net/indexes/{index_name}/docs/index?api-version={api_version}"

data = {"value": [{"id": document_key, "@search.action": "delete"}]}

headers = {"Content-type": "application/json", "api-key": admin_api_key}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200 or response.status_code == 204:
    print("Document deleted successfully!")
else:
    print(f"Error deleting document: {response.status_code} {response.text}")

I got a success message when I ran it, but the document is still in the index. Is there another way to approach this?

guyyardeni commented 2 months ago

The chat interaction uses the chunk index as well, so you'd have to clear the chunks to get the data out of the index. I'm not sure if there is a way to identify which chunks are associated with a specific file, so I delete all of the chunks and reset the indexers so that they create new chunks and index them.
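For reference, resetting and re-running an indexer can be sketched against the Azure AI Search REST API roughly as follows. This is a minimal sketch, not necessarily the exact steps described above: the service name, indexer name, and admin key are placeholders, `indexer_url` and `reset_and_run` are hypothetical helper names, and the API version is simply the one used in the script from the question.

```python
import urllib.request

API_VERSION = "2023-07-01-preview"  # assumption: same version as the script in the question


def indexer_url(service_name: str, indexer_name: str, operation: str) -> str:
    """Build the URL for an indexer operation ('reset' or 'run')."""
    if operation not in ("reset", "run"):
        raise ValueError(f"unsupported indexer operation: {operation}")
    return (
        f"https://{service_name}.search.windows.net"
        f"/indexers/{indexer_name}/{operation}?api-version={API_VERSION}"
    )


def reset_and_run(service_name: str, indexer_name: str, admin_api_key: str) -> list:
    """Reset the indexer's change tracking, then start a fresh run.

    Returns the HTTP status code of each call, in order.
    urlopen raises urllib.error.HTTPError if either call is rejected.
    """
    statuses = []
    for operation in ("reset", "run"):
        request = urllib.request.Request(
            indexer_url(service_name, indexer_name, operation),
            data=b"",  # both operations are POSTs with an empty body
            method="POST",
            headers={"api-key": admin_api_key},
        )
        with urllib.request.urlopen(request) as response:
            statuses.append(response.status)
    return statuses
```

Calling `reset_and_run("<service>", "<indexer>", "<admin-key>")` would need to be repeated for each indexer involved. Note that a reset only makes the indexer reprocess its data source; it does not by itself remove documents that are already in the index.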

rehat22 commented 2 months ago

thank you @guyyardeni

jack-vinitsky commented 4 weeks ago

You should be able to use the parent_id field in the chunk index. This is the id of the original document, so all chunks derived from a given document have the same value for this field.
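As a rough illustration of that approach, the sketch below looks up every chunk whose parent_id matches a document and then deletes those chunks by key. It rests on several assumptions that are not confirmed in this thread: that parent_id is marked filterable in the chunk index, that the key field is called `id` (as in the script from the question; adjust `key_field` if yours differs), and that the same API version applies. The helper names (`parent_filter`, `delete_batch`, `post_json`, `delete_chunks`) are hypothetical.

```python
import json
import urllib.request

API_VERSION = "2023-07-01-preview"  # assumption: same version as the script in the question


def parent_filter(parent_id: str) -> str:
    """OData filter matching every chunk derived from one source document."""
    escaped = parent_id.replace("'", "''")  # OData escapes a single quote by doubling it
    return f"parent_id eq '{escaped}'"


def delete_batch(keys, key_field="id"):
    """Build the body for the docs/index call that deletes the given keys."""
    return {"value": [{"@search.action": "delete", key_field: k} for k in keys]}


def post_json(url, api_key, body):
    """POST a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def delete_chunks(service, index, api_key, parent_id, key_field="id", post=post_json):
    """Find chunks with the given parent_id and delete them by key.

    Handles at most 1000 chunks per call; returns the keys it asked to delete.
    """
    base = f"https://{service}.search.windows.net/indexes/{index}/docs"
    query = {
        "search": "*",
        "filter": parent_filter(parent_id),  # requires parent_id to be filterable
        "select": key_field,
        "top": 1000,
    }
    found = post(f"{base}/search?api-version={API_VERSION}", api_key, query)
    keys = [doc[key_field] for doc in found.get("value", [])]
    if keys:
        post(f"{base}/index?api-version={API_VERSION}", api_key,
             delete_batch(keys, key_field))
    return keys
```

The `post` parameter exists only so the HTTP call can be swapped out for a stub when trying the logic without a live service. For a document with more than 1000 chunks, the function would have to be called again until it returns an empty list.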

jack-vinitsky commented 2 weeks ago

@rehat22 If this answer resolves your issue, please mark it as closed.

rehat22 commented 2 weeks ago

Hi, you can do it. Thank you.

jack-vinitsky commented 1 week ago

I think only the person who opened the issue or an admin can close it.