microsoft / azurechat

🤖 💼 Azure Chat Solution Accelerator powered by Azure Open AI Service
MIT License

Unchecked growth in Search Service due to documents not being deleted when CosmosDB TTL expires #303

Open jasonpnnl opened 5 months ago

jasonpnnl commented 5 months ago

Description

We've encountered an issue within the azurechat application related to document indexing and deletion. Although the application is designed to upload documents to Azure Search Service, it seems that deletion of documents is not handled properly in all scenarios.

Problem

When users upload documents to chat in azurechat, these documents are indexed in Azure Search Service. When chat threads are deleted via the UI, the deleteDocuments function within the azure-cog-vector-store.ts correctly removes the corresponding documents from the Search Service index. However, if the thread is automatically removed once the time-to-live (TTL) period for the Cosmos DB history table is reached, the associated documents remain in the search index. This is because the deleteDocuments function is not triggered when the TTL mechanism deletes the thread.

Consequence

This leads to unchecked growth in the Search Service as orphaned documents accumulate, which becomes problematic over time. This incurs unnecessary costs, given that the Search Service is a premium component of this application.

Potential Solutions

- Implement a background process that checks for documents associated with threads that are close to their TTL and explicitly calls deleteDocuments before TTL expiration.
- Reevaluate the TTL strategy by possibly introducing a soft-delete mechanism where records are flagged as deleted and then purged systematically by a cleanup service.

Request

We need to design and implement a solution that ensures documents are consistently deleted from the Search Service index when their corresponding threads are no longer present. It's critical for managing expenses associated with the Search Service.

sonphnt commented 5 months ago

Yes, I have the same issue. We should have a background job that checks for orphaned documents and cleans up the Azure Search index.
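
A rough sketch of what such a background job could look like, written as a reconciliation pass that runs after TTL expiry rather than the pre-expiry check proposed in the issue. Everything named here (CleanupDeps, listIndexedThreadIds, listLiveThreadIds, deleteDocumentsForThread, findOrphanedThreadIds, cleanUpOrphanedDocuments) is hypothetical and not part of azurechat; the three callbacks are placeholders for whatever queries the search index and the Cosmos DB history container, and for a wrapper around the existing deleteDocuments function.

```typescript
// Hypothetical ports: azurechat does not define these. They stand in for the
// real Azure Search and Cosmos DB calls so the logic can run against fakes.
interface CleanupDeps {
  // Chat thread IDs that still have documents in the search index.
  listIndexedThreadIds(): Promise<string[]>;
  // Chat thread IDs that still exist in the Cosmos DB history container.
  listLiveThreadIds(): Promise<string[]>;
  // Removes every indexed document that belongs to one chat thread
  // (assumed to wrap the existing deleteDocuments function).
  deleteDocumentsForThread(threadId: string): Promise<void>;
}

// Pure helper: thread IDs present in the index but missing from the history
// container, de-duplicated and in first-seen order.
function findOrphanedThreadIds(indexed: string[], live: string[]): string[] {
  const liveSet = new Set<string>(live);
  const seen = new Set<string>();
  const orphaned: string[] = [];
  for (const id of indexed) {
    if (!liveSet.has(id) && !seen.has(id)) {
      seen.add(id);
      orphaned.push(id);
    }
  }
  return orphaned;
}

// One cleanup pass: find orphaned threads, delete their indexed documents,
// and return the thread IDs that were cleaned up.
async function cleanUpOrphanedDocuments(deps: CleanupDeps): Promise<string[]> {
  const indexed = await deps.listIndexedThreadIds();
  const live = await deps.listLiveThreadIds();
  const orphaned = findOrphanedThreadIds(indexed, live);
  for (const threadId of orphaned) {
    await deps.deleteDocumentsForThread(threadId);
  }
  return orphaned;
}
```

A pass like this could run on a timer (for example from a scheduled function). Because it compares the index against the threads that still exist, it also catches documents orphaned before the job was deployed, which a pre-expiry check alone would miss.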

sonphnt commented 4 months ago

Note that in version 2, if you set a TTL on the Cosmos DB History container, the Extensions and Personas configuration will be deleted automatically as well.