Open jasonpnnl opened 10 months ago
Yes, I have the same issue. We should have a background job that checks for orphaned documents in the Azure Search index and cleans them up.
If you set a TTL on the Cosmos DB history container, then in version 2 the Extensions and Personas configuration will be deleted automatically as well.
Does anyone have a quick solution to this issue? I want to set a TTL on my history container. First I will move personas and extensions to config. Then I will need a process that removes the documents from the index shortly before the TTL expires for history.
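For the "near TTL expiration" part, a minimal sketch of how such a process might pick its candidates. It relies on two Cosmos DB fields: `_ts` (last-modified time in epoch seconds) and the optional per-item `ttl` (seconds). The `HistoryItem` shape, the function name, and the idea of passing the container's default TTL in as a parameter are assumptions for illustration, not code from azurechat.

```typescript
// Hypothetical shape of a history item. `_ts` and `ttl` are Cosmos DB's own
// fields (both in seconds); everything else here is illustrative.
interface HistoryItem {
  id: string;
  _ts: number; // last-modified time, epoch seconds
  ttl?: number; // per-item TTL in seconds; -1 means "never expire"
}

// Returns the ids of items that expire within `windowSeconds` of `nowSeconds`.
// Items that are already past their expiry but not yet purged are included too.
function threadsNearExpiry(
  items: HistoryItem[],
  defaultTtlSeconds: number,
  nowSeconds: number,
  windowSeconds: number
): string[] {
  return items
    .filter((item) => {
      // A per-item ttl overrides the container default.
      const ttl = item.ttl ?? defaultTtlSeconds;
      if (ttl <= 0) return false; // -1: the item never expires
      const expiresAt = item._ts + ttl;
      return expiresAt - nowSeconds <= windowSeconds;
    })
    .map((item) => item.id);
}
```

The returned ids could then be passed to whatever already deletes a thread's documents from the index. Note that any write to an item resets `_ts`, so the window should be comfortably larger than the interval at which the job runs.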
Description

We've encountered an issue within the azurechat application related to document indexing and deletion. Although the application is designed to upload documents to Azure Search Service, it seems that deletion of documents is not handled properly in all scenarios.

Problem

When users upload documents to chat in azurechat, these documents are indexed in Azure Search Service. When chat threads are deleted via the UI, the deleteDocuments function within the azure-cog-vector-store.ts correctly removes the corresponding documents from the Search Service index. However, if the thread is automatically removed once the time-to-live (TTL) period for the Cosmos DB history table is reached, the associated documents remain in the search index. This is because the deleteDocuments function is not triggered when the TTL mechanism deletes the thread.

Consequence

This leads to unchecked growth in the Search Service as orphaned documents accumulate, which becomes problematic over time. This incurs unnecessary costs, given that the Search Service is a premium component of this application.

Potential Solutions

1. Implement a background process that checks for documents associated with threads that are close to their TTL and explicitly calls deleteDocuments before TTL expiration.
2. Reevaluate the TTL strategy by possibly introducing a soft-delete mechanism where records are flagged as deleted and then purged systematically by a cleanup service.

Request

We need to design and implement a solution that ensures documents are consistently deleted from the Search Service index when their corresponding threads are no longer present. It's critical for managing expenses associated with the Search Service.
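One way to approach the request is a periodic reconciliation sweep that works even after TTL has already removed a thread: list what the index holds, list which threads still exist, and delete the difference. The sketch below is only an outline under stated assumptions. The `IndexedDoc` shape, the `chatThreadId` field name, and the three injected callbacks are hypothetical stand-ins for the real Search and Cosmos DB calls, and the default batch size of 1000 reflects my understanding of the per-request limit for index batches.

```typescript
// Hypothetical minimal view of an indexed document; the field linking a
// document to its thread is assumed to be called `chatThreadId`.
interface IndexedDoc {
  id: string;
  chatThreadId: string;
}

// Pure helper: ids of indexed documents whose thread no longer exists.
function findOrphanedDocIds(
  indexed: IndexedDoc[],
  liveThreadIds: Iterable<string>
): string[] {
  const live = new Set(liveThreadIds);
  return indexed.filter((doc) => !live.has(doc.chatThreadId)).map((doc) => doc.id);
}

// Sweep driver. The three callbacks are placeholders (assumptions) for the
// real index query, history query, and index delete operations.
async function sweepOrphans(
  listIndexedDocs: () => Promise<IndexedDoc[]>,
  listLiveThreadIds: () => Promise<string[]>,
  deleteByIds: (ids: string[]) => Promise<void>,
  batchSize = 1000
): Promise<number> {
  // Read the index first, then the threads: a thread created in between then
  // appears as "live" rather than having its new documents flagged as orphans.
  const indexed = await listIndexedDocs();
  const liveThreadIds = await listLiveThreadIds();
  const orphans = findOrphanedDocIds(indexed, liveThreadIds);

  // Delete in batches to stay within the per-request limit.
  for (let start = 0; start < orphans.length; start += batchSize) {
    await deleteByIds(orphans.slice(start, start + batchSize));
  }
  return orphans.length;
}
```

Compared with deleting just before TTL expiry, a sweep like this does not depend on timing, at the cost of scanning the index on every run. It could be scheduled with a timer-triggered function or any cron-style job.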