microsoft / PubSec-Info-Assistant

Information Assistant, built with Azure OpenAI Service, Industry Accelerator
MIT License

Data Deletion with Version 1.2 Deployment issue #876

Open ravikhunt opened 1 week ago

ravikhunt commented 1 week ago

I would like to report an issue that occurred following the deployment of Version 1.2.

Upon completing the deployment, we noticed that:

• All previously uploaded files were deleted from Blob storage, including both the content container and the upload container.
• All existing records were deleted from Cosmos DB.

This data loss was not intended and has impacted the availability of previously uploaded files and records. Please investigate the cause of this issue and suggest corrective actions to prevent it from happening in future deployments.

If any recovery options are available for the deleted files and records, please let me know the next steps.

I am looking forward to your prompt response.

ravikhunt commented 1 week ago

Any comments on this? Thanks.

bjakems commented 1 week ago

Hi Ravi,

I am currently looking into this issue. Which version did you upgrade from? When you deployed the upgrade, did you have your prior TF state files from your previous deployment?

ravikhunt commented 1 week ago

@bjakems I upgraded from v1.1.1. Yes, the prior TF state files were present when I upgraded.

bjakems commented 1 week ago

Please review your TF plan file and see if the Storage Account and Cosmos DB were marked for deletion and recreation.

Regarding data recovery: since the entire Storage Account and Cosmos DB were deleted, they are not recoverable via the portal. If the data cannot easily be replaced, I advise submitting a Microsoft support ticket to see whether anything can be done.

ravikhunt commented 6 days ago

Issue Replication Scenario:

1. Current Version (V1.1.1):

2. Upgrade to V1.2:

3. Post-Upgrade to V1.2:

Another Scenario:

  1. After V1.2 Upgrade:

    • Some files are uploaded; they either process successfully or remain stuck in the "Uploaded" or "Queue" state, and the corresponding DB entries exist.
  2. Downgrade Attempt to V1.1.1:

    • The attempt to downgrade to V1.1.1 runs into errors and does not succeed. The TF files are then reverted to V1.1.1.
  3. Upgrade Again to V1.2:

    • After resolving the downgrade issue, upgrading again to V1.2 results in a successful deployment.
  4. Same Issue:

    • Despite the successful deployment, the same issue occurs: all previously uploaded files (whether in "Uploaded" or "Queue" status) are wiped out, and both the storage account and Cosmos DB are cleared out again.

This sequence of events shows that upgrading or downgrading between versions, especially when the Terraform files change, can unintentionally delete storage and database data, leaving file uploads stuck or wiped out.

bjakems commented 6 days ago

Thanks Ravi. Unfortunately, there is no supported upgrade path to 1.2.

ravikhunt commented 6 days ago

All right. This may not be a big issue at the moment, because we are not working directly in the production environment. However, since the issue arose during development, the same could happen in production. That is why I want to know exactly what is causing it, and whether we can confirm that it comes from the Terraform changes in the upgrade from V1.1.1 to V1.2.

bjakems commented 6 days ago

The terraform plan file will let you know what has been marked for deletion and recreation. You can inspect that file to see what has changed and the proposed action from Terraform.
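One way to do that inspection without reading the full plan output is to render the saved plan as JSON and filter it for destructive actions. The sketch below assumes the plan was saved with `terraform plan -out=tfplan` and converted with `terraform show -json tfplan > plan.json`. It relies on Terraform's JSON plan format, where each entry in `resource_changes` has a `change.actions` list and a replacement appears as `["delete", "create"]` or `["create", "delete"]`. The script name `check_plan.py`, the function `destructive_changes`, and the list of `azurerm_*` resource types are illustrative assumptions, not part of the Information Assistant repo; adjust the types to whatever the deployment's Terraform actually declares.

```python
import json
import sys
from typing import Iterable, List, Optional, Tuple

# Hypothetical list of resource types that hold user data. These are real
# azurerm resource types, but whether this deployment uses exactly these
# is an assumption; edit the tuple to match the Terraform being deployed.
DATA_RESOURCE_TYPES = (
    "azurerm_storage_account",
    "azurerm_storage_container",
    "azurerm_cosmosdb_account",
    "azurerm_cosmosdb_sql_database",
    "azurerm_cosmosdb_sql_container",
)


def destructive_changes(
    plan: dict, only_types: Optional[Iterable[str]] = None
) -> List[Tuple[str, str]]:
    """Return (address, "delete" | "replace") for resources the plan would destroy."""
    wanted = set(only_types) if only_types is not None else None
    findings = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" not in actions:
            continue  # no-op, create, read and in-place update do not destroy the resource
        if wanted is not None and rc.get("type") not in wanted:
            continue  # caller only cares about specific resource types
        # A replacement is encoded as ["delete", "create"] or ["create", "delete"].
        label = "replace" if "create" in actions else "delete"
        findings.append((rc["address"], label))
    return findings


def main(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        plan = json.load(f)
    risky = destructive_changes(plan, DATA_RESOURCE_TYPES)
    for address, label in risky:
        print(f"{label.upper():8} {address}")
    # A non-zero exit code lets a pipeline stop before `terraform apply`.
    return 1 if risky else 0


# Only run as a CLI when exactly one plan path is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Running `python check_plan.py plan.json` before `terraform apply` prints each data-holding resource marked for deletion or recreation. Filtering on `change.actions` rather than on the human-readable plan text keeps the check independent of Terraform's console formatting.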