microsoft / service-fabric

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
https://docs.microsoft.com/en-us/azure/service-fabric/
MIT License

Issue with automatic backup and restores #505

Open RalfKornmannEnvision opened 4 years ago

RalfKornmannEnvision commented 4 years ago

We would like to use the integrated Backup and Restore feature instead of running our own, but something does not seem to work as expected.

Expected Behavior

The backup history should look like the one in the tutorial: https://docs.microsoft.com/de-de/azure/service-fabric/service-fabric-backuprestoreservice-quickstart-azurecluster (screenshot: https://docs.microsoft.com/de-de/azure/service-fabric/media/service-fabric-backuprestoreservice/backup-enumeration.png)

Current Behavior

Backups are created and stored, but all of them have a Data Loss Version, Configuration Version and Lsn of LastBackup Record of -1. Clicking on the BackupId shows a JSON with {"Error":{"Code":"NotFound", "Message":"Null"}}. Trying to restore one of these backups shows an error message saying that the BackupId is missing, even though it has been entered.
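A minimal Python sketch for checking the symptom above against the JSON returned by backup enumeration. The field names `Items`, `BackupId`, `BackupLocation`, `EpochOfLastBackupRecord`, `DataLossVersion`, `ConfigurationVersion` and `LsnOfLastBackupRecord` are assumed from the Service Fabric REST `BackupInfo` model; the function names and sample data are hypothetical. It also builds a restore body for `POST /Partitions/{partitionId}/$/Restore` that refuses to proceed without a `BackupId`, which helps tell an empty field apart from the UI error reported here.

```python
import json

# Value the reporter saw for every enumerated backup. Treating it (or a
# missing field) as "metadata not populated" is an assumption.
UNSET = "-1"


def find_suspect_backups(enumeration_json: str) -> list:
    """Return the BackupId of every entry whose epoch and LSN are all -1.

    Assumes the enumeration response shape
    {"ContinuationToken": "...", "Items": [BackupInfo, ...]},
    where numeric fields may be serialized as strings or numbers.
    """
    payload = json.loads(enumeration_json)
    suspect = []
    for item in payload.get("Items", []):
        epoch = item.get("EpochOfLastBackupRecord") or {}
        values = (
            str(epoch.get("DataLossVersion", UNSET)),
            str(epoch.get("ConfigurationVersion", UNSET)),
            str(item.get("LsnOfLastBackupRecord", UNSET)),
        )
        if all(v == UNSET for v in values):
            suspect.append(item.get("BackupId"))
    return suspect


def restore_body(item: dict) -> str:
    """Build the JSON body for POST /Partitions/{partitionId}/$/Restore.

    BackupId and BackupLocation are assumed to be required; failing early
    here makes a missing BackupId visible before any request is sent.
    """
    backup_id = item.get("BackupId")
    location = item.get("BackupLocation")
    if not backup_id or not location:
        raise ValueError("backup entry lacks BackupId or BackupLocation")
    return json.dumps({"BackupId": backup_id, "BackupLocation": location})


if __name__ == "__main__":
    # Hypothetical enumeration output: one healthy entry, one showing the symptom.
    sample = json.dumps({
        "ContinuationToken": "",
        "Items": [
            {"BackupId": "backup-ok", "BackupLocation": "app/svc/p1/a.zip",
             "EpochOfLastBackupRecord": {"DataLossVersion": "5", "ConfigurationVersion": "3"},
             "LsnOfLastBackupRecord": "120"},
            {"BackupId": "backup-bad", "BackupLocation": "app/svc/p1/b.zip",
             "EpochOfLastBackupRecord": {"DataLossVersion": "-1", "ConfigurationVersion": "-1"},
             "LsnOfLastBackupRecord": "-1"},
        ],
    })
    print(find_suspect_backups(sample))  # ['backup-bad']
```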

Steps to Reproduce

  1. Create a storage account with default settings
  2. Follow https://docs.microsoft.com/de-de/azure/service-fabric/service-fabric-backuprestoreservice-quickstart-azurecluster using the Azure portal and Service Fabric Explorer
  3. Publish an application with a simple actor service
  4. Enable backups for this application
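Steps 2 and 4 can also be done through the Service Fabric REST API instead of the portal and Service Fabric Explorer. The Python sketch below only builds the requests (method, path, JSON body) and sends nothing. The endpoints `BackupRestore/BackupPolicies/$/Create` and `Applications/{applicationId}/$/EnableBackup`, the body fields, and `api-version=6.4` are assumed to follow the quickstart's REST examples and should be checked against the REST reference; the policy name, container and `fabric:/SampleApp` are hypothetical placeholders.

```python
import json

API_VERSION = "6.4"  # assumption: api-version used by the backup quickstart


def application_id(app_name: str) -> str:
    """Convert 'fabric:/SampleApp' to the REST application id 'SampleApp'.

    REST ids drop the 'fabric:/' prefix and encode any further '/' as '~'.
    """
    prefix = "fabric:/"
    if not app_name.startswith(prefix):
        raise ValueError("expected a fabric:/ application name: " + app_name)
    return app_name[len(prefix):].replace("/", "~")


def create_policy_request(policy_name: str, connection_string: str, container: str):
    """Step 2: a frequency-based policy writing to Azure Blob storage."""
    body = {
        "Name": policy_name,
        "AutoRestoreOnDataLoss": False,
        "MaxIncrementalBackups": 3,
        "Schedule": {"ScheduleKind": "FrequencyBased", "Interval": "PT15M"},
        "Storage": {
            "StorageKind": "AzureBlobStore",
            "ConnectionString": connection_string,
            "ContainerName": container,
        },
        "RetentionPolicy": {"RetentionPolicyType": "Basic", "RetentionDuration": "P10D"},
    }
    path = "/BackupRestore/BackupPolicies/$/Create?api-version=" + API_VERSION
    return "POST", path, json.dumps(body)


def enable_backup_request(app_name: str, policy_name: str):
    """Step 4: attach the policy at application level (stateful and actor services)."""
    path = "/Applications/{}/$/EnableBackup?api-version={}".format(
        application_id(app_name), API_VERSION)
    return "POST", path, json.dumps({"BackupPolicyName": policy_name})


if __name__ == "__main__":
    for method, path, body in (
        create_policy_request("BackupPolicy1", "<storage connection string>", "backups"),
        enable_backup_request("fabric:/SampleApp", "BackupPolicy1"),
    ):
        print(method, path, body)
```

Actually sending these requests requires authenticating against the cluster's HTTP gateway, which is not shown here.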

Service Fabric Runtime and SDK Version : 6.5.676.9590

Operating System : Windows Server

Cluster Size : 5 (on Azure, created with the Service Fabric template)

raunakpandya commented 4 years ago

Please share your cluster resource ID on Azure and the region.

RalfKornmannEnvision commented 4 years ago

Thank you for your quick response.

The Cluster Id is d3e4fcfe-6408-4268-a96e-57e839b31619 in westeurope. It's a test cluster that we don't use in production, so I don't mind if you break something while checking this issue.

raunakpandya commented 4 years ago

Please share the cluster resource ID, which would be in the format subscriptions//resourcegroups//Microsoft.ServiceFabric/.....

RalfKornmannEnvision commented 4 years ago

I'm sorry, I misunderstood which one you needed.

/subscriptions/5ad63184-4e72-4cfd-9c24-e9ad708c1dc9/resourcegroups/tsr_backuptest/providers/Microsoft.ServiceFabric/clusters/tsrbackuptestcluster

Is this the correct one?

raunakpandya commented 4 years ago

Yes. By the way, what do you mean by "Clicking on the BackupId shows a JSON with"? Where did you see a Data Loss Version, Configuration Version and Lsn of LastBackup Record of -1? In the output of backup enumeration? Do you see the backups taken properly in the storage account?

RalfKornmannEnvision commented 4 years ago

In Service Fabric Explorer, on the Backups tab. I have 5 backups for each partition, but each of them shows only -1 for the versions, and I get the JSON when I click on the BackupId of any of them.

RalfKornmannEnvision commented 4 years ago

Any updates?

khandelwalbrijesh commented 4 years ago

@RalfKornmannEnvision have you deleted 'tsrbackuptestcluster'? I could not find it in the SFRP explorer for westeurope.

RalfKornmannEnvision commented 4 years ago

It seems our operations team removed it because it was not used and was just costing money. The issue was not limited to this cluster anyway: if you follow the steps in the first post, you should be able to reproduce it.

khandelwalbrijesh commented 4 years ago

Okay, we will try to reproduce it on our end, but since the cluster has been deleted, we cannot make any promises as to when we will be able to take this up. We will update you as soon as we get the chance to follow these steps on our side. Just to confirm: you are observing this only for the actor service here, right?

RalfKornmannEnvision commented 4 years ago

Our application contains only stateful actors and stateless services, so I don't know whether a stateful service would have this issue, too.