microsoft / service-fabric

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
https://docs.microsoft.com/en-us/azure/service-fabric/
MIT License

Restart-ServiceFabricDeployedCodePackage generates errors in the console for stateless singleton applications #534

Open alxoldman opened 4 years ago

alxoldman commented 4 years ago

Hi,

Our project has a Service Fabric cluster with several apps hosted inside it. Some of the apps are stateful and run on several partitions; other apps are stateless singletons. From time to time we have to restart the applications. To do this remotely, we chose PowerShell. The Restart-ServiceFabricDeployedCodePackage cmdlet works perfectly for the multi-partition applications. I have been using it in the following way:

Restart-ServiceFabricDeployedCodePackage -ApplicationName <AppName> -ServiceName <ServiceName> -PartitionId <PartitionId> -CommandCompletionMode Verify

But for the single-partition apps, the cmdlet generates an error in the PowerShell console:

Restart-ServiceFabricDeployedCodePackage : Did not find deployed code package for fabric:/CustomAp:Code on node NodeName1

I tried adding more cmdlet parameters so that the deployed code packages are restarted one by one. Eventually, the cmdlet invocation became the following:

Restart-ServiceFabricDeployedCodePackage -NodeName <NodeName> -ApplicationName <AppName> -ServiceManifestName <ServiceManifestName> -CodePackageName <CodePackageName> -ServicePackageActivationId <ServicePackageActivationId> -CommandCompletionMode Verify

But even after that, I see the error from time to time. It does not occur on every invocation, but if the restarted app has three instances (and, correspondingly, three code packages), at least one of the invocations generates the error. Interestingly, the apps are in fact restarted regardless of whether the error appears. The apps were restarted even when I tried to restart the whole partition without referring to a specific code package.

Can anybody suggest how to get rid of these errors? I found an already closed ticket, https://github.com/Azure/service-fabric-issues/issues/1106, but it contains no solution, so I still don't know what to do. If it helps, all the apps have "." in their names and are ExclusiveProcess.

alxoldman commented 4 years ago

Hello everybody! The issue is still valid for me. I would appreciate any help in solving it.

LarsKemmann commented 4 years ago

I have the same problem and have tried the same workarounds as you, with no success.

LarsKemmann commented 4 years ago

There's also this SO post with the same issue and the same weird behavior being described.

dhruvmodi13 commented 4 years ago

I am also getting the same error, even though the code package is restarted.

alxoldman commented 4 years ago

It seems the issue is not unique. But there are still no comments or solution from Microsoft :(

masnider commented 4 years ago

At least as it was reported in the GitHub issue, the error described in the solution is a benign/expected race condition. I bet you don't see it if you omit -Verify.

As to whether we can do better and whether all those parameters should be necessary, adding a few folks.

#1106 was a legit bug where certain package names were not handled correctly.

alxoldman commented 4 years ago

Thanks for your help, @masnider. Regarding the -Verify flag: I would prefer not to remove the -CommandCompletionMode parameter. I don't want to restart a code package without confidence that the previous one has already launched.

dmytro-gokun commented 3 years ago

@gkhanna79 A year and 3 months later, the issue is still there. Is it such a huge deal to fix?

dvankurentxp commented 1 year ago

I am also running into this issue. At this point it seems like the only way to restart a service is to start downing nodes one-by-one or remove/add/upgrade. That is a bit of an issue on a production cluster.

mrpatil08 commented 9 months ago

I still see the issue in my environment. Did anyone find a solution?

mfmadsen commented 9 months ago

What is the reproduction scenario? We have used the Restart-ServiceFabricDeployedCodePackage cmdlet quite often over the years and have never experienced this issue.

mrpatil08 commented 9 months ago

I am trying to restart the API node by node. Below is the scenario (we are using an on-prem Service Fabric cluster):

PS C:\WINDOWS\system32>
# Set variables for application and code package
$nodeName = "vm0"
$applicationName = "fabric:/API.Configuration"
$codePackageName = "Code"
$serviceManifestName = "API.ConfigurationAPIPkg"

Get-ServiceFabricDeployedServicePackage -NodeName "vm0" -ApplicationName fabric:/API.Configuration -ServiceManifestName "API.ConfigurationAPIPkg"

Get-ServiceFabricNode

Get-ServiceFabricDeployedCodePackage -NodeName $nodeName -ApplicationName $applicationName

# Get the service fabric application
$application = Get-ServiceFabricApplication -ApplicationName $applicationName

# Check if the application exists
if ($application -eq $null) {
    Write-Host "Application not found: $applicationName"
} else {
    # Restart the specified code package on the specified node
    Restart-ServiceFabricDeployedCodePackage -NodeName $nodeName -ApplicationName $applicationName -CodePackageName $codePackageName -ServiceManifestName $serviceManifestName -CommandCompletionMode Verify

    Write-Host "Restarting code package $codePackageName on node $nodeName for application $applicationName"
}


Output:

Restart-ServiceFabricDeployedCodePackage : Did not find deployed code package for fabric:/API.Configuration:Code on node vm0
At line:22 char:5


Below are the code package details:

PS C:\WINDOWS\system32> Get-ServiceFabricDeployedCodePackage -NodeName $nodeName -ApplicationName $applicationName

CodePackageName                  : Code
CodePackageVersion               : 1.0.0
ServiceManifestName              : API.ConfigurationAPIPkg
ServicePackageActivationId       : 71ad20d7-40db-4001-a050-e3fa2b28bb77
HostType                         : ExeHost
HostIsolationMode                : None
DeployedCodePackageStatus        : Active
RunFrequencyInterval             : 0
EntryPoint                       :
EntryPointStatus                 : Started
CodePackageInstanceId            : 133471161872726796
EntryPointLocation               : C:\FabCluster\ProgramData\SF\vm0\Fabric\work\Applications\API.ConfigurationType_App1633\API.ConfigurationAPIPkg.Code.1.0.0\API.ConfigurationAPI.exe
ProcessId                        : 28096
ContainerId                      :
RunAsUserName                    : DomainGMSA
ActivationCount                  : 1
ActivationFailureCount           : 0
ContinuousActivationFailureCount : 0
ContinuousExitFailureCount       : 0
ExitCount                        : 0
ExitFailureCount                 : 0
LastActivationUtc                : 12/15/2023 3:11:40 PM
LastExitCode                     : 0
LastExitUtc                      : 1/1/0001 12:00:00 AM
LastSuccessfulActivationUtc      : 12/15/2023 3:11:40 PM
LastSuccessfulExitUtc            : 1/1/0001 12:00:00 AM
SetupEntryPoint                  :
CodePackageUsageStatistics       :

mfmadsen commented 9 months ago

Even with -CommandCompletionMode Verify, this works fine for us. However, we are not on an on-premises cluster but on an Azure-hosted one.