microsoft / service-fabric

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
https://docs.microsoft.com/en-us/azure/service-fabric/
MIT License

Unable to add node to a standalone cluster #474

Closed renekroll closed 3 years ago

renekroll commented 4 years ago

I am having trouble adding a new node to an existing Service Fabric cluster running runtime package version 7.1.409.9590. I also tried runtime version 7.0.470.9590, with the same result.

I followed the instructions for adding a node to a Service Fabric standalone cluster, in the section "Add nodes to clusters configured with Windows Security using gMSA":

https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-windows-server-add-remove-nodes

Steps to reproduce

The cluster configuration upgrade results in the following error:

Start-ServiceFabricClusterConfigurationUpgrade : System.Runtime.InteropServices.COMException (-2147017627)
AggregateException: One or more errors occurred.

Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath D: ...
     CategoryInfo          : InvalidOperation: (Microsoft.Servi...usterConnection:ClusterConnection) [Start-ServiceFa...gurationUpgrade], FabricException
     FullyQualifiedErrorId : StartClusterConfigurationUpgradeErrorId,Microsoft.ServiceFabric.Powershell.StartClusterConfigurationUpgrade

When I add the node with the Service Fabric PowerShell cmdlet Add-ServiceFabricNode, the node is added, but it is not listed in the ClusterManifest.xml shown in SF Explorer on any node. I then tried to run a cluster configuration upgrade with Start-ServiceFabricClusterConfigurationUpgrade, but this also results in an error.

The preconditions for the new node (certificates, permissions, ...) are fulfilled. When I create the cluster from scratch with the new node included, everything works.

I'm working on a strategy to migrate the servers of our Service Fabric production environment, and recreating the cluster with downtime is not an option.

maburlik commented 4 years ago

When using gMSA, the AddNode.ps1 script should not be used to add nodes: a bug blocks this in 7.1 and earlier, and the fix is planned for release 7.2. You may have to remove this node and re-add it via a config upgrade.

To determine the reason for the exception, you can expand the details by running:

try {(Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath .\config2.json )} catch { Write-Host $_.Exception.ToString(); }
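
A slightly longer variant of the command above, as a sketch only: it walks the InnerException chain so that the nested COMException and its HRESULT are printed on their own lines. Get-ExceptionChain is a hypothetical helper written for this illustration and is not part of the Service Fabric PowerShell module; .\config2.json is the same path as in the one-liner.

```powershell
# Hypothetical helper (not part of the Service Fabric module): walks an
# exception and its InnerException chain, emitting one object per level.
function Get-ExceptionChain {
    param([Parameter(Mandatory)][System.Exception]$Exception)

    $depth = 0
    $current = $Exception
    while ($null -ne $current) {
        [pscustomobject]@{
            Depth   = $depth
            Type    = $current.GetType().FullName
            HResult = '0x{0:X8}' -f $current.HResult   # Int32 shown as hex
            Message = $current.Message
        }
        $current = $current.InnerException
        $depth++
    }
}

try {
    # Same call as the one-liner; -ErrorAction Stop makes every error catchable.
    Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath .\config2.json -ErrorAction Stop
}
catch {
    Get-ExceptionChain -Exception $_.Exception | Format-List
}
```

For an AggregateException this follows only the first inner exception, which is usually enough to surface the underlying HRESULT.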

vdvarlamov commented 4 years ago

@maburlik, I faced the same problem! Cluster version - 7.1.417.9590

System.Fabric.FabricException: System.Runtime.InteropServices.COMException (-2147017627)
AggregateException: One or more errors occurred.
 ---> System.Runtime.InteropServices.COMException: Exception from HRESULT: 0x80071C65
   at System.Fabric.Interop.NativeClient.IFabricClusterManagementClient11.EndUpgradeConfiguration(IFabricAsyncOperationContext context)
   at System.Fabric.Interop.Utility.<>c__DisplayClass22_0.<WrapNativeAsyncInvoke>b__0(IFabricAsyncOperationContext context)
   at System.Fabric.Interop.AsyncCallOutAdapter2`1.Finish(IFabricAsyncOperationContext context, Boolean expectedCompletedSynchronously)
   --- End of inner exception stack trace ---
   at System.Management.Automation.MshCommandRuntime.ThrowTerminatingError(ErrorRecord errorRecord)
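
For reference, the decimal code -2147017627 in the first line of the trace and the HRESULT 0x80071C65 in the inner exception are the same 32-bit value. A minimal sketch to check this and to split the HRESULT into its standard fields (the variable names are illustrative):

```powershell
# The decimal code from the COMException line of the trace above.
$code = -2147017627

# Format the signed Int32 as its two's-complement hex representation.
$hex = '0x{0:X8}' -f $code

# Standard HRESULT layout: bits 16-26 hold the facility, bits 0-15 the code.
$facility  = ($code -shr 16) -band 0x7FF
$errorCode = $code -band 0xFFFF

"{0}  facility={1}  code={2} (0x{2:X4})" -f $hex, $facility, $errorCode
```

The facility comes out as 7 (FACILITY_WIN32) and the low 16 bits as 0x1C65. This only shows that both lines of the trace report one and the same error; it does not say what caused it.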

vdvarlamov commented 4 years ago

@maburlik, on cluster 7.2.413.9590 the problem persists! Can you explain how to add a node correctly?

maburlik commented 3 years ago

On 7.2, start by checking whether these provide any detail:

Get-ServiceFabricClusterUpgrade
Get-ServiceFabricClusterConfigurationUpgradeStatus

If the second call doesn't list an explicit error, we may need the traces from your FabricLogRoot Traces directory to gain better insight.

craftyhouse commented 3 years ago

Closing for now since it's stale, but please re-open with the info requested above so we can address or backlog it as necessary.