microsoft / service-fabric

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
https://docs.microsoft.com/en-us/azure/service-fabric/
MIT License

Add-AzureRmServiceFabricNodeType interrupted during execution left cluster+nodetype in faulty state #898

Open plillevold opened 7 years ago

plillevold commented 7 years ago

I have been trying, to no avail, to fix or remove a node type that seems to be "in limbo". I used Add-AzureRmServiceFabricNodeType to create a new node type (called "compute") in our production cluster. During execution of the command I lost network connectivity, and the command did not complete. Now I'm left with a cluster that has the node type (I can see it in the Azure portal, as well as in the output from Get-AzureRmServiceFabricCluster), but the node type has no corresponding VM scale set. In the cluster, the infrastructure service fabric:/System/InfrastructureService/compute has been created, but it is naturally failing, since it is required to run on nodes that match NodeTypeName==compute, of which there are none.

I have tried to remove the faulty node type using Remove-AzureRmServiceFabricNodeType, but the command fails, reporting that the specified node type does not exist.

Note: the cluster has Silver reliability and Bronze durability, and is running Fabric version 5.6.210.9494.

Any advice on how to fix this issue would be greatly appreciated.

I should also note that, in the current state, commands that update the cluster configuration, such as adding an admin certificate, also fail.
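
For anyone trying to confirm they are in the same state, the mismatch described above (a node type registered on the cluster resource with no scale set behind it) comes down to a set difference between two lists of names. The following is only a minimal sketch: the function name is hypothetical, both lists are assumed to be copied by hand from the Get-AzureRmServiceFabricCluster output and from the scale sets visible in the resource group, and it assumes each scale set can be matched to a node type by name. Apart from "compute", the names are placeholders.

```python
def find_orphaned_node_types(cluster_node_types, scale_set_node_types):
    """Return node types the cluster lists that no scale set backs.

    Both arguments are plain lists of names gathered by hand; this sketch
    does not call Azure and assumes an exact name match between the two.
    """
    backed = set(scale_set_node_types)
    return [name for name in cluster_node_types if name not in backed]


# Hypothetical inventory: "compute" is the node type from this report,
# the other names are placeholders for the cluster's existing node types.
cluster_node_types = ["frontend", "backend", "compute"]
scale_set_node_types = ["frontend", "backend"]

orphans = find_orphaned_node_types(cluster_node_types, scale_set_node_types)
print(orphans)
```

A non-empty result corresponds to the "limbo" state reported here. The sketch only identifies the mismatch; it does not repair it.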

ashoksoo commented 7 years ago

I ran into a similar situation and had to drop and recreate the cluster. I think this command should be disabled until the issue is resolved, since it can corrupt the environment of anyone using it in production.