microsoft / service-fabric

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
https://docs.microsoft.com/en-us/azure/service-fabric/
MIT License

Service Fabric cluster does not start #382

Closed: smarter-code closed this issue 4 years ago

smarter-code commented 5 years ago

We had Service Fabric working properly before, but now we cannot start the cluster and we get an immediate error. Cluster creation itself also reported errors.

[Screenshot of the error (image not available)]

When I check the Service Fabric logs in C:\SFDevCluster, I see:

Host Application: PowerShell.exe -WindowStyle Hidden -NonInteractive -ExecutionPolicy RemoteSigned -Command & 'C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup\DevClusterSetup.ps1' -Auto -PathToClusterLogRoot C:\SFDevCluster\Log -SetupLogFileName DevClusterSetup.log -CreateOneNodeCluster
Transcript started, output file is C:\SFDevCluster\Log\DevClusterSetup.log
Performing Stop-Service on: FabricHostSvc . This may take a few minutes...
Create node configuration succeeded
Performing Start-Service on: FabricHostSvc . This may take a few minutes...

In the Service Fabric traces I find FabricDeployer-XXXXXX(longnumber).trace, which contains the following:

2019/09/09-09:06:06.239,Info,10844,FabricDeployer.FabricDeployer,Running deployer with Configure /fabricBinRoot:C:\Program Files\Microsoft Service Fabric\bin /fabricDataRoot:C:\SfDevCluster\Data /fabricLogRoot:C:\SFDevCluster\Log /cm:C:\Users\100659\AppData\Local\Temp\SEPC0T2R18-Server-ScaleMin.xml /oldClusterManifestString: /im: /instanceId: /targetVersion: /nodeName: /nodeTypeName: /runAsType: /runAsAccountName: /runAsPassword: /serviceStartupType:Manual /output: /currentVersion: /error: /bootstrapMSIPath: /machineName: /fabricPackageRoot: /jsonClusterConfigLocation: /enableCircularTraceSession:True /continueIfContainersFeatureNotInstalled: /skipDeleteData:
2019/09/09-09:06:06.241,Info,10844,ImageStoreClient.ManagedFileLock,Obtained writer lock for C:\SfDevCluster\Data\lock
2019/09/09-09:06:06.241,Info,10844,FabricDeployer.FabricDeployer,Executing Configure /fabricBinRoot:C:\Program Files\Microsoft Service Fabric\bin /fabricDataRoot:C:\SfDevCluster\Data /fabricLogRoot:C:\SFDevCluster\Log /cm:C:\Users\100659\AppData\Local\Temp\SEPC0T2R18-Server-ScaleMin.xml /oldClusterManifestString: /im: /instanceId: /targetVersion: /nodeName: /nodeTypeName: /runAsType: /runAsAccountName: /runAsPassword: /serviceStartupType:Manual /output: /currentVersion: /error: /bootstrapMSIPath: /machineName: /fabricPackageRoot: /jsonClusterConfigLocation: /enableCircularTraceSession:True /continueIfContainersFeatureNotInstalled: /skipDeleteData:
2019/09/09-09:06:06.249,Info,10844,FabricDeployer.FabricDeployer,Running operation System.Fabric.FabricDeployer.ConfigureOperation
2019/09/09-09:06:06.253,Info,10844,FabricDeployer.FabricDeployer,Creating FabricDataRoot C:\SfDevCluster\Data, if it doesn't exist on machine 
2019/09/09-09:06:06.254,Info,10844,FabricDeployer.FabricDeployer,Creating FabricLogRoot C:\SFDevCluster\Log, if it doesn't exist on machine 
2019/09/09-09:06:06.287,Info,10844,ImageBuilder.FabricDeployer,DnsService feature enabled : True.
2019/09/09-09:06:06.287,Info,10844,ImageBuilder.FabricDeployer,PartitionPrefix setting overriden in DnsService section, Overriden Value: --.
2019/09/09-09:06:06.287,Info,10844,ImageBuilder.FabricDeployer,PartitionSuffix setting overriden in DnsService section, Overriden Value: .
2019/09/09-09:06:06.287,Warning,10844,ImageBuilder.FabricDeployer,Current profile will be disabled by default for firewall rule
2019/09/09-09:06:06.297,Info,10844,FabricDeployer.FabricDeployer,Setting FabricDataRoot to C:\SfDevCluster\Data on machine 
2019/09/09-09:06:06.297,Info,10844,FabricDeployer.FabricDeployer,Setting FabricLogRoot to C:\SFDevCluster\Log on machine 
2019/09/09-09:06:06.297,Info,10844,FabricDeployer.FabricDeployer,Setting EnableCircularTraceSession to True on machine 
2019/09/09-09:06:06.297,Info,10844,FabricDeployer.FabricDeployer,Setting EnableUnsupportedPreviewFeatures to False on machine 
2019/09/09-09:06:06.297,Info,10844,FabricDeployer.FabricDeployer,Setting IsSFVolumeDiskServiceEnabled to False on machine 
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter FabricDataRoot, has value C:\SfDevCluster\Data
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter FabricLogRoot, has value C:\SFDevCluster\Log
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter ServiceRunAsAccountName, has value 
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter ServiceRunAsPassword, has value 
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter SkipFirewallConfiguration, has value true
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter ServiceStartupType, has value 
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter ContainerNetworkName, has value 
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter ContainerNetworkSetup, has value 
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter SkipContainerNetworkResetOnReboot, has value 
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter SkipIsolatedNetworkResetOnReboot, has value 
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter IsolatedNetworkName, has value 
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter IsolatedNetworkSetup, has value 
2019/09/09-09:06:06.298,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter IsolatedNetworkInterfaceName, has value 
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter EnableCircularTraceSession, has value true
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter ContainerDnsSetup, has value 
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter: ContainerDnsSetup, value: <null>, interpreted value: Allow
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter EnableUnsupportedPreviewFeatures, has value 
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter IsSFVolumeDiskServiceEnabled, has value 
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter SfCnsNetworkPluginCnsUrlPort, has value 
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter SfCnsNetworkPluginCnmUrlPort, has value 
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter IsolatedNetworkPluginParams, has value 
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter UseContainerServiceArguments, has value 
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter ContainerServiceArguments, has value 
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter EnableContainerServiceDebugMode, has value 
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Setup section, parameter DisableContainers, has value 
2019/09/09-09:06:06.299,Info,10844,FabricDeployer.FabricDeployer,Copying ClusterManifest to C:\SfDevCluster\Data\clusterManifest.xml
2019/09/09-09:06:06.308,Info,10844,FabricDeployer.FabricDeployer,Set Service Fabric Host Service to start up type to Manual
2019/09/09-09:06:06.310,Info,10844,FabricDeployer.FabricDeployer,TargetInformationFileName is C:\SfDevCluster\Data\TargetInformation.xml
2019/09/09-09:06:06.317,Info,10844,FabricDeployer.FabricDeployer,Target information file C:\SfDevCluster\Data\TargetInformation.xml written on machine: 
2019/09/09-09:06:06.323,Info,10844,FabricDeployer.FabricDeployer,Host Settings file generated at C:\SfDevCluster\Data\FabricHostSettings.xml
2019/09/09-09:06:06.327,Info,10844,ImageStoreClient.ManagedFileLock,Released writer lock on C:\SfDevCluster\Data\lock
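Each line of the trace above has five comma-separated fields (timestamp, level, a numeric id, source, message), and the message itself can contain commas. A minimal Python sketch for pulling the non-Info entries out of a long trace, assuming the file is line-oriented text in that layout (the encoding and the meaning of the id column are assumptions, not verified):

```python
"""Sketch: list non-Info entries from a FabricDeployer .trace file.

Assumed line layout (from the excerpt above):
    timestamp,level,id,source,message
The message may itself contain commas, so we split at most four times.
"""
import sys
from typing import Iterable, Iterator, NamedTuple, Optional


class TraceEntry(NamedTuple):
    timestamp: str
    level: str
    ident: str  # third column; its exact meaning is an assumption
    source: str
    message: str


def parse_line(line: str) -> Optional[TraceEntry]:
    """Split one trace line into five fields; return None if it does not fit."""
    parts = line.rstrip("\r\n").split(",", 4)
    if len(parts) != 5:
        return None
    return TraceEntry(*parts)


def non_info(lines: Iterable[str]) -> Iterator[TraceEntry]:
    """Yield every parsed entry whose level is not 'Info' (e.g. Warning, Error)."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level != "Info":
            yield entry


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python filter_trace.py <path-to-FabricDeployer-....trace>
    # Assumes UTF-8 text; undecodable bytes are replaced rather than raising.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for e in non_info(f):
            print(f"{e.timestamp} {e.level} {e.source}: {e.message}")
```

Run against the trace above, this would surface only the firewall warning, which is the same line singled out below.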

One line from the trace above stands out:

2019/09/09-09:06:06.287,Warning,10844,ImageBuilder.FabricDeployer,Current profile will be disabled by default for firewall rule

This made me suspect that a firewall rule might be blocking the cluster, but I could not work out exactly what is going on.

In Windows Event Viewer, I see the following Service Fabric related events from different areas:

[Screenshots of the Service Fabric related events (images not available)]

Under Applications and Services Logs ==> Microsoft-Service Fabric ==> Admin, I also see the following:

Error FileChangeMonitor failed with E_ACCESSDENIED

Warning FileChangeMonitor failed file C:\SfDevCluster\Data\FabricHostSettings.xml with ErrorCode E_ACCESSDENIED.

Error GetFileAttributesEx failed with the following error 5

Error Unable to stop FabricHostSvc service because System.InvalidOperationException: Cannot stop FabricHostSvc service on computer '.'. ---> System.ComponentModel.Win32Exception: The service has not been started --- End of inner exception stack trace --- at System.ServiceProcess.ServiceController.Stop() at System.Fabric.FabricDeployer.FabricDeployerServiceController.Stop(String serviceName, String machineName)

Error Unable to start fabric host service because System.InvalidOperationException: Cannot start service FabricHostSvc on computer '.'. ---> System.ComponentModel.Win32Exception: The service did not respond to the start or control request in a timely fashion --- End of inner exception stack trace --- at System.ServiceProcess.ServiceController.Start(String[] args) at System.Fabric.FabricDeployer.FabricDeployerServiceController.StartHostSvc(String machineName)

Error Error occurred while cleaning up isolated network setup exception System.ArgumentNullException: Value cannot be null. Parameter name: format at System.String.FormatHelper(IFormatProvider provider, String format, ParamsArray args) at System.Fabric.FabricDeployer.RemoveOperation.RemoveNetworks(DeploymentParameters parameters)

Warning ParseConfigSettings: ErrorCode=E_FAIL, FileName=C:\SfDevCluster\Data\FabricHostSettings.xml

Warning CreateFileW failed: file=\\?\C:\SfDevCluster\Data\FabricHostSettings.xml error=32
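Several of these entries are the same few Windows error codes in different forms: E_ACCESSDENIED (0x80070005) is the HRESULT wrapping Win32 error 5 (ERROR_ACCESS_DENIED), which is also what GetFileAttributesEx reports; error 32 is ERROR_SHARING_VIOLATION, meaning another process had FabricHostSettings.xml open; and "did not respond to the start or control request in a timely fashion" is the text of Win32 error 1053. A small Python sketch for decoding such codes, mirroring the HRESULT_FROM_WIN32 macro from winerror.h (the lookup table covers only the codes in this thread):

```python
"""Sketch: decode HRESULT / Win32 error codes seen in the Admin event log.

Mirrors winerror.h: HRESULT_FROM_WIN32(x) = (x & 0xFFFF) | (7 << 16) | 0x80000000
for x > 0, and HRESULT_FACILITY(hr) = (hr >> 16) & 0x1FFF.
"""
from typing import Optional

FACILITY_WIN32 = 7

# Only the Win32 codes that appear in this thread.
WIN32_ERRORS = {
    5: "ERROR_ACCESS_DENIED",
    32: "ERROR_SHARING_VIOLATION (file in use by another process)",
    1053: "ERROR_SERVICE_REQUEST_TIMEOUT",
}


def hresult_from_win32(code: int) -> int:
    """Python equivalent of the HRESULT_FROM_WIN32 macro (unsigned result)."""
    if code <= 0:
        return code
    return (code & 0xFFFF) | (FACILITY_WIN32 << 16) | 0x80000000


def win32_from_hresult(hr: int) -> Optional[int]:
    """Return the wrapped Win32 code if hr is a failing FACILITY_WIN32 HRESULT."""
    hr &= 0xFFFFFFFF  # accept signed (negative) or unsigned input
    failed = bool(hr & 0x80000000)
    facility = (hr >> 16) & 0x1FFF
    if failed and facility == FACILITY_WIN32:
        return hr & 0xFFFF
    return None


def describe(hr: int) -> str:
    """Human-readable label for an HRESULT, falling back to hex."""
    code = win32_from_hresult(hr)
    if code is None:
        return f"0x{hr & 0xFFFFFFFF:08X} (not a Win32-facility HRESULT)"
    return f"0x{hr & 0xFFFFFFFF:08X} -> Win32 {code} {WIN32_ERRORS.get(code, '')}".rstrip()


if __name__ == "__main__":
    print(describe(0x80070005))  # E_ACCESSDENIED
    print(describe(0x80004005))  # E_FAIL, which is not a Win32-facility code
```

E_FAIL, as reported by ParseConfigSettings, carries no Win32 code, so the access-denied and sharing-violation entries are the more specific clues here.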

We tried all of the following solutions, but none of them worked:

Most of these attempts came from this GitHub issue: https://github.com/Azure/service-fabric-issues/issues/1056

maburlik commented 4 years ago

When the service "does not respond", it is usually because the FabricHost service executable fails to launch due to a DLL availability conflict in the environment. Try launching FabricHost in console mode from an administrator console, in the directory where it resides (%programfiles%\Microsoft Service Fabric\bin): FabricHost.exe -c
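The console-mode check can also be scripted so that a loader failure is reported by name. A minimal Python sketch, assuming it runs from an elevated prompt on the affected machine, that FabricHost still running after a timeout means it launched (the sketch then stops it), and that a quick exit with one of the NTSTATUS loader codes below indicates a missing or broken DLL. The helper names and the 30-second timeout are hypothetical, not part of the Service Fabric SDK:

```python
"""Sketch: run FabricHost.exe -c and flag Windows loader failures.

Hypothetical helper. Assumptions: elevated prompt, default install path,
and that an exit with a loader NTSTATUS code means a DLL problem.
"""
import ntpath
import os
import subprocess
import sys
from typing import List, Optional

# NTSTATUS codes a process exits with when the Windows loader fails.
LOADER_FAILURES = {
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000139: "STATUS_ENTRYPOINT_NOT_FOUND",
    0xC000007B: "STATUS_INVALID_IMAGE_FORMAT",
    0xC0000142: "STATUS_DLL_INIT_FAILED",
}
TIMEOUT_S = 30  # assumption: a healthy host is still running after this long


def fabrichost_command(program_files: str) -> List[str]:
    """Build the console-mode command under the given Program Files directory."""
    exe = ntpath.join(program_files, "Microsoft Service Fabric", "bin", "FabricHost.exe")
    return [exe, "-c"]


def loader_failure(returncode: int) -> Optional[str]:
    """Map a signed or unsigned exit code to a loader NTSTATUS name, if it is one."""
    return LOADER_FAILURES.get(returncode & 0xFFFFFFFF)


def main() -> None:
    cmd = fabrichost_command(os.environ.get("ProgramFiles", r"C:\Program Files"))
    if not os.path.exists(cmd[0]):
        print(f"not found: {cmd[0]}")
        return
    try:
        # Run from the bin directory; subprocess.run kills the child on timeout.
        proc = subprocess.run(cmd, cwd=ntpath.dirname(cmd[0]),
                              capture_output=True, text=True, timeout=TIMEOUT_S)
    except subprocess.TimeoutExpired:
        print(f"FabricHost was still running after {TIMEOUT_S}s, so it appears to launch.")
        return
    print(proc.stdout, proc.stderr, sep="\n")
    name = loader_failure(proc.returncode)
    code = f"0x{proc.returncode & 0xFFFFFFFF:08X}"
    print(f"exit code {code}" + (f" ({name})" if name else ""))


if __name__ == "__main__" and sys.platform == "win32":
    main()
```

Python on Windows may report these exit codes as large unsigned values (for example 3221225781 for 0xC0000135), which is why the lookup masks the code to 32 bits.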

If this does not yield a helpful error, you can use the Dependency Walker tool to map out the DLLs the executable can discover, or use the Windows debugging tools (gflags with loader snaps enabled, plus cdb) to identify the API call that fails to resolve.

randheerucsc commented 4 years ago

@maburlik I am facing the same issue as in this thread. I tried the Dependency Walker tool, but it did not help much: it lists a lot of dependencies, and they do not seem relevant to this issue.

[Screenshot: Dependency Walker (image not available)]

Today I tried the latest runtime and SDK. The Web Installer failed with an error saying the runtime is not installed, so I installed ServiceFabricRuntime_6_5_CU5 and ServiceFabricSDK_3_4_CU5 manually. The problem is that the "FabricHostSvc" service does not start, failing with "Error 1053: The service did not respond to the start or control request in a timely fashion."

My PC runs Windows 10 and this previously worked, but it suddenly stopped working and I could not find any reason. Now I cannot install any previous versions either.

smarter-code commented 4 years ago

Thanks a lot @maburlik, it works now. When I ran the Service Fabric executable, it failed because of a corrupted DLL in the OS (most likely caused by group policy updates from my company). I replaced that corrupted DLL, also ran the TestConfiguration script, and it works now.

To read more about the TestConfiguration script, see the section "Validate environment using TestConfiguration script" in https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-standalone-deployment-preparation.

Three97 commented 5 months ago

We had a similar problem with a fresh install of Windows 11. The cluster would not start properly. I had to install Visual C++ Redistributable 2012 x64 and it started working. Hopefully this will help someone else.