microsoft / service-fabric

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
https://docs.microsoft.com/en-us/azure/service-fabric/
MIT License

ServiceFabric fails to start FabricHostSvc in development environment #910

Open arikanderu opened 7 years ago

arikanderu commented 7 years ago

Environment:

Unable to start a test project from Visual Studio. It fails with the following error:


PS C:\WINDOWS\system32> cd 'C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup\'
PS C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup> .\DevClusterSetup.ps1

Using Cluster Data Root: C:\SfDevCluster\Data
Using Cluster Log Root: C:\SfDevCluster\Log

Create node configuration succeeded
Starting service FabricHostSvc. This may take a few minutes...
Start-Service : Failed to start service 'Microsoft Service Fabric Host Service (FabricHostSvc)'.
At C:\Program Files\Microsoft SDKs\Service Fabric\Tools\Scripts\ClusterSetupUtilities.psm1:453 char:5

StartLocalCluster : Could not start FabricHostSvc
At C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup\DevClusterSetup.ps1:73 char:1


arikanderu commented 7 years ago

Running the service under network account seems to help...

masnider commented 7 years ago

Normally we would expect folks to set up the cluster via the local cluster manager tool that we provide. Does setting up the cluster that way also fail?

arikanderu commented 7 years ago

Yes, unfortunately it does. The error above occurs when the cluster is set up and managed via the cluster management tool. I tried switching the service to Network Account to resolve this.

Frankly speaking, I almost gave up on using the SF SDK. It is obviously not mature enough for a dev environment where the developer is not a local admin and where restrictive group policies and antivirus software are in place, as is typical in a corporate dev environment.

It fails in bits and pieces with all sorts of errors. Googling around shows that other users ran into the same errors more than a year ago, and they are still present. For example, many of them were discussed here more than a year ago: https://disqus.com/home/discussion/thewindowsazureproductsite/setting_up_your_service_fabric_development_environment_98/

I think you should invest some effort in making it work in the scenario above (i.e. not a local admin), since a corporate environment is typical for potential on-premises cloud clients.

Thanks

arikanderu commented 7 years ago

Environment:
Windows 7 Enterprise SP1
User - no admin privileges
Antivirus - McAfee

Errors: 1) Running the Service Fabric cluster manager fails; "Manage Local Cluster" remains disabled. In the log:

Report failed with FABRIC_E_GATEWAY_NOT_REACHABLE

client-localhost:19000/127.0.0.1:19000: error = 2147943625, failureCount=160. Filter by (type~Transport.St && ~"(?i)localhost:19000") to get listener lifecycle. Connect failure is expected if listener was never started, or listener/its process was stopped before/during connecting.

client-localhost:19000 : connect failed, having tried all addresses

Transition: Target=Starting, Before:Stopped, After=Stopped, Error=FABRIC_E_INVALID_OPERATION, CallAbort=false

Hosted Service with exe C:\SfDevCluster\Data\_Node_0\Fabric\Fabric.Code\Fabric.exe failed to start. ErrorCode=FABRIC_E_INVALID_OPERATION

Hosted Service: HostedService/_Node_0_Fabric is not running and cannot be restarted because of its state. CurrentState=Stopped. ErrorCode=FABRIC_E_INVALID_OPERATION.

Transition: Target=Failed, Before:Stopped, After=Stopped, Error=FABRIC_E_INVALID_OPERATION, CallAbort=false

2) Running SF Explorer: ServiceFabric failed. The cluster state might have changed. Try clicking parent node and select 'Refresh'. DllNotFoundException. Unable to load DLL 'FabricClient.dll'.
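For anyone reading the log above: the numeric value in "error = 2147943625" looks like a standard Windows HRESULT, and splitting it into its fields suggests what the transport layer actually saw. The sketch below is only an illustration (the helper name decode_hresult is made up here, not part of any Service Fabric tooling). It assumes the usual HRESULT layout: top bit for failure, an 11-bit facility, and a 16-bit code.

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into failure flag, facility, and code."""
    value &= 0xFFFFFFFF  # treat the input as an unsigned 32-bit number
    return {
        "failure": bool(value >> 31),        # severity bit
        "facility": (value >> 16) & 0x7FF,   # 11-bit facility field
        "code": value & 0xFFFF,              # low 16 bits: the error code
    }


FACILITY_WIN32 = 7               # facility used for wrapped Win32 errors
ERROR_CONNECTION_REFUSED = 1225  # Win32 system error code

# The value reported in the log for client-localhost:19000
fields = decode_hresult(2147943625)
print(hex(2147943625), fields)
```

Under that reading, the facility is 7 (Win32) and the code is 1225, which is ERROR_CONNECTION_REFUSED. That is consistent with the log's own note that a connect failure is expected when the listener was never started: nothing was listening on port 19000 because Fabric.exe failed to start.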

sai-alladi commented 7 years ago

Make sure the "Windows Firewall" service is enabled and running in services.msc. If it is not, enable it and try again. It should then start your Fabric service.
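One way to check whether a change like this had any effect is to probe the client port that shows up in the log earlier in this thread (localhost:19000). This is a minimal sketch using only the Python standard library. The function name can_connect and the 2-second timeout are arbitrary choices for illustration, and a successful connect only shows that something is listening on the port, not that the cluster is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and resolution failures
        return False


# Port 19000 is the client endpoint that appears in the log above
print("localhost:19000 reachable:", can_connect("localhost", 19000))
```

If this prints False while FabricHostSvc claims to be running, the listener most likely never came up, which matches the FABRIC_E_GATEWAY_NOT_REACHABLE report.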

explorer14 commented 6 years ago

I am having similar issues starting my local cluster. It was working a few days ago, but yesterday I decided to download and install version 3.x of the SDK, and apparently the world fell apart. Nothing in the logs points to anything obvious. My Windows Firewall is up and running and is not blocking the fabric host service, but all I can see is FabricHostSvc stuck in a perma-loop: trying to start, stopping, and repeating. So it is obviously trying to start, but for some reason it shuts down immediately.

At this point, I have tried uninstalling and re-installing the SDK and runtime multiple times, and I have tried to start the dev cluster using both the PowerShell script and the cluster manager tool, but I keep getting failures. Is this a known issue? Any ideas would be much appreciated.

PrplHaz4 commented 6 years ago

@explorer14 we saw this behavior on our dev machines, where users have limited permissions. There are two local groups that are usually created by the installer, "ServiceFabricAdministrators" and "ServiceFabricAllowedUsers". Make sure you have both, and that the user running FabricHostSvc is a member of ServiceFabricAdministrators. Also check that these groups have permissions on the location where Service Fabric will be installed (C:\SfDevCluster).

I agree with @arikanderu above, that MS is really not providing much support for what I would consider a very typical large-corporation configuration, where devs are NOT local admins. Any guidance on this would be much appreciated. My company has multiple thousands of developers that would be exposed to SF if local dev clusters ran with restricted perms.