microsoft / service-fabric

Service Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
https://docs.microsoft.com/en-us/azure/service-fabric/
MIT License

Connect-ServiceFabricCluster : No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS issue. #769

Open KyleTheAutomator opened 6 years ago

KyleTheAutomator commented 6 years ago

I've downloaded the Service Fabric SDK for VS 2017 from here: http://www.microsoft.com/web/handlers/webpi.ashx?command=getinstallerredirect&appid=MicrosoftAzure-ServiceFabric-CoreSDK

The initial install on my Windows 10 v1709 workstation (fully patched) completes successfully. The problem manifests when I try to set up a cluster:

C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup
λ  .\DevClusterSetup.ps1

Using Cluster Data Root: C:\SfDevCluster\Data
Using Cluster Log Root: C:\SfDevCluster\Log

The generated json path is C:\Users\kthompson\AppData\Local\Temp\tmp3B1A.tmp.json
Processing and validating cluster config.
Create node configuration succeeded
Starting service FabricHostSvc. This may take a few minutes...

Waiting for Service Fabric Cluster to be ready. This may take a few minutes...
Local Cluster ready status: 4% completed.
Local Cluster ready status: 8% completed.
Local Cluster ready status: 12% completed.
Local Cluster ready status: 17% completed.
Local Cluster ready status: 21% completed.
Local Cluster ready status: 25% completed.
Local Cluster ready status: 29% completed.
Local Cluster ready status: 33% completed.
Local Cluster ready status: 38% completed.
Local Cluster ready status: 42% completed.
Local Cluster ready status: 46% completed.
Local Cluster ready status: 50% completed.
Local Cluster ready status: 54% completed.
Local Cluster ready status: 58% completed.
Local Cluster ready status: 62% completed.
Local Cluster ready status: 67% completed.
Local Cluster ready status: 71% completed.
Local Cluster ready status: 75% completed.
Local Cluster ready status: 79% completed.
Local Cluster ready status: 83% completed.
Local Cluster ready status: 88% completed.
Local Cluster ready status: 92% completed.
Local Cluster ready status: 96% completed.
Local Cluster ready status: 100% completed.
WARNING: Service Fabric Cluster is taking longer than expected to connect.

Waiting for Naming Service to be ready. This may take a few minutes...
No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS issue.
Connect-ServiceFabricCluster : No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS
issue.
At C:\Program Files\Microsoft SDKs\Service Fabric\Tools\Scripts\ClusterSetupUtilities.psm1:620 char:12
+     [void](Connect-ServiceFabricCluster @connParams)
+            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Connect-ServiceFabricCluster], FabricException
    + FullyQualifiedErrorId : TestClusterConnectionErrorId,Microsoft.ServiceFabric.Powershell.ConnectCluster
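
Since the error above only says that no endpoint is reachable, a quick probe can show whether anything is listening on the client connection port at all. The following is a minimal sketch, not part of the SDK: the function name `endpoint_is_listening` is made up for illustration, and port 19000 is taken from the `clientConnectionEndpointPort` value in the generated config posted later in this thread.

```python
import socket


def endpoint_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; it only tests that something accepts
    connections on the port, not that the cluster behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 19000 is the client connection port of the dev cluster in this thread.
    print("localhost:19000 listening:", endpoint_is_listening("localhost", 19000))
```

If this reports that nothing is listening, the firewall or DNS is probably not the cause; the more likely explanation is that Fabric.exe never started its listener, which is worth confirming in the Event Log.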

I've been pulling my hair out over this for the last couple of days. Here are the things I've tried:

mikkelhegn commented 6 years ago

Can you share the generated json template? C:\Users\kthompson\AppData\Local\Temp\tmp3B1A.tmp.json

KyleTheAutomator commented 6 years ago

{
    "name":  "DevCluster",
    "clusterConfigurationVersion":  "1.0.0",
    "apiVersion":  "10-2017",
    "nodes":  [
                  {
                      "nodeName":  "_Node_0",
                      "iPAddress":  "ComputerFullName",
                      "nodeTypeRef":  "NodeType0",
                      "faultDomain":  "fd:/0",
                      "upgradeDomain":  "0"
                  }
              ],
    "properties":  {
                       "diagnosticsStore":  {
                                                "metadata":  "Please replace the diagnostics file share with an actual file share accessible from all cluster machines.",
                                                "dataDeletionAgeInDays":  "3",
                                                "storeType":  "FileShare",
                                                "connectionstring":  "%systemdrive%\\ProgramData\\SF\\DiagnosticsStore"
                                            },
                       "nodeTypes":  [
                                         {
                                             "name":  "NodeType0",
                                             "clientConnectionEndpointPort":  "19000",
                                             "clusterConnectionEndpointPort":  "19002",
                                             "leaseDriverEndpointPort":  "19001",
                                             "serviceConnectionEndpointPort":  "19006",
                                             "httpGatewayEndpointPort":  "19080",
                                             "reverseProxyEndpointPort":  "19081",
                                             "applicationPorts":  {
                                                                      "startPort":  "30001",
                                                                      "endPort":  "31000"
                                                                  },
                                             "isPrimary":  true
                                         }
                                     ],
                       "fabricSettings":  [
                                              {
                                                  "name":  "Setup",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "FabricDataRoot",
                                                                         "value":  "C:\\SfDevCluster\\Data"
                                                                     },
                                                                     {
                                                                         "name":  "FabricLogRoot",
                                                                         "value":  "C:\\SfDevCluster\\Log"
                                                                     },
                                                                     {
                                                                         "value":  "true",
                                                                         "name":  "IsDevCluster"
                                                                     }
                                                                 ]
                                              },
                                              {
                                                  "name":  "Diagnostics",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "ProducerInstances",
                                                                         "value":  "ServiceFabricEtlFile,ServiceFabricPerfCtrFolder"
                                                                     },
                                                                     {
                                                                         "name":  "MaxDiskQuotaInMB",
                                                                         "value":  "10240"
                                                                     },
                                                                     {
                                                                         "name":  "EnableCircularTraceSession",
                                                                         "value":  "true"
                                                                     }
                                                                 ]
                                              },
                                              {
                                                  "name":  "FabricClient",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "HealthReportSendInterval",
                                                                         "value":  "0"
                                                                     }
                                                                 ]
                                              },
                                              {
                                                  "name":  "Failover",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "SendToFMTimeout",
                                                                         "value":  "1"
                                                                     },
                                                                     {
                                                                         "name":  "NodeUpRetryInterval",
                                                                         "value":  "1"
                                                                     }
                                                                 ]
                                              },
                                              {
                                                  "name":  "Federation",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "NodeIdGeneratorVersion",
                                                                         "value":  "V4"
                                                                     },
                                                                     {
                                                                         "name":  "UnresponsiveDuration",
                                                                         "value":  "0"
                                                                     },
                                                                     {
                                                                         "name":  "ProcessAssertExitTimeout",
                                                                         "value":  "86400"
                                                                     }
                                                                 ]
                                              },
                                              {
                                                  "name":  "Hosting",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "EndpointProviderEnabled",
                                                                         "value":  "true"
                                                                     },
                                                                     {
                                                                         "name":  "RunAsPolicyEnabled",
                                                                         "value":  "true"
                                                                     },
                                                                     {
                                                                         "name":  "EnableProcessDebugging",
                                                                         "value":  "true"
                                                                     },
                                                                     {
                                                                         "name":  "DeactivationScanInterval",
                                                                         "value":  "600"
                                                                     },
                                                                     {
                                                                         "name":  "DeactivationGraceInterval",
                                                                         "value":  "2"
                                                                     },
                                                                     {
                                                                         "name":  "ServiceTypeRegistrationTimeout",
                                                                         "value":  "20"
                                                                     },
                                                                     {
                                                                         "name":  "CacheCleanupScanInterval",
                                                                         "value":  "300"
                                                                     },
                                                                     {
                                                                         "name":  "DeploymentRetryBackoffInterval",
                                                                         "value":  "1"
                                                                     }
                                                                 ]
                                              },
                                              {
                                                  "name":  "Management",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "ImageStoreConnectionString",
                                                                         "value":  "ImageStoreConnectionStringPlaceHolder"
                                                                     },
                                                                     {
                                                                         "name":  "ImageCachingEnabled",
                                                                         "value":  "false"
                                                                     },
                                                                     {
                                                                         "name":  "EnableDeploymentAtDataRoot",
                                                                         "value":  "true"
                                                                     },
                                                                     {
                                                                         "name":  "DisableChecksumValidation",
                                                                         "value":  "true"
                                                                     }
                                                                 ]
                                              },
                                              {
                                                  "name":  "PlacementAndLoadBalancing",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "MinLoadBalancingInterval",
                                                                         "value":  "300"
                                                                     },
                                                                     {
                                                                         "name":  "TraceCRMReasons",
                                                                         "value":  "false"
                                                                     }
                                                                 ]
                                              },
                                              {
                                                  "name":  "ReconfigurationAgent",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "IsDeactivationInfoEnabled",
                                                                         "value":  "true"
                                                                     },
                                                                     {
                                                                         "name":  "ServiceApiHealthDuration",
                                                                         "value":  "20"
                                                                     },
                                                                     {
                                                                         "name":  "ServiceReconfigurationApiHealthDuration",
                                                                         "value":  "20"
                                                                     },
                                                                     {
                                                                         "name":  "LocalHealthReportingTimerInterval",
                                                                         "value":  "5"
                                                                     },
                                                                     {
                                                                         "name":  "RAUpgradeProgressCheckInterval",
                                                                         "value":  "3"
                                                                     },
                                                                     {
                                                                         "name":  "RAPMessageRetryInterval",
                                                                         "value":  "0.5"
                                                                     },
                                                                     {
                                                                         "name":  "MinimumIntervalBetweenRAPMessageRetry",
                                                                         "value":  "0.5"
                                                                     }
                                                                 ]
                                              },
                                              {
                                                  "name":  "ServiceFabricEtlFile",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "DataDeletionAgeInDays",
                                                                         "value":  "3"
                                                                     },
                                                                     {
                                                                         "name":  "IsEnabled",
                                                                         "value":  "true"
                                                                     },
                                                                     {
                                                                         "name":  "ProducerType",
                                                                         "value":  "EtlFileProducer"
                                                                     },
                                                                     {
                                                                         "name":  "EtlReadIntervalInMinutes",
                                                                         "value":  "5"
                                                                     }
                                                                 ]
                                              },
                                              {
                                                  "name":  "ServiceFabricPerfCtrFolder",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "DataDeletionAgeInDays",
                                                                         "value":  "3"
                                                                     },
                                                                     {
                                                                         "name":  "IsEnabled",
                                                                         "value":  "true"
                                                                     },
                                                                     {
                                                                         "name":  "ProducerType",
                                                                         "value":  "FolderProducer"
                                                                     },
                                                                     {
                                                                         "name":  "FolderType",
                                                                         "value":  "ServiceFabricPerformanceCounters"
                                                                     }
                                                                 ]
                                              },
                                              {
                                                  "name":  "Trace/Etw",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "Level",
                                                                         "value":  "4"
                                                                     }
                                                                 ]
                                              },
                                              {
                                                  "name":  "TransactionalReplicator",
                                                  "parameters":  [
                                                                     {
                                                                         "name":  "CheckpointThresholdInMB",
                                                                         "value":  "64"
                                                                     }
                                                                 ]
                                              }
                                          ],
                       "addOnFeatures":  [
                                             "DnsService"
                                         ]
                   }
}

mikkelhegn commented 6 years ago

@maburlik - I don't see anything obvious from the manifest.

knizkar commented 6 years ago

I wonder if you are seeing the same issue as reported in microsoft/service-fabric-issues#1056. Would you mind checking:

  1. Whether the Fabric.exe process is running.
  2. If it is not running, whether the Event Log contains the following errors: [image]

KyleTheAutomator commented 6 years ago

Spot on. I see the following in my logs:

Fabric Node open failed with error code = E_ACCESSDENIED

Also seeing:

HostedService: _Node_0 on node id bf865279ba277deb864a976fbf4c200e terminated unexpectedly with code 7167 and process name Fabric.exe

HostedServiceInstance:HostedService/_Node_0_Fabric terminated with exitcode 7167

client-localhost:19000/127.0.0.1:19000: error = 2147943625, failureCount=93. Filter by (type~Transport.St && ~"(?i)localhost:19000") to get listener lifecycle. Connect failure is expected if listener was never started, or listener/its process was stopped before/during connecting.
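
The numeric error in the last log line is easier to read once it is split into its HRESULT fields. The following is a small sketch of that decoding; the helper name `decode_hresult` and the two-entry lookup table are illustrative choices, not anything shipped with Service Fabric.

```python
# Win32 error codes that appear in this thread (lookup table kept tiny on purpose).
WIN32_NAMES = {
    5: "ERROR_ACCESS_DENIED",
    1225: "ERROR_CONNECTION_REFUSED",
}

FACILITY_WIN32 = 7


def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into severity, facility, and code fields."""
    value &= 0xFFFFFFFF  # accept signed or unsigned representations
    facility = (value >> 16) & 0x7FF  # bits 16-26
    code = value & 0xFFFF             # bits 0-15
    name = WIN32_NAMES.get(code) if facility == FACILITY_WIN32 else None
    return {
        "hex": f"0x{value:08X}",
        "failure": bool(value >> 31),  # bit 31 is the severity bit
        "facility": facility,
        "code": code,
        "win32_name": name,
    }


if __name__ == "__main__":
    print(decode_hresult(2147943625))
```

For 2147943625 this gives 0x800704C9, that is, facility 7 (Win32) with code 1225, which is ERROR_CONNECTION_REFUSED. That fits the log's own remark that a connect failure is expected when the listener was never started.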

KyleTheAutomator commented 6 years ago

One of our primary use cases in evaluating Service Fabric is containers. Is there documentation on how to configure a dev cluster for containers using self-signed TLS certs?

mikkelhegn commented 6 years ago

Thanks @knizkar - let's track this on microsoft/service-fabric-issues#1056.

@MisterPuffyPants - Regarding setting up a dev cluster with containers: a doc will be posted in the next few days, as this is only officially supported in 6.2. The main thing is to make sure that the Docker service is started when creating the cluster; that enables the support in Service Fabric.

medeirosle commented 6 years ago

Exactly the same issue here. Any updates?

vitalybibikov commented 6 years ago

Had the same issue. The only thing that helped was going back to 6.2.283/3.1.283.

vitalybibikov commented 6 years ago

Any updates? Still see it in the newest version

mikkelhegn commented 6 years ago

@EvilAvenger: Catching up on this issue, have you gone through the solutions proposed in this issue? https://github.com/Azure/service-fabric-issues/issues/1056

vitalybibikov commented 6 years ago

@MikkelHegn

Yes I did, and it does not work. The issue is currently showing up on our deployment machine, so I can't test it properly (it blocks my team).

The only thing that really helps is installing 6.2.283.9494. (Installing a prior version but copying the files from 6.2.283 to "C:\Program Files\Microsoft SDKs\Service Fabric" helps as well.)

None of the other versions work, so the issue may have been introduced somewhere in *.301.

What I've tried:

Event log issues: I can't provide the full event log at the moment because I've reinstalled the service, but I saw several records in the Event Log:

1) FileChangeMonitor failed with E_ACCESSDENIED
2) FolderACLManager::Install failed with error E_INVALIDARG
3) GetFileAttributesEx failed with the following error 5

mikkelhegn commented 6 years ago

Thanks for your patience on this one @EvilAvenger. @maburlik for the diagnostics info above, do you have any ideas what might be causing this?

andrewcoll commented 6 years ago

I'm also blocked by this now, @MikkelHegn. Is anyone any closer to figuring out what is going on? I have tried all the workarounds, to no avail.

raunakpandya commented 6 years ago

Folks, if the workaround mentioned in microsoft/service-fabric-issues#1056 isn't working for you, can you please share the full setup logs from your environment? Maybe you are running into something else here.

(Assuming Windows) The reg key HKLM\SOFTWARE\Microsoft\ServiceFabric\FabricLogRoot should point to the location of the logs. Zip the directory and attach the file here; you can also zip and email it to us (raunakp, or mikhegn at microsoft dot com) if you want.

andrewcoll commented 6 years ago

Log (2).zip

Logs attached.

tjackadams commented 6 years ago

Just to give my two cents on this issue: I was having the same problem with Windows 10 and the latest SDK. I had checked the Windows firewall, removed Webroot AV, reinstalled the SDK multiple times, reverted to older SDKs, checked the folder permissions, changed to the Network Service account, and tried every other solution proposed in https://github.com/Azure/service-fabric-issues/issues/1056

The fix for me was quite simple: @JayRidge95 noticed that the hostname was being truncated in the event logs. My computer name was longer than the 15-character NetBIOS name limit. We changed my computer name to be shorter than 15 characters, reinstalled the SDK, and it worked fine.

Bit of an odd one but it took me about 3 days to get to that point so this might save some people time.
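
The computer-name check is easy to automate before reinstalling anything. This is a rough sketch under the assumption that only the host part of the name (before the first dot) counts toward the 15-character NetBIOS limit; the function names are made up for illustration.

```python
import socket

NETBIOS_MAX_LEN = 15  # NetBIOS computer names hold at most 15 characters


def host_label(computer_name: str) -> str:
    """Return the part of the name before the first dot (drops any DNS suffix)."""
    return computer_name.split(".", 1)[0]


def exceeds_netbios_limit(computer_name: str) -> bool:
    """True if the host label is too long to fit in a NetBIOS name unchanged."""
    return len(host_label(computer_name)) > NETBIOS_MAX_LEN


def truncated_netbios_name(computer_name: str) -> str:
    """Show what the name looks like once cut to 15 characters and upper-cased."""
    return host_label(computer_name)[:NETBIOS_MAX_LEN].upper()


if __name__ == "__main__":
    name = socket.gethostname()
    print(name, "-> too long:", exceeds_netbios_limit(name))
    print("truncated form:", truncated_netbios_name(name))
```

If the check reports that the name is too long, the truncated form is what to look for in the event logs, as described above.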

sandipuchdadiya commented 6 years ago

@tjackadams this works like a charm. I just shortened the computer name. I had been stuck on this issue for the last 4 days.

petrformanek commented 6 years ago

@tjackadams thanks, it worked. Dear SF team, can you fix this issue, or at least provide a better error message so the problem and its solution can be identified quickly?

andrewcoll commented 6 years ago

This workaround did not work for me. :( It's still not working.

@raunakpandya is there any update on this?

kuvinodms commented 6 years ago

@andrewcoll +1, not working for me either.

raunakpandya commented 6 years ago

@andrewcoll - Have you tried the workaround of setting FabricContainerAppsEnabled to false? If not, can you try editing the ClusterManifestTemplate.json file under %programfiles%\Microsoft SDKs\Service Fabric\ClusterSetup? There is one file per type of one-box cluster, so edit the one that matches what you are bringing up.

Add the following entry under the Hosting section:

      {
        "name": "FabricContainerAppsEnabled",
        "value": "false"
      }
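
For anyone who would rather script the edit than do it by hand, here is a minimal sketch. It assumes the template uses the same `properties.fabricSettings` layout as the generated config posted earlier in this thread, which has not been verified against every template file; the function name `disable_container_apps` and the file path in the usage example are illustrative only.

```python
import json


def disable_container_apps(config: dict) -> bool:
    """Ensure Hosting has FabricContainerAppsEnabled = "false".

    Returns True if the config was modified, False if it was already set.
    Assumes the layout config["properties"]["fabricSettings"] (a list of
    sections, each with "name" and "parameters"), as in the generated JSON.
    """
    settings = config["properties"]["fabricSettings"]
    hosting = next((s for s in settings if s.get("name") == "Hosting"), None)
    if hosting is None:
        # No Hosting section yet: create one so the parameter has a home.
        hosting = {"name": "Hosting", "parameters": []}
        settings.append(hosting)
    params = hosting.setdefault("parameters", [])
    for param in params:
        if param.get("name") == "FabricContainerAppsEnabled":
            if param.get("value") == "false":
                return False
            param["value"] = "false"
            return True
    params.append({"name": "FabricContainerAppsEnabled", "value": "false"})
    return True


def patch_file(path: str) -> bool:
    """Load a template file, apply the change, and write it back if needed."""
    with open(path, encoding="utf-8-sig") as handle:
        config = json.load(handle)
    changed = disable_container_apps(config)
    if changed:
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(config, handle, indent=2)
    return changed
```

A call such as `patch_file(r"C:\path\to\ClusterManifestTemplate.json")` (placeholder path) would need an elevated prompt, since the files live under Program Files. Keeping a backup copy of the template first seems prudent.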

andrewcoll commented 6 years ago

@raunakpandya yes, I tried that, it didn't work either. I attached my logs in a previous comment.

raunakpandya commented 6 years ago

Yes, I did look at the logs. Strange. Which JSON file did you modify? Can you attach it? Also, which one-box mode are you trying to bring up (secure/unsecure, 1 box/5 box)?

abnerescocio commented 6 years ago

@raunakpandya's answer worked for me. Thanks!!!

caretro commented 5 years ago

@tjackadams your solution worked for me. I shortened the computer name (it was longer than 15 characters). Thank you!

Kassoul commented 5 years ago

FabricContainerAppsEnabled

@raunakpandya could you please explain why disabling this setting solves the issue?

raunakpandya commented 5 years ago

@Kassoul - This has the details: https://github.com/Azure/service-fabric-issues/issues/1056#issuecomment-400413031

With that disabled, the self-signed certificate is no longer created.

sorawitamorn commented 3 years ago

I have seen the same error when trying to start up my local cluster. In my case, the 'HostService: on node id terminated unexpectedly with code 3221225781 and process name Fabric.exe' error message told me that Fabric.exe was missing a DLL. Some of the VC++ DLLs had gone missing, and reinstalling "C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\vcredist_x64.exe" fixed it.
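
The exit code in that message becomes recognizable once it is printed as hex. Here is a short sketch of the conversion; the helper name `describe_exit_code` and the one-entry table are illustrative, and codes not in the table are simply reported as unknown.

```python
# NTSTATUS values relevant to this thread (deliberately not exhaustive).
KNOWN_NTSTATUS = {
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def describe_exit_code(code: int) -> str:
    """Format a process exit code as 32-bit hex and name it if it is known."""
    code &= 0xFFFFFFFF  # normalizes negative (signed) representations too
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"


if __name__ == "__main__":
    print(describe_exit_code(3221225781))
```

3221225781 is 0xC0000135, which Windows defines as STATUS_DLL_NOT_FOUND. That is consistent with the missing VC++ DLLs described above.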

seb-emmot commented 2 years ago

I have seen the same error when trying to start up my local cluster. In my case, the 'HostService: on node id terminated unexpectedly with code 3221225781 and process name Fabric.exe' error message told me that Fabric.exe was missing a DLL. Some of the VC++ DLLs had gone missing, and reinstalling "C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\vcredist_x64.exe" fixed it.

This fixes the issue for me!

sanketpr commented 2 years ago

In my case Service Fabric was not able to bind address 192.168.0.108:19080 which was causing this issue. If any of the above-mentioned solutions didn't work for you, try the following.