microsoft / service-fabric-issues

This repo is for the reporting of issues found with Azure Service Fabric.

'Failed to configure endpoint' #1661

Closed: Bluhman closed this issue 4 years ago

Bluhman commented 4 years ago

When I supply a completely valid certificate thumbprint for my endpoint on Azure, deploying a stateless API service to my cluster fails with this error: "There was an error during activation.Failed to configure endpoint with certificate ---certificateIWantToUseThumbPrintHere---. Error 0x80070520."

Expected Behavior

The service should activate, with its endpoint configured to use the specified certificate.

Current Behavior

Activation fails with the error above.

Steps to Reproduce

  1. Request a new certificate for your SF cluster on Azure, with a CN equal to the URL of the cluster.
  2. Add the new certificate's thumbprint on Azure as both an Admin and a Read-Only client certificate thumbprint, since these appear to be the only options available.
  3. Add the new certificate to the Key Vault used by the cluster.
  4. Add a reference to the new certificate to the virtual machine scale set (VMSS) using a PowerShell script.
  5. Update the references in the Azure App Configuration instance used to reach the Key Vault.
  6. Wait for the cluster to restart.
  7. Delete the old copy of the app from the cluster, as suggested in similar issues where the wrong thumbprint was shown.
  8. Publish the same app again.
  9. See the error above once the pipeline has completed.
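Steps 3 and 4 are what get the certificate onto every node, and the endpoint binding can only succeed if the certificate lands where the application's `EndpointCertificate` element searches for it. A minimal sketch of that declaration (the one near the end of the application manifest in this issue), assuming the schema defaults are the `My` store and a thumbprint search, with those defaults written out explicitly:

```xml
<!-- Same declaration as in the application manifest, with the assumed schema defaults made explicit. -->
<Certificates>
  <!-- X509FindValue must match the thumbprint of the certificate installed on every node. -->
  <!-- FindByThumbprint and My are assumed to be the defaults when these attributes are omitted. -->
  <EndpointCertificate Name="certificateinfo"
                       X509FindValue="[CertThumbPrint]"
                       X509FindType="FindByThumbprint"
                       X509StoreName="My" />
</Certificates>
```

If the certificate were installed into a different store on the nodes, `X509StoreName` would have to name that store instead.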

Context (Environment)

I'm trying to make it so that users don't have to accept an 'insecure' certificate whenever they call our API. Because the API isn't treated as secure, this hurts the usability of a UI that calls the API in the cluster.

Here's the app manifest I'm using for the culprit app on the cluster. The parameters are supplied as variables in the pipeline:

  <Parameters>
    <Parameter Name="ASPNETCORE_ENVIRONMENT" DefaultValue="" />
    <Parameter Name="AZURE_CLIENT_ID" DefaultValue="" />
    <Parameter Name="AZURE_CLIENT_SECRET" DefaultValue="" />
    <Parameter Name="AZURE_TENANT_ID" DefaultValue="" />
    <Parameter Name="CertThumbPrint" DefaultValue="" />
    <Parameter Name="ConnectionStrings:IAPAppConfig" DefaultValue="" />
    <Parameter Name="IAP.APIService_InstanceCount" DefaultValue="-1" />
  </Parameters>
  <ServiceManifestImport>
    <ServiceManifestRef ServiceManifestName="IAP.APIServicePkg" ServiceManifestVersion="1.0.0.20200326.5" />
    <ConfigOverrides />
    <EnvironmentOverrides CodePackageRef="Code">
      <EnvironmentVariable Name="CertThumbPrint" Value="[CertThumbPrint]" />
      <EnvironmentVariable Name="AZURE_TENANT_ID" Value="[AZURE_TENANT_ID]" />
      <EnvironmentVariable Name="AZURE_CLIENT_ID" Value="[AZURE_CLIENT_ID]" />
      <EnvironmentVariable Name="AZURE_CLIENT_SECRET" Value="[AZURE_CLIENT_SECRET]" />
      <EnvironmentVariable Name="ConnectionStrings:IAPAppConfig" Value="[ConnectionStrings:IAPAppConfig]" />
    </EnvironmentOverrides>
    <Policies>
      <EndpointBindingPolicy EndpointRef="ServiceEndpoint" CertificateRef="certificateinfo" />
    </Policies>
  </ServiceManifestImport>
  <DefaultServices>
    <Service Name="IAP.APIService" ServicePackageActivationMode="ExclusiveProcess">
      <StatelessService ServiceTypeName="IAP.APIServiceType" InstanceCount="[IAP.APIService_InstanceCount]">
        <SingletonPartition />
      </StatelessService>
    </Service>
  </DefaultServices>
  <Principals>
    <Users>
      <User Name="system" AccountType="LocalSystem" AccountName="NT AUTHORITY\SYSTEM" />
    </Users>
  </Principals>
  <Policies>
    <DefaultRunAsPolicy UserRef="system" />
  </Policies>
  <Certificates>
    <EndpointCertificate X509FindValue="[CertThumbPrint]" Name="certificateinfo" />
  </Certificates>
</ApplicationManifest>
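The `EndpointBindingPolicy` above refers to `EndpointRef="ServiceEndpoint"`, which is declared in the service manifest of `IAP.APIServicePkg` rather than in this file. That service manifest isn't included here, so the fragment below is a hypothetical sketch of the endpoint the policy expects; the port number is an assumption, and the `https` protocol reflects that the binding policy attaches a certificate to an HTTPS endpoint.

```xml
<!-- Hypothetical fragment of ServiceManifest.xml for IAP.APIServicePkg (not shown in this issue). -->
<Resources>
  <Endpoints>
    <!-- Name must match EndpointRef="ServiceEndpoint" in the EndpointBindingPolicy. -->
    <!-- Protocol is https because the binding policy binds a certificate to this endpoint. -->
    <!-- Port 443 is an assumed value for illustration only. -->
    <Endpoint Name="ServiceEndpoint" Protocol="https" Type="Input" Port="443" />
  </Endpoints>
</Resources>
```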

Security section of the cluster manifest:

    <Section Name="Security">
      <Parameter Name="AdminClientCertThumbprints" Value="oldCertThumbPrint,newCertThumbPrint" />
      <Parameter Name="ClientAuthAllowedCommonNames" Value="westus.servicefabric.azure.com" />
      <Parameter Name="ClientCertThumbprints" Value="oldCertThumbPrint,newCertThumbPrint" />
      <Parameter Name="ClientRoleEnabled" Value="true" />
      <Parameter Name="ClusterCertThumbprints" Value="oldCertThumbPrint" />
      <Parameter Name="ClusterCredentialType" Value="X509" />
      <Parameter Name="ClusterProtectionLevel" Value="EncryptAndSign" />
      <Parameter Name="DisableFirewallRuleForDomainProfile" Value="false" />
      <Parameter Name="DisableFirewallRuleForPrivateProfile" Value="false" />
      <Parameter Name="DisableFirewallRuleForPublicProfile" Value="false" />
      <Parameter Name="IgnoreCrlOfflineError" Value="true" />
      <Parameter Name="ServerAuthCredentialType" Value="X509" />
      <Parameter Name="ServerCertThumbprints" Value="oldCertThumbPrint" />
      <Parameter Name="UseSecondaryIfNewer" Value="true" />
    </Section>

There's documentation on how to modify the values above, but it does so through ARM templates, which I don't have access to because the cluster resource offers no option to export an ARM template.

Service Fabric Runtime and SDK Version:

7.0.470.9590

Operating System:

Windows 10

Cluster Size:

Dev, 5 nodes.

Possible Workaround

Bluhman commented 4 years ago

The issue still occurs after walking through and running this script: https://docs.microsoft.com/en-us/azure/service-fabric/scripts/service-fabric-powershell-add-application-certificate. As far as I know, my VMSS (the one piece I think I was missing) is now aware of the client certificate I'm trying to use, but the application still won't deploy to the cluster.

Updating OP to reflect the new reproduction step!

Bluhman commented 4 years ago

Oh, OK: apparently running this step, which is distinct from the step to 'add a certificate to a VMSS', is also required:

https://docs.microsoft.com/en-us/azure/service-fabric/scripts/service-fabric-powershell-add-application-certificate#update-the-virtual-machine-scale-set