microsoft / SDN

This repo includes PowerShell scripts and VMM service templates for setting up the Microsoft Software Defined Networking (SDN) Stack using Windows Server 2016

SDN Software Load Balancer - Unable to validate remote certificate / Remote Certificate is Not Authorized #148

Closed rhochmayr closed 6 years ago

rhochmayr commented 6 years ago

After deploying the SLB service via VMM service templates, the following error keeps appearing in the SLBMUX event logs on the SLB nodes:

Event ID: 9 Unable to validate remote certificate with thumbprint Reason: Remote Certificate is Not Authorized.

The NC certificate has been issued by an internal CA and is trusted by all involved machines.

I also checked the Trusted Root stores on the SLB and NC nodes for certificates with different "issued to" and "issued by" entries but everything looks fine.

On the SLB MUX nodes I enabled the CAPI2 Operational event log, and the whole certificate verification process seems to run without errors.

However, the timestamps of the error messages in the SlbMux event log correlate with the CAPI2 entries logged when the certificate verifications take place.

[Screenshot: CAPI2 event log]

The verifications take place every 30 seconds. At the same time, the above-mentioned error about being unable to validate the remote certificate is logged.

[Screenshot: SlbMux event log]

I also ran a manual revocation check on the SLB nodes against the NC certificate named in the event log error, using the following command, and it also completed without any issues: "certutil -f -urlfetch -verify nccertificate.cer"

The NC certificate as well as the SLB MUX Computer certificates have both EKUs (Client & Server Authentication) set.

Are there any other requirements I'm missing here?

Thanks Robert

rhochmayr commented 6 years ago

I have redeployed the Network Controller and SLB MUX nodes with Server 2016 images containing the latest updates, but the issue still persists.

All components can be deployed successfully and NC is working fine. After deploying the SLB MUX nodes via the service template, I can go through the "Network Services" wizard in VMM and configure the SLB server without any issues.

After the service is configured, the HNVPA NICs on the SLB MUX nodes are activated and receive an address from the HNV IP pool.

The event logs on the SLB MUX nodes, however, show the error messages from my last post.

When I run Debug-NetworkControllerConfigurationState on the Hyper-V hosts, I get the following warnings and errors:

Status: Warning

Source: SoftwareLoadBalancerManager Code: HostNotConnectedToController Message: Host is not Connected.

Status: Failure

Source: SoftwareLoadBalancerManager Code: VirtualServerUnreachable Message: Loadbalancer Mux is not connected to SLBM. Network Error Code: 10054, Error Message: An existing connection was forcibly closed by the remote host.

What seems odd is the following error, because the Network Controller was working fine before the SLB MUX service was added. HNV was working across multiple hosts and different tenant virtual networks, and still seems to be working fine.

Status: Failure

Source: VirtualSwitch Code: HostNotConnectedToController Message: The host has not yet established communication with the Network Controller.

I assume this message is due to a bug, as mentioned here? https://docs.microsoft.com/en-us/windows-server/networking/sdn/troubleshoot/troubleshoot-windows-server-software-defined-networking-stack

The following event log error on the SLB MUX nodes references the thumbprint of the NC certificate:

Unable to validate remote certificate with thumbprint Reason: Remote Certificate is Not Authorized.

Any ideas on where to go from here?

Thanks Robert

rhochmayr commented 6 years ago

Just a quick update,

I enabled the Microsoft-Windows-SlbMux/Debug log, and the issue seems to be caused by an additional Subject Alternative Name that was added to the NC cert.

The alternative names were added in case we want to switch from an IP address to a DNS name for the NC endpoint in the future.

The following messages show up in the debug log:

12/20/2017 17:48:33, MuxSvcHost, 16, 0#267,Utilities\Runtime\NcWcfAuthConfigServer.cs#NcCertValidator.Validate: Certificate thumbprint validation failed. Authorized Certs: <NC Endpoint IP>, Presented Cert - Thumb: <ThumbPrint>, DNS Name: <Subject Alternative Name>

(The NC Endpoint IP is the primary Subject Name on the certificate. Hyper-V hosts and NC nodes can communicate fine using this cert.)

12/20/2017 17:48:33, MuxSvcHost, 16, 0#1339,mux\muxworker\Worker.cs#Remote certificate not Authorized. <Subject Alternative Name>
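For anyone comparing their own debug entries against the ones above, here is a minimal Python sketch that pulls the three fields out of a "Certificate thumbprint validation failed" message and checks whether the reported DNS name is in the authorized list. The field layout is assumed from the single message quoted above, the function names are hypothetical, and the sample IP, thumbprint, and DNS name are made-up stand-ins for the redacted placeholders.

```python
import re

# Field layout assumed from the one debug message quoted in this thread.
FAILURE_PATTERN = re.compile(
    r"Authorized Certs: (?P<authorized>.*?), "
    r"Presented Cert - Thumb: (?P<thumbprint>.*?), "
    r"DNS Name: (?P<dns_name>.*)$"
)


def parse_validation_failure(line):
    """Return the three fields of a validation-failure message, or None."""
    match = FAILURE_PATTERN.search(line.strip())
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


def reported_name_is_authorized(fields):
    """True if the reported DNS name appears in the authorized list.

    Splitting the list on commas is an assumption; the quoted log
    shows only a single authorized entry.
    """
    authorized = [name.strip().lower() for name in fields["authorized"].split(",")]
    return fields["dns_name"].lower() in authorized


# Hypothetical sample values stand in for the redacted placeholders.
SAMPLE = (
    r"12/20/2017 17:48:33, MuxSvcHost, 16, 0#267,"
    r"Utilities\Runtime\NcWcfAuthConfigServer.cs#NcCertValidator.Validate: "
    "Certificate thumbprint validation failed. "
    "Authorized Certs: 10.0.0.10, Presented Cert - Thumb: 0123ABCD, "
    "DNS Name: nc.contoso.example"
)

fields = parse_validation_failure(SAMPLE)
print(fields)
print(reported_name_is_authorized(fields))
```

With the sample values, the reported DNS name (the alternative name) is not in the authorized list, which mirrors the mismatch the debug log points at. This only illustrates what the log reports; it is not the MUX's actual validation code.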

Is there a way to work around this, or do we need to change the certificate so that it carries only a single Subject Name?

Thanks Robert

rhochmayr commented 6 years ago

Hi there

I will close this issue from my side, as the certificate error disappeared once I generated a new NC cert without Subject Alternative Names.

One thing I also noticed: when deploying the SDN stack via VMM, the certificate Subject must not use any field other than CN (Common Name).

As soon as any other field such as State, Organization Name, Department, or Location is used, the service template deployment fails.
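As a quick pre-check before requesting a certificate, the CN-only observation can be sketched in Python as below. This is a rough illustration under stated assumptions: the helper name is hypothetical, the subject is assumed to be available as a comma-separated distinguished-name string, the parsing is deliberately naive (it only honors backslash-escaped commas), and the sample subjects are made up.

```python
import re


def subject_uses_only_cn(subject):
    """Return True if every attribute in the subject string is a CN.

    Naive parsing: splits on commas that are not backslash-escaped and
    reads the attribute type before the first '='.
    """
    parts = re.split(r"(?<!\\),", subject)
    types = [part.split("=", 1)[0].strip().upper() for part in parts]
    return all(attr_type == "CN" for attr_type in types)


# Hypothetical subjects for illustration only.
print(subject_uses_only_cn("CN=10.0.0.10"))
print(subject_uses_only_cn("CN=nc01, O=Contoso, L=Vienna, S=Vienna"))
```

The first subject passes the check and the second does not, matching the behavior described above, where extra fields caused the service template deployment to fail.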

Thanks Robert