kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

Windows Support: NetBIOS and Active Directory LDAP SAMAccountName restrictions on Hostname #2217

Closed rhockenbury closed 1 year ago

rhockenbury commented 4 years ago

User Story

As an operator, I would like to manage Windows Server worker nodes with the Cluster API. Hostnames on Windows are limited to 15 characters, and the hostnames that the Cluster API sets (by default, via cloud-init metadata) exceed this limit. The Cluster API should support a more flexible mechanism for setting hostnames so that shorter hostnames can be set for VMs.

Detailed Description

NetBIOS requires Windows computer names to be 15 characters or fewer (https://support.microsoft.com/en-us/help/909264/naming-conventions-in-active-directory-for-computers-domains-sites-and). Attempting to set a hostname with more than 15 characters on a Windows machine will result in only the first 15 being used.

When using the machine deployment api object, the machine api object names are generated by the machineset controller (https://github.com/kubernetes-sigs/cluster-api/blob/7884484b621f13f604e74f60053f4214a2f19702/controllers/machineset_controller.go#L434). This name is later used to set the VM name (for example in CAPV - https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/895539d004ea33299435a2c739791e9800d0c2ae/controllers/vspheremachine_controller.go#L320), and then also as the local-hostname in the cloud-init metadata (https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/390c49a23e2b535a27b330e4983c59eb0b42f476/pkg/services/govmomi/service.go#L203).

The machine api object names are prefixed by the name of the machine deployment api object. These names, for example, will be in the form:

workload-cluster-2-md-0-5f77f47487-2c4sq 
workload-cluster-2-md-0-5f77f47487-25xhg

where workload-cluster-2-md-0 is the name of the machine deployment api object. The prefix is followed by 17 extra characters (-5f77f47487-2c4sq, -5f77f47487-25xhg), which bring the total character count above 15. Notice that the random 5-character suffix only starts after the deployment name plus 12 more characters (the 10-character hash and two hyphens), so setting the deployment api object name to 3 or more characters guarantees that all machines share the same first 15 characters, and thus hostname collisions for the nodes. Being able to set the deployment api object name to something more meaningful than what could be expressed in 2 characters would be useful.
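The collision can be checked with a short sketch. It assumes, as described in this issue, that Windows keeps only the first 15 characters of the name, and that generated names have the shape shown in the examples (deployment name, 10-character hash, 5-character suffix). `netbios_name` and `suffix_survives` are hypothetical helpers for illustration, not part of Cluster API.

```python
NETBIOS_MAX = 15  # limit stated in this issue


def netbios_name(hostname: str) -> str:
    """Return the part of the hostname that survives truncation."""
    return hostname[:NETBIOS_MAX]


def suffix_survives(deployment_name: str) -> bool:
    """True if at least one character of the random suffix fits in 15 chars.

    Assumed name shape: <deployment>-<10-char hash>-<5-char suffix>,
    so the suffix starts at index len(deployment) + 12.
    """
    return len(deployment_name) + 12 < NETBIOS_MAX


machines = [
    "workload-cluster-2-md-0-5f77f47487-2c4sq",
    "workload-cluster-2-md-0-5f77f47487-25xhg",
]
truncated = [netbios_name(m) for m in machines]
# Both machines collapse to the same truncated name.
collides = len(set(truncated)) < len(machines)
```

Here both entries truncate to `workload-cluste`, so `collides` is true, and `suffix_survives` only returns true for deployment names of one or two characters.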

My current workaround is to have cloudbase-init invoke an additional script before the join command that reforms the hostname and sets it for the VM. This is somewhat undesirable, as the hostname and node api object name are now not the same as the VM name. For consistency, it's desired (but not required) that the VM name (as shown by the cloud provider), the machine api object name, and the node api object name are the same as the hostname of the VM.

Anything else you would like to add:

I realize that windows worker nodes are not officially supported by the cluster api, but I'm mentioning it since it's something that's up for discussion for the cluster-api roadmap (https://github.com/kubernetes-sigs/cluster-api/pull/2148/files#diff-767f66541aad47089dd5ded720dede6bR32).

Another workaround could be to use the machine api object directly instead of the machine deployment api object, which would directly set the VM name based on the name of the machine api object. However, the benefits of using the machine deployment are lost.

/kind feature

detiber commented 4 years ago

The main reason for the hostname matching the Machine name is currently due to the initial implementation details of the vSphere infrastructure provider. In the case of AWS and Linux hosts, there is a requirement when using the AWS cloud provider integration that the hostname must match the internal DNS name of the host, and we override the hostname setting via cloud-init config for each Machine we provision.

Outside of limitations mentioned above, there should be no requirements that the hostname of an individual instance match the Machine name in any way.

rhockenbury commented 4 years ago

Agreed - that's certainly not a requirement.

The cloud-init metadata local-hostname is set to the Machine name (at least on CAPV) - what I would propose is flexibility with how local-hostname metadata gets set, so that it's not necessarily set by default to the Machine name.

benmoss commented 4 years ago

I don't think this is a CAPI issue; I think this is just with CAPV. On AWS the hostnames are not specified in the cloud-init metadata.

rhockenbury commented 4 years ago

@akutz @yastij Would you mind taking a look at this?

randomvariable commented 4 years ago

Is this definitely an issue in a Kubernetes context? The linked page looks like it was written for Windows XP and 2003 when NetBIOS was still a thing. AD DNS names shouldn't be restricted in the same way, and they do say for FQDNs, it's 63 chars per component, 255 total.

Is the issue that a machine configured with NetBIOS will register a Kerberos principal with the truncated name? If so, is there a case to be made that NetBIOS should be disabled in Windows images?

rhockenbury commented 4 years ago

AFAIK, NetBIOS is still required to domain join a Windows machine. Looping in @ksubrmnn and @JocelynBerrendonner.

randomvariable commented 4 years ago

It might depend on how credentials are provided and how the domain is specified. If the FQDN is used and credentials are provided as joinuser@ad.fqdn.contoso.com, it should default to the DNS SRV records? I admit it's been a decade since I touched Windows, but my memory was that this was possible in at least Win2K8/Vista.

JocelynBerrendonner commented 4 years ago

> AFAIK, NetBIOS is still required to domain join a Windows machine. Looping in @ksubrmnn and @JocelynBerrendonner.

Thanks for reaching out! I don't know the answer to the NetBIOS/domain join question off the top of my head, but I'll find the experts and pull them in shortly.

JocelynBerrendonner commented 4 years ago

@rhockenbury : As per my investigation, NetBIOS is not required to join a domain on a Windows machine (that's been the case since around Windows 2000). The page you mentioned only provides naming conventions for when NetBIOS is actually used. Also, as other folks mentioned, the machine name is only truncated in NetBIOS. When setting a long hostname (let's say "MyComputerWithALongName") in a domain (let's say "contoso.com"), the machine is still reachable through its FQDN "MyComputerWithALongName.contoso.com". However, through NetBIOS, it will indeed only be reachable through the truncated NetBIOS name "MyComputerWithA".

Is using FQDN an option here?

rhockenbury commented 4 years ago

Thanks for the additional insight. It feels like it would be best to disable NetBIOS, seeing as using the machine api object name as the hostname would result in NetBIOS name collisions. I'll need to follow up internally to see if we could do this.

JocelynBerrendonner commented 4 years ago

@rhockenbury : after further discussions with the experts, NETBIOS name resolution is mostly unused today. Though the first step in name resolution is usually going through NETBIOS, if the NETBIOS name is not found, Windows will fall back to resolving the machine name using DNS. For example, if you try to reach a machine through "MyComputerWithALongName", Windows will be able to find that name in DNS provided that the DNS suffix search order is properly populated in the network interface TCP/IP settings (this last point is important). If you try to ping "MyComputerWithALongName" and the suffix is properly populated (to, let's say, contoso.com), then Windows will behave similarly to Linux and try "MyComputerWithALongName.contoso.com".

The bottom line is, I previously suggested using the FQDN, but as per my discussion with the expert, there is actually no need for it. If the DNS suffix search order is properly populated in Windows nodes, the long hostnames Cluster-API generates should be directly usable. And whether NETBIOS is enabled or not shouldn't matter. If a long name doesn't work with NETBIOS enabled, it will likely not work with NETBIOS disabled either.

FWIW, you can check the DNS suffix list using Get-DnsClientGlobalSetting in PowerShell:

PS C:\hns> Get-DnsClientGlobalSetting

UseSuffixSearchList : True
SuffixSearchList    : {contoso.com}
UseDevolution       : True
DevolutionLevel     : 0
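As a rough illustration of the suffix behaviour described in this comment, the sketch below lists the fully qualified names that would be tried for a single-label name. It is a deliberately simplified model, not the actual Windows resolver logic; `candidate_names` is a hypothetical helper, and devolution and NETBIOS lookups are ignored.

```python
from typing import List


def candidate_names(name: str, suffix_search_list: List[str]) -> List[str]:
    """List the fully qualified names that could be tried for `name`."""
    if "." in name:
        # Treated as already qualified in this simplified model.
        return [name]
    # Single-label name: append each configured suffix in order.
    return [f"{name}.{suffix}" for suffix in suffix_search_list]


candidates = candidate_names("MyComputerWithALongName", ["contoso.com"])
```

With an empty suffix list the function returns no candidates, which mirrors the point above that the suffix search order has to be populated for long single-label names to resolve through DNS.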

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

rhockenbury commented 4 years ago

/remove-lifecycle stale

randomvariable commented 4 years ago

I think we concluded that this isn't an issue? @jsturtevant has also stated as such in the Windows proposal.

/close for now, and we can revisit if it turns out to be a problem?

k8s-ci-robot commented 4 years ago

@randomvariable: Closing this issue.

rhockenbury commented 4 years ago

https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/pull/1052

randomvariable commented 3 years ago

/reopen

This question was re-raised in SIG Windows around app support, though we were wondering whether, since pod names and DNS names are synonymous, pod names longer than the NETBIOS limit should also break applications that don't support longer names. If that's the case, it still doesn't make sense to make this a cluster api concern.

I think @JocelynBerrendonner was going to get a definitive answer.

k8s-ci-robot commented 3 years ago

@randomvariable: Reopened this issue.

randomvariable commented 3 years ago

/lifecycle frozen

JocelynBerrendonner commented 3 years ago

Hi everyone,

There have been additional discussions about this, and additional learnings for me since my last message.

In a nutshell:

A few questions remain, though:

randomvariable commented 3 years ago

I've also been checking in and found that Active Directory SAMAccountName is restricted to 20 characters. It's not necessarily a blocker since SAMAccountName doesn't need to match the computer name, but it places constraints on uniqueness.

You're right that the hostname is a function of the provider, not CAPI.

perithompson commented 3 years ago

Just adding some additional context to this: it seems that if your hostname is over 15 characters, the $env:computername variable cuts off at 15 characters, which I guess is because it is related to the GetComputerName API. This may cause problems for people using PowerShell to configure CNI or something similar. The hostname command, however, still returns the longer hostname.

Also, when using this with CAPV I have noticed that the identifiers at the end of the generated hostname are over 15 characters before you even add the user-specified portion, so that may need to be considered when running Windows machine deployments.

randomvariable commented 3 years ago

Noted. thanks.

/area node-agent

k8s-ci-robot commented 3 years ago

@randomvariable: The label(s) area/node-agent cannot be applied, because the repository doesn't have them

JocelynBerrendonner commented 3 years ago

@perithompson : Thanks for mentioning this! IIRC, using [System.Net.Dns]::GetHostName() in PowerShell also returns the full name.

randomvariable commented 3 years ago

/retitle Windows Support: NetBIOS and Active Directory LDAP SAMAccountName restrictions on Hostname

randomvariable commented 3 years ago

Update on this: Regardless of NETBIOS, we will need the hostname restricted because of the SAMAccountName, so I have retitled the issue accordingly.

In terms of next steps:

Whether or not the machine, and concretely, cloud-init, ignition or whatever takes the hostname from the VM name is up to the cloud provider. It is the case for vSphere, Azure (maybe?), but not for AWS. AWS only uses the instance ID.

For AWS, this means if the machine name is shortened, this has no impact on the hostname unless the hostname is explicitly set in the userdata via cloud-init. However, we also would not want to default this because the Kubernetes AWS Cloud Provider (CPI not CAPA) requires the node name to match the host name which in turn MUST match the instance ID.

Next steps are to document:

JocelynBerrendonner commented 3 years ago

@randomvariable, it may be worth noting that SAMAccountName is a name used to support legacy versions of Windows (Windows NT4, Windows 95, Windows 98, ...: https://docs.microsoft.com/en-us/windows/win32/ad/naming-properties#samaccountname). I believe Windows 2000 and up don't require it.

randomvariable commented 3 years ago

The docs are referring to how SAMAccountName is consumed, as in it's typically consumed by legacy apps. However, it's still a mandatory field on the Computer LDAP schema, and from which the Computer name is derived - with no indication of being deprecated. SAMAccountName is also used during AD domain join, so it's the strongest of all of these requirements IMO.

akutz commented 3 years ago

I worked with AD and LDAP for years, and I can say that SAMAccountName is still very much used by multiple applications, especially those that sync directory data into or out of AD. Windows is notorious for backwards compatibility.

Time for Andrew's pedantic point of the day -- technically SAMAccountName is not part of Active Directory's LDAP schema's Computer class. Rather, the Computer class extends the User class, where SAMAccountName is marked as mandatory. Still, this has the same effect as @randomvariable illustrated above -- SAMAccountName is required when creating an object from the Computer class.

JocelynBerrendonner commented 3 years ago

Absolutely, legacy apps and legacy Windows versions (Windows 9x, Windows NT) are limited to 15-character names. That said, it is still possible to use names longer than 15 characters with current versions of Windows (granted, this comes with a boatload of limitations). I think it makes sense to have some limitations on Windows names for the many cases where the applications are limited to 15 characters, but should this limitation be optional for the cases where 15 characters is not a concern? Also, as I previously mentioned:

akutz commented 3 years ago

Hi @JocelynBerrendonner,

I have not been part of this thread from the beginning, so I apologize if this next set of questions has been asked and answered (I searched for the word hash and did not see it):

JocelynBerrendonner commented 3 years ago

@akutz : to be honest I am not an LDAP/SAMAccountName expert, so I'd have to ask the experts to find the answer to your questions. That said, typically, when people want to make sure their long (>15 characters) names work well with legacy apps and are not colliding with anything on the network, they just make sure the first 15 characters are unique on the network. I've seen that approach work fairly well (with the caveats mentioned previously). So, maybe that's a clue in regards to SAMAccountName?

akutz commented 3 years ago

One way to do this would be to take the existing machine name, ex. workload-cluster-2-md-0-5f77f47487-2c4sq and:

  1. Generate an adler32 checksum from the machine name, ex. 1fde0cd1
  2. Concatenate the machine name's first six characters and last five characters, ex. worklo2c4sq
  3. Concatenate the values from steps one and two, using a - character as a separator, ex. worklo2c4sq-1fde0cd1

The value worklo2c4sq-1fde0cd1 is exactly 20 characters long, and:

Heck, an even cheaper way to do this is just to take the first 10 characters and the last 10 characters of a machine name and use that as the SAMAccountName, ex. workload-c7487-2c4sq.
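Both schemes in this comment can be sketched in a few lines. This is only an illustration under the assumptions stated in the thread (a 20-character SAMAccountName limit, adler32 rendered as 8 hex digits); `checksum_name` and `head_tail_name` are hypothetical names, not an existing Cluster API or provider function.

```python
import zlib

SAM_MAX = 20  # SAMAccountName limit mentioned earlier in this thread


def checksum_name(machine_name: str) -> str:
    """First six chars + last five chars + '-' + adler32 as 8 hex digits."""
    checksum = format(zlib.adler32(machine_name.encode("utf-8")), "08x")
    return f"{machine_name[:6]}{machine_name[-5:]}-{checksum}"


def head_tail_name(machine_name: str, head: int = 10, tail: int = 10) -> str:
    """Cheaper variant: keep only the first `head` and last `tail` chars."""
    if len(machine_name) <= head + tail:
        return machine_name  # already short enough, keep unchanged
    return machine_name[:head] + machine_name[-tail:]


name = "workload-cluster-2-md-0-5f77f47487-2c4sq"
short_a = checksum_name(name)   # "worklo2c4sq-" followed by the checksum
short_b = head_tail_name(name)  # "workload-c7487-2c4sq"
```

For names of 11 characters or more, `checksum_name` always yields exactly 20 characters. Neither variant guarantees uniqueness on its own: the checksum variant only makes collisions unlikely, and the head/tail variant relies on the random suffix at the end of the machine name.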

randomvariable commented 3 years ago

I guess we need to check the Win32 Domain Join function and how it relates to the machine name. I'm pretty sure SAMAccountName can be provided upon join, but I'm not sure whether the API then changes the hostname, in which case we'd run into the cloud provider issues.

weiwenli97 commented 3 years ago

When adding a machine to an AD group, the name from Get-WmiObject -Class Win32_ComputerSystem is used, and it is limited to 15 characters. If the name is longer than 15 characters, only the first 15 are used, so the name can differ from the hostname.

weiwenli97 commented 3 years ago

Hi @JocelynBerrendonner,

I created two Windows workers whose hostnames share the same first 15 characters. Then I used the Add-Computer PowerShell command to add them to the existing AD one by one. The first worker was added successfully: I could see it in the AD server, and although the display name was truncated, the full FQDN was shown when I clicked on it. Then I added the second worker. The PowerShell command also returned True, but when I checked the AD server there was still only one entry, and its full FQDN had changed to the second worker's hostname. When I retried Add-Computer on the first worker, it told me that " because it is already in that domain." But I cannot find that worker in the AD server. Could you please help me with this? Thanks!

vincepri commented 3 years ago

/kind document
/assign @randomvariable
/milestone v1.1

k8s-ci-robot commented 3 years ago

@vincepri: The label(s) kind/document cannot be applied, because the repository doesn't have them.

vincepri commented 3 years ago

cc @jayunit100

jayunit100 commented 3 years ago

Yup! So, we'd love to propose a fix for this, or go through the fixes other folks have proposed, in an upcoming CAPI meeting?

randomvariable commented 3 years ago

@jayunit100 feel free to reach out if you have a solution in mind. We'll then request the change as required, whether that's some provider contract (which I suspect it might be) or otherwise

randomvariable commented 3 years ago

some of the stuff @jayunit100 and @weiwenli97 have been looking at is in https://docs.google.com/document/d/1C7PxLukDUyGxhgPxHRpGYPbROlZarak0QdE7grUPReQ/edit#

sbueringer commented 2 years ago

/help

k8s-ci-robot commented 2 years ago

@sbueringer: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

sbueringer commented 2 years ago

/unassign @randomvariable

fabriziopandini commented 2 years ago

/triage needs-information

@CecileRobertMichon is this still a problem?

CecileRobertMichon commented 2 years ago

If you're asking whether

> Hostnames on windows are limited to 15 characters

is still true, then yes. I know some providers, including CAPZ, have implemented a workaround to trim the AzureMachineName to use as the hostname (https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/scope/machine.go#L399). Not sure if this is something that can be fixed at the CAPI level. @marosset do you have any thoughts?
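For readers wondering what such a trimming workaround can look like, here is a minimal sketch. The 9 + 5 split and the `trimmed_hostname` name are assumptions chosen for illustration and are not taken from the linked CAPZ code; the idea is simply to keep a readable prefix plus the random suffix so that the result stays within 15 characters.

```python
WINDOWS_MAX = 15  # hostname limit discussed in this issue


def trimmed_hostname(machine_name: str) -> str:
    """Shorten a machine name to at most 15 characters.

    Keeps up to 9 leading characters and the last 5 characters,
    joined by a hyphen (assumed scheme, for illustration only).
    """
    if len(machine_name) <= WINDOWS_MAX:
        return machine_name
    head = machine_name[:9].rstrip("-")  # avoid a doubled hyphen
    return f"{head}-{machine_name[-5:]}"


first = trimmed_hostname("workload-cluster-2-md-0-5f77f47487-2c4sq")
second = trimmed_hostname("workload-cluster-2-md-0-5f77f47487-25xhg")
```

Because the random suffix is preserved, the two example machines from the issue description map to different short names, although uniqueness then rests on only five characters.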

fabriziopandini commented 1 year ago

/triage accepted