Closed rhockenbury closed 1 year ago
The hostname currently matches the Machine name mainly because of initial implementation details of the vSphere infrastructure provider. In the case of AWS and Linux hosts, the AWS cloud provider integration requires that the hostname match the internal DNS name of the host, so we override the hostname setting via cloud-init config for each Machine we provision.
Outside of limitations mentioned above, there should be no requirements that the hostname of an individual instance match the Machine name in any way.
Agreed - that's certainly not a requirement.
The cloud-init metadata local-hostname is set to the Machine name (at least on CAPV) - what I would propose is flexibility with how local-hostname metadata gets set, so that it's not necessarily set by default to the Machine name.
I don't think this is a CAPI issue, I think this is just with CAPV. On AWS the hostnames are not specified in the cloud-init metadata
@akutz @yastij Would you mind taking a look at this?
Is this definitely an issue in a Kubernetes context? The linked page looks like it was written for Windows XP and 2003 when NetBIOS was still a thing. AD DNS names shouldn't be restricted in the same way, and they do say for FQDNs, it's 63 chars per component, 255 total.
Is the issue that a machine configured with NetBIOS will register a Kerberos principal with the truncated name? If so, is there a case to be made that NetBIOS should be disabled in Windows images?
AFAIK, NetBios is still required to domain join a windows machine. Looping in @ksubrmnn and @JocelynBerrendonner.
It might depend on how credentials are provided and how the domain is specified. If the FQDN is used and credentials are provided as joinuser@ad.fqdn.contoso.com, it should default to the DNS SRV records? I admit it's been a decade since I touched Windows, but my memory was that this was possible in at least Win2K8/Vista.
Thanks for reaching out! I don't know the answer to the Netbios/domain join question off the top of my head, but I'll find the experts and pull them in shortly.
@rhockenbury : As per my investigation, NetBIOS is not required to join a Windows machine to a domain (that's been the case since around Windows 2000). The page you mentioned only provides naming conventions for when NetBIOS is actually used. Also, as other folks mentioned, the machine name is only truncated in NetBIOS. When setting a long host name (let's say "MyComputerWithALongName") in a domain (let's say "contoso.com"), the machine is still reachable through its FQDN "MyComputerWithALongName.contoso.com". However, through NetBIOS, it will indeed only be reachable through the truncated NetBIOS name "MyComputerWithA".
Is using FQDN an option here?
Thanks for the additional insight. It feels like it would be best to disable NetBIOS, seeing how using the machine api object name as the hostname would result in NetBIOS name collisions. I'll need to follow up internally to see if we could do this.
@rhockenbury : after further discussions with the experts, NETBIOS name resolution is mostly unused today. Though the first step in name resolution is usually going through NETBIOS, if the NETBIOS name is not found, Windows will fall back to resolving the machine name using DNS. For example, if you try to reach a machine through "MyComputerWithALongName", Windows will be able to find that name in DNS provided that the DNS suffix search order is properly populated in the network interface TCP/IP settings (this last point is important). If you try to ping "MyComputerWithALongName" and the suffix is properly populated (to, let's say, contoso.com), then Windows will behave similarly to Linux and try "MyComputerWithALongName.contoso.com".
The bottom line is, I previously suggested using the FQDN, but as per my discussion with the expert, there is actually no need for it. If the DNS suffix search order is properly populated on Windows nodes, the long host names Cluster-API generates should be directly usable. And whether NETBIOS is enabled or not shouldn't matter. If a long name doesn't work with NETBIOS enabled, it will likely not work with NETBIOS disabled either.
FWIW, you can check the DNS suffix list using Get-DnsClientGlobalSetting in PowerShell:
PS C:\hns> Get-DnsClientGlobalSetting
UseSuffixSearchList : True
SuffixSearchList    : {contoso.com}
UseDevolution       : True
DevolutionLevel     : 0
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
/remove-lifecycle stale
I think we concluded that this isn't an issue? @jsturtevant has also stated as such in the Windows proposal.
/close for now, and we can revisit if it turns out to be a problem?
@randomvariable: Closing this issue.
/reopen
This question was re-raised in SIG Windows around app support, though we were wondering whether, since pod names and DNS names are synonymous, pod names longer than the NETBIOS limit would also break applications that don't support longer names. If that's the case, it still doesn't make sense to make this a cluster api concern.
I think @JocelynBerrendonner was going to get a definitive answer.
@randomvariable: Reopened this issue.
/lifecycle frozen
Hi everyone,
There have been additional discussions about this, and additional learnings for me since my last message.
In a nutshell:
A few questions remain, though:
I've also been checking in and found that Active Directory SAMAccountName is restricted to 20 characters. It's not necessarily a blocker since SAMAccountName doesn't need to match the computer name, but it places constraints on uniqueness.
You're right that the hostname is a function of the provider, not CAPI.
Just adding some additional context to this: it seems that if your hostname is over 15 characters, the $env:computername variable cuts off at 15 characters, which I guess is because it is related to the GetComputerName API. This may cause problems for people using PowerShell to configure CNI or something similar. hostname, however, still gets the longer hostname.
Also, when using this with CAPV I have noticed that the identifiers at the end of the generated hostname are over 15 characters before you even add the user-specified portion, so that may need to be considered when running Windows machine deployments.
Noted. thanks.
/area node-agent
@randomvariable: The label(s) area/node-agent cannot be applied, because the repository doesn't have them
@perithompson : Thanks for mentioning this! IIRC, using [System.Net.Dns]::GetHostName() in powershell also returns the full name.
/retitle Windows Support: NetBIOS and Active Directory LDAP SAMAccountName restrictions on Hostname
Update on this: Regardless of NETBIOS, we will need hostname restricted because of the SAMAccountName, so have retitled the issue appropriately.
In terms of next steps:
Whether or not the machine, and concretely, cloud-init, ignition or whatever takes the hostname from the VM name is up to the cloud provider. It is the case for vSphere, Azure (maybe?), but not for AWS. AWS only uses the instance ID.
For AWS, this means if the machine name is shortened, this has no impact on the hostname unless the hostname is explicitly set in the userdata via cloud-init. However, we also would not want to default this because the Kubernetes AWS Cloud Provider (CPI not CAPA) requires the node name to match the host name which in turn MUST match the instance ID.
Next steps are to document:
@randomvariable, it may be worth noting that SAMAccountName is a name used to support legacy versions of Windows (Windows NT4, Windows 95, Windows 98, ...; see https://docs.microsoft.com/en-us/windows/win32/ad/naming-properties#samaccountname). I believe Windows 2000 and up don't require it.
The docs are referring to how SAMAccountName is consumed, as in it's typically consumed by legacy apps. However, it's still a mandatory field on the Computer LDAP schema, and from which the Computer name is derived - with no indication of being deprecated. SAMAccountName is also used during AD domain join, so it's the strongest of all of these requirements IMO.
I worked with AD and LDAP for years, and I can say that SAMAccountName is still very much used by multiple applications, especially those that sync directory data into or out of AD. Windows is notorious for backwards compatibility.
Time for Andrew's pedantic point of the day -- technically SAMAccountName is not part of Active Directory's LDAP schema's Computer class. Rather, the Computer class extends the User class, where SAMAccountName is marked as mandatory. Still, this has the same effect as @randomvariable illustrated above -- SAMAccountName is required when creating an object from the Computer class.
Absolutely, legacy apps and legacy Windows versions (Windows 9x, Windows NT) are limited to 15-character names. That said, it is still possible to use names longer than 15 characters with current versions of Windows (granted, this comes with a boatload of limitations). I think it makes sense to have some limitations on Windows names for the many cases where applications are limited to 15 characters, but should this limitation be optional for the cases where 15 characters is not a concern? Also, as I previously mentioned:
Hi @JocelynBerrendonner,
I have not been part of this thread from the beginning, so I apologize if this next set of questions has been asked and answered (I searched for the word hash and did not see it):
- Does SAMAccountName need to be visually friendly?
- Does SAMAccountName need to be the same as machine name?
- Could SAMAccountName be derived from the machine name via some hashing function?
@akutz : to be honest I am not an LDAP/SAMAccountName expert, so I'd have to ask the experts to find the answer to your questions. That said, typically, when people want to make sure their long (>15 characters) names work well with legacy apps and make sure they are not colliding with anything on the network, they just make sure the first 15 characters are unique on the network. I've seen that approach work fairly well (with the caveats mentioned previously). So, maybe that's a clue in regards to SAMAccountName?
One way to do this would be to take the existing machine name, ex. workload-cluster-2-md-0-5f77f47487-2c4sq and:
1. Derive a short hash from it, ex. 1fde0cd1
2. Take the first six and last five characters of the name, ex. worklo2c4sq
3. Join the two using the - character as a separator, ex. worklo2c4sq-1fde0cd1
The value worklo2c4sq-1fde0cd1 is exactly 20 characters long, and:
SAMAccountName
Heck, an even cheaper way to do this is just take the first 10 characters from the front of a machine name and last 10 characters from a machine name and use that as the SAMAccountName, ex. workload-7487-2c4sq.
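The two derivations suggested above can be sketched in Go roughly as follows. This is only an illustration under assumptions: the thread does not say which hash produced 1fde0cd1, so CRC32 is used here as a stand-in that happens to give 8 hex digits, and the function names hashedName and headTailName are hypothetical. The hash value printed will therefore not necessarily match the example above.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// samAccountNameLimit is the 20-character SAMAccountName limit mentioned in this thread.
const samAccountNameLimit = 20

// hashedName builds <first 6 chars><last 5 chars>-<8 hex digits>.
// CRC32 is an assumption; any hash truncated to 8 hex digits would fit the scheme.
func hashedName(machineName string) string {
	sum := crc32.ChecksumIEEE([]byte(machineName))
	head, tail := machineName, ""
	if len(machineName) > 11 {
		head = machineName[:6]
		tail = machineName[len(machineName)-5:]
	}
	// At most 11 name characters + 1 separator + 8 hex digits = 20.
	return fmt.Sprintf("%s%s-%08x", head, tail, sum)
}

// headTailName keeps the first 10 and last 10 characters of a long name.
func headTailName(machineName string) string {
	if len(machineName) <= samAccountNameLimit {
		return machineName
	}
	return machineName[:10] + machineName[len(machineName)-10:]
}

func main() {
	name := "workload-cluster-2-md-0-5f77f47487-2c4sq"
	fmt.Println(hashedName(name))   // worklo2c4sq-<crc32 of the name>
	fmt.Println(headTailName(name)) // first 10 + last 10 characters
}
```

Note that a strict first-10/last-10 split of the sample name gives workload-c7487-2c4sq (20 characters). The hashed variant trades readability for a lower chance of collisions between machines that share both a prefix and a suffix.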
I guess we need to check the Win32 Domain Join function and how it relates to machine name. I'm pretty sure SAMAccountName can be provided upon join, but I'm not sure if the API then changes the hostname, in which case we'll run into the cloud provider issues.
When adding a machine to an AD group, the name from Get-WmiObject -Class Win32_ComputerSystem is used, and it is limited to 15 characters. If the name is longer than 15 characters, only the first 15 are used, so the name can differ from the hostname.
Hi @JocelynBerrendonner,
I created two Windows workers whose hostnames share the same first 15 characters. Then I used the Add-Computer PowerShell command to add them to the existing AD one by one. The first worker was added successfully: I could see it on the AD server, where the display name is truncated but the full FQDN is shown when I click on it. Then I added the second worker. The PowerShell command again returned True, but when I checked the AD server there was still only one entry, and its full FQDN had changed to the second worker's hostname. I then retried Add-Computer on the first worker, and it failed with a message ending "because it is already in that domain." But I cannot find that worker on the AD server. Could you please help me with this? Thanks!
/kind document /assign @randomvariable /milestone v1.1
@vincepri: The label(s) kind/document cannot be applied, because the repository doesn't have them.
cc @jayunit100
Yup! So, we'd love to propose a fix for this, or go through the fixes other folks have proposed, in an upcoming CAPI meeting?
@jayunit100 feel free to reach out if you have a solution in mind. We'll then request the change as required, whether that's some provider contract (which I suspect it might be) or otherwise
some of the stuff @jayunit100 and @weiwenli97 have been looking at is in https://docs.google.com/document/d/1C7PxLukDUyGxhgPxHRpGYPbROlZarak0QdE7grUPReQ/edit#
/help
@sbueringer: This request has been marked as needing help from a contributor.
Please ensure that the issue body includes answers to the following questions:
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.
/unassign @randomvariable
/triage needs-information @CecileRobertMichon is this still a problem?
If you're asking whether "Hostnames on windows are limited to 15 characters" is still true, then yes. I know some providers including CAPZ have implemented a workaround to trim the AzureMachineName to use as hostname (https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/azure/scope/machine.go#L399). Not sure if this is something that can be fixed at the CAPI level. @marosset do you have any thoughts?
/triage accepted
User Story
As an operator, I would like to manage windows server worker nodes with the cluster api. Hostnames on windows are limited to 15 characters, and the hostnames that are set by the cluster api (by default in cloud-init metadata) exceed this limit. The cluster api should support a more flexible mechanism of setting hostnames so that shorter hostnames can be set for VMs.
Detailed Description
Netbios requires windows computer names to be 15 characters or fewer (https://support.microsoft.com/en-us/help/909264/naming-conventions-in-active-directory-for-computers-domains-sites-and). Attempting to set hostname with more than 15 characters on a windows machine will result in only the first 15 being used.
When using the machine deployment api object, the machine api object names are derived from the machineset controller (https://github.com/kubernetes-sigs/cluster-api/blob/7884484b621f13f604e74f60053f4214a2f19702/controllers/machineset_controller.go#L434). This name is later used to set the vm name (for example in CAPV - https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/895539d004ea33299435a2c739791e9800d0c2ae/controllers/vspheremachine_controller.go#L320), and then also as the local-hostname in the cloud-init metadata (https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/390c49a23e2b535a27b330e4983c59eb0b42f476/pkg/services/govmomi/service.go#L203).
The machine api object names are prefixed by the name of the machine deployment api object. These names, for example, will be in the form:
workload-cluster-2-md-0-5f77f47487-2c4sq
workload-cluster-2-md-0-5f77f47487-25xhg
where workload-cluster-2-md-0 is the name of the machine deployment api object. The prefix is appended with 17 extra characters (-5f77f47487-2c4sq, -5f77f47487-25xhg), which will bring the total character count above 15. Notice that setting the deployment api object name to 3 or more characters will guarantee the same first 15 characters, and thus hostname collisions for the nodes. Being able to set the deployment api object name to something more meaningful than what could be expressed in 3 characters would be useful.
My current workaround is to have cloudbase-init invoke an additional script before the join command that reforms the host name and sets it for the vm. This is somewhat undesirable as now the hostname and node api object name are not the same as the vm name. For consistency, it's desired (but not required) that the vm name (as shown by the cloud provider), the machine api object name, and the node api object name are the same as the hostname of the vm.
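To make the collision described above concrete, here is a minimal Go sketch that truncates the two example machine names to 15 characters, the way a Windows computer name would be cut, and reports when two names map to the same value. The helper name truncate and the constant netbiosLimit are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// netbiosLimit is the 15-character limit on Windows computer names discussed above.
const netbiosLimit = 15

// truncate returns at most the first n bytes of name (the names here are ASCII).
func truncate(name string, n int) string {
	if len(name) <= n {
		return name
	}
	return name[:n]
}

func main() {
	machines := []string{
		"workload-cluster-2-md-0-5f77f47487-2c4sq",
		"workload-cluster-2-md-0-5f77f47487-25xhg",
	}
	// Map each truncated name to the first machine that produced it.
	seen := map[string]string{}
	for _, m := range machines {
		short := truncate(m, netbiosLimit)
		if prev, ok := seen[short]; ok {
			fmt.Printf("collision: %s and %s both truncate to %q\n", prev, m, short)
			continue
		}
		seen[short] = m
	}
}
```

Both names truncate to workload-cluste, so every machine created from this machine deployment would end up with the same 15-character computer name.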
Anything else you would like to add:
I realize that windows worker nodes are not officially supported by the cluster api, but I'm mentioning it since it's something that's up for discussion for the cluster-api roadmap (https://github.com/kubernetes-sigs/cluster-api/pull/2148/files#diff-767f66541aad47089dd5ded720dede6bR32).
Another workaround could be to use the machine api object directly instead of the machine deployment api object, which would directly set the vm name based on the name of the machine api object. However, the benefits of using the machine deployment are lost.
/kind feature