siderolabs / talos-vmtoolsd

VMware tools implementation for the Talos Kubernetes platform, using govmomi and Talos' apid
Apache License 2.0

Incompatible with Talos 1.5 and 1.6 due to resource API changes #9

Closed eugene-marchanka closed 9 months ago

eugene-marchanka commented 1 year ago

vmtools does not export node IP in Talos v1.5.1

$ kubectl --kubeconfig kubeconfig logs -n kube-system talos-vmtoolsd-f5bzt
{"level":"info","msg":"talos-vmtoolsd version 0.3.1\nCopyright 2020-2022 Oliver Kuckertz <oliver.kuckertz@mologie.de>\nThis program is free software and available under the Apache 2.0 license."}
{"error":"rpc error: code = Unimplemented desc = unknown service resource.ResourceService","level":"error","module":"talosapi","msg":"error receiving address status resource"}
{"error":"rpc error: code = Unimplemented desc = unknown service resource.ResourceService","level":"error","module":"talosapi","msg":"error receiving address status resource"}
{"level":"warning","module":"tboxcmds","msg":"not sending primary IP: no interfaces received from upstream"}
{"error":"rpc error: code = Unimplemented desc = unknown service resource.ResourceService","level":"error","module":"talosapi","msg":"error receiving address status resource"}
{"error":"rpc error: code = Unimplemented desc = unknown service resource.ResourceService","level":"error","module":"talosapi","msg":"error receiving address status resource"}
{"level":"warning","module":"tboxcmds","msg":"not sending primary IP: no interfaces received from upstream"}
{"error":"rpc error: code = Unimplemented desc = unknown service resource.ResourceService","level":"error","module":"talosapi","msg":"error receiving address status resource"}
{"error":"rpc error: code = Unimplemented desc = unknown service resource.ResourceService","level":"error","module":"talosapi","msg":"error receiving address status resource"}
{"level":"warning","module":"tboxcmds","msg":"not sending primary IP: no interfaces received from upstream"}
{"error":"rpc error: code = Unimplemented desc = unknown service resource.ResourceService","level":"error","module":"talosapi","msg":"error receiving address status resource"}
{"error":"rpc error: code = Unimplemented desc = unknown service resource.ResourceService","level":"error","module":"talosapi","msg":"error receiving address status resource"}
{"level":"warning","module":"tboxcmds","msg":"not sending primary IP: no interfaces received from upstream"}
{"error":"rpc error: code = Unimplemented desc = unknown service resource.ResourceService","level":"error","module":"talosapi","msg":"error receiving address status resource"}

mologie commented 1 year ago

Talos 1.5 removed an API for listing resources that talos-vmtoolsd relied on. I have a branch now where the new API is being used, but it is entirely untested at this point. It might not launch, crash on launch, or report wrong data:

To install, please use the "unstable" yaml ~but insert image reference ghcr.io/mologie/talos-vmtoolsd-unstable:talos-1.5~

Edit: Changes are in master, normal unstable image will suffice until 0.4 is released

eugene-marchanka commented 1 year ago

New image worked!

mologie commented 1 year ago

Great, thanks for the info! I will leave this open for visibility for others, and close with an official stable release.

xerxist commented 1 year ago

> Talos 1.5 removed an API for listing resources that talos-vmtoolsd relied on. I have a branch now where the new API is being used, but it is entirely untested at this point. It might not launch, crash on launch, or report wrong data:
>
> To install, please use the "unstable" yaml but insert image reference ghcr.io/mologie/talos-vmtoolsd-unstable:talos-1.5

This works for me too, on Talos 1.5.5 and Kubernetes 1.28.4.

Cheers!

CompPhy commented 11 months ago

Considering that Talos 1.6 just went GA last week, maybe it's time to get this merged???

I've been running 1.5.x for a while and started testing vsphere-csi-driver recently. This driver relies on cloud-provider-vsphere, which uses the nic/IP information to make sure it's matching up nodes correctly in vCenter. Don't ask me why it works this way, I just know that it's causing weird errors because the IP isn't populated in vCenter.

Basically, without working vmtools it can cause other problems downstream. I just spent several days pulling my hair out until I realized that this is a known, and fixed, issue.... Except you have to explicitly come here to find the fix.

If nothing else, maybe add a note on the README.md for anyone that is using the Talos 1.5 or 1.6 release???

CompPhy commented 11 months ago

FWIW, I can confirm this fix works on Talos 1.5.5; it also immediately fixed my issues with cloud-provider-vsphere as soon as the IP populated correctly in vCenter.

We also updated to the 1.6.0 release, and it looks like everything is working there as well.

jonkerj commented 10 months ago

@mologie, any news on this matter?

mologie commented 10 months ago

Hi folks, sorry for the late response, and especially for the inconvenience caused. Family and life happened, and this is unfortunately a project that does not get much time allocated at work. If there is a commercial sponsor that would like to take over maintenance (as previously discussed on Talos Slack), that would be great for this project.

Given the Broadcom licensing changes at VMware, Authentic Vision will migrate away from VMware entirely this year, so I would have to make this a personal project of mine. I am not sure I am willing to commit to that since the customer base is entirely business customers. If anything, this would be a portfolio-project.

Regardless, I'll try to polish up, test, and get a release ready (including changes in preparation for making this a machine extension, but no extension available yet!) this week.

jonkerj commented 10 months ago

In the meantime, we (Equinix) have offered to adopt the project. While discussing whether to transfer the repo as a whole or change permissions, the idea came up to ask Sidero if this project could be hosted under their org. They were enthusiastic, and so that happened.

Equinix (primarily @lennardk, @robinelfrink and myself) will do most of the actual work, though.

In the short term we are going to make the tool compatible with 1.6, and patch up @bnason's system extension (if needed). Our goal is to have the latter upstreamed at some point.

mologie commented 10 months ago

Thank you @jonkerj, @lennardk, and @robinelfrink for adopting this project. With 0.3.2 ported to the Sidero group too, everything, including the work-in-progress stuff, is available here, so it is time for me to step back and let you do your magic. I am of course available for questions and concerns regarding code and integration, but trust that the project is in good hands.

omniproc commented 9 months ago

any updates on this?

robinelfrink commented 9 months ago

Hi @omniproc,

Sorry for the delay. We have just released v0.4.0.