Open rgomezceis opened 3 weeks ago
The error indicates that vSphere is not receiving communications from vmtoolsd. Could you check whether the vmtoolsd logs show anything weird (or post them here)?
Just this:
$ kubectl logs -n kube-system talos-vmtoolsd-rh4c7
{"level":"info","msg":"talos-vmtoolsd version latest\nCopyright 2020-2022 Oliver Kuckertz <oliver.kuckertz@mologie.de>\nThis program is free software and available under the Apache 2.0 license."}
Pod:
Containers:
talos-vmtoolsd:
Container ID: containerd://1d1e8d606fdf6b56bf085decbbac5362c7f0690ec36f57fca8fc6dc84e996179
Image: ghcr.io/siderolabs/talos-vmtoolsd:latest
Image ID: ghcr.io/siderolabs/talos-vmtoolsd@sha256:8eefb326375abf45f07d5922e25701aa43bbf7aa50f86927a6d24633e44c3ca1
Port: <none>
Host Port: <none>
State: Running
Started: Tue, 11 Jun 2024 20:33:43 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 11 Jun 2024 20:25:30 +0000
Finished: Tue, 11 Jun 2024 20:29:03 +0000
Ready: True
Restart Count: 1
Limits:
cpu: 500m
memory: 64Mi
Requests:
cpu: 500m
memory: 8Mi
Environment:
TALOS_CONFIG_PATH: /etc/talos-vmtoolsd/talosconfig
TALOS_HOST: (v1:status.hostIP)
Mounts:
/etc/talos-vmtoolsd from config (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config:
Type: Secret (a volume populated by a Secret)
SecretName: talos-vmtoolsd-config
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: ceis.node.workload:NoSchedule op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.cloudprovider.kubernetes.io/uninitialized:NoSchedule op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events: <none>
Hmm, that's not much. Could you add this env var to your container spec and inspect your logs again? According to main.go, this is how one sets the log level.
env:
- name: LOG_LEVEL
value: debug # or even trace
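For a quick test without editing the manifest, the same variable can probably be set with `kubectl set env`. This is a minimal sketch: the helper name `set_vmtoolsd_log_level` is hypothetical, and the namespace and DaemonSet name are assumptions inferred from the pod name `talos-vmtoolsd-rh4c7` in this thread, so adjust them if your deployment differs.

```shell
# Hypothetical helper: set LOG_LEVEL on the talos-vmtoolsd DaemonSet.
# NAMESPACE and DAEMONSET defaults are assumptions based on the pod name
# shown in this thread; override them for your cluster.
set_vmtoolsd_log_level() {
    level="${1:-debug}"
    kubectl -n "${NAMESPACE:-kube-system}" set env \
        "daemonset/${DAEMONSET:-talos-vmtoolsd}" "LOG_LEVEL=${level}"
}
```

Calling `set_vmtoolsd_log_level trace` would update the pod template, which should make the DaemonSet roll out new pods with the more verbose level.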
{"level":"info","msg":"talos-vmtoolsd version latest\nCopyright 2020-2022 Oliver Kuckertz <oliver.kuckertz@mologie.de>\nThis program is free software and available under the Apache 2.0 license."}
{"level":"debug","module":"vmware-guestinfo","msg":"Opened channel 0"}
{"level":"debug","module":"vmware-guestinfo","msg":"Opened channel 1"}
{"handler_name":"reset","level":"debug","module":"nanotoolbox","msg":"incoming RPC request"}
{"level":"debug","module":"tboxcmds","msg":"sending hostname: ceis-worker-3"}
{"level":"debug","module":"tboxcmds","msg":"sending OS full name: Talos v1.7.4-cb3a8308"}
{"level":"debug","module":"tboxcmds","msg":"sending OS short name: Talos v1.7.4"}
{"level":"debug","module":"tboxcmds","msg":"GuestNicInfo: adding name=eth0 mac={mac} ip={ipv4}"}
{"level":"debug","module":"tboxcmds","msg":"GuestNicInfo: adding name=eth0 mac={mac} ip={ipv6}"}
{"handler_name":"Capabilities_Register","level":"debug","module":"nanotoolbox","msg":"incoming RPC request"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"handler_name":"Vix_1_Relayed_Command","level":"debug","module":"nanotoolbox","msg":"incoming RPC request"}
{"command":"vix","level":"debug","module":"nanotoolbox","msg":"sending tools state version=\"Talos v1.7.4-cb3a8308\" versionShort=\"Talos v1.7.4\" hostname=\"ceis-worker-3\""}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
We are using vSphere CPI, idk if it matters
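At debug level most of the output above is the repeated "No message to retrieve" line, which makes the RPC requests hard to spot. A small filter like the following sketch can hide it. The function name `filter_vmtoolsd_logs` is hypothetical, and it assumes the log lines keep the exact JSON formatting shown in this thread.

```shell
# Hypothetical helper: read vmtoolsd JSON log lines on stdin and drop the
# repeated "No message to retrieve" entries, leaving everything else as-is.
filter_vmtoolsd_logs() {
    grep -v '"msg":"No message to retrieve"'
}
```

It could be used as `kubectl logs -n kube-system talos-vmtoolsd-rh4c7 | filter_vmtoolsd_logs`, which leaves the `incoming RPC request` and `tboxcmds` lines.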
I'm seeing the same with talos-vmtoolsd as a system extension on vCloud. What's interesting is that after the reboot, talos-vmtoolsd properly reports IP addresses.
I'll try to find out what's going on here.
Reboot works only the first time after power-on; subsequent requests throw a Java exception on the ESX host, with this error in the UI:
"message": "Cannot complete operation because VMware Tools is not running in this virtual machine.\nFailed to reset the virtual machine: Cannot execute scripts.",
"faultMessage": [
{
"_type": "com.vmware.vim.binding.impl.vmodl.LocalizableMessageImpl",
"key": "msg.vigor.reset.fail",
"arg": [
{
"_type": "com.vmware.vim.binding.impl.vmodl.KeyAnyValueImpl",
"key": "1",
"value": "msg.foundryErrMsgId.VIX_E_POWEROP_SCRIPTS_NOT_AVAILABLE"
}
],
"message": "Failed to reset the virtual machine: Cannot execute scripts."
}
]
I'll continue to try to figure out what's going on, but don't expect a fix very soon. If you're automating the reboots, I suggest you try a shutdown followed by a power-on for now.
Here's the full response in the vSphere UI: stacktrace.json
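The shutdown-then-power-on workaround suggested above could be scripted roughly as follows. This is only a sketch under several assumptions: the helper name `power_cycle_vm` is hypothetical, it assumes `talosctl` and `govc` are installed and already configured for your cluster and vCenter, and it shuts the node down through the Talos API rather than through vSphere, which is one interpretation of the suggestion. Whether it avoids the error in every environment is not confirmed in this thread.

```shell
# Hypothetical helper: power-cycle a Talos VM instead of requesting a
# guest reboot through vSphere.
# Usage: power_cycle_vm <talos-node-ip> <vm-inventory-path>
power_cycle_vm() {
    node="$1"
    vm="$2"

    # Shut the node down via the Talos API, so VMware Tools is not involved.
    talosctl shutdown --nodes "$node" || return 1

    # Poll vSphere until the VM reports poweredOff (60 attempts at most).
    tries=0
    while [ "$(govc object.collect -s "$vm" runtime.powerState)" != "poweredOff" ]; do
        tries=$((tries + 1))
        if [ "$tries" -ge 60 ]; then
            return 1
        fi
        sleep "${POLL_INTERVAL:-5}"
    done

    # Power the VM back on once it is fully off.
    govc vm.power -on "$vm"
}
```

The node IP and the inventory path passed to the function are placeholders you would supply yourself; nothing in this thread specifies them.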
Using the latest release.
The first reboot works well, but once the VM has started and I try to restart it again, it fails with:
"Cannot complete operation because VMware Tools is not running in this virtual machine. Failed to reset the virtual machine: Cannot execute scripts."
Pod is running without any error log.