siderolabs / talos-vmtoolsd

VMware tools implementation for the Talos Kubernetes platform, using govmomi and Talos' apid
Apache License 2.0

Unable to restart guest OS using vSphere #21

Open rgomezceis opened 3 weeks ago

rgomezceis commented 3 weeks ago

Using the latest release.

The first reboot works well, but once the VM has started and I try to restart it again, it fails with:

"Cannot complete operation because VMware Tools is not running in this virtual machine. Failed to reset the virtual machine: Cannot execute scripts."

The pod is running and its log shows no errors.

kube-system          talos-vmtoolsd-wmjjk                           1/1     Running

jonkerj commented 3 weeks ago

The error indicates that vSphere is not receiving any communication from vmtoolsd. Could you check whether the vmtoolsd logs show anything unusual, or post them here?

rgomezceis commented 3 weeks ago

Just this:

$ kubectl logs -n kube-system talos-vmtoolsd-rh4c7
{"level":"info","msg":"talos-vmtoolsd version latest\nCopyright 2020-2022 Oliver Kuckertz <oliver.kuckertz@mologie.de>\nThis program is free software and available under the Apache 2.0 license."}

Pod:

Containers:
  talos-vmtoolsd:
    Container ID:   containerd://1d1e8d606fdf6b56bf085decbbac5362c7f0690ec36f57fca8fc6dc84e996179
    Image:          ghcr.io/siderolabs/talos-vmtoolsd:latest
    Image ID:       ghcr.io/siderolabs/talos-vmtoolsd@sha256:8eefb326375abf45f07d5922e25701aa43bbf7aa50f86927a6d24633e44c3ca1
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Tue, 11 Jun 2024 20:33:43 +0000
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 11 Jun 2024 20:25:30 +0000
      Finished:     Tue, 11 Jun 2024 20:29:03 +0000
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     500m
      memory:  64Mi
    Requests:
      cpu:     500m
      memory:  8Mi
    Environment:
      TALOS_CONFIG_PATH:  /etc/talos-vmtoolsd/talosconfig
      TALOS_HOST:          (v1:status.hostIP)
    Mounts:
      /etc/talos-vmtoolsd from config (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  talos-vmtoolsd-config
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     ceis.node.workload:NoSchedule op=Exists
                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                 node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.cloudprovider.kubernetes.io/uninitialized:NoSchedule op=Exists
                 node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists
                 node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                 node.kubernetes.io/unreachable:NoExecute op=Exists
                 node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:          <none>

jonkerj commented 3 weeks ago

Hmm, that's not much. Could you add this env var to your container spec and inspect the logs again? According to main.go, this is how the log level is set.

env:
- name: LOG_LEVEL
  value: debug  # or even trace

rgomezceis commented 3 weeks ago

{"level":"info","msg":"talos-vmtoolsd version latest\nCopyright 2020-2022 Oliver Kuckertz <oliver.kuckertz@mologie.de>\nThis program is free software and available under the Apache 2.0 license."}
{"level":"debug","module":"vmware-guestinfo","msg":"Opened channel 0"}
{"level":"debug","module":"vmware-guestinfo","msg":"Opened channel 1"}
{"handler_name":"reset","level":"debug","module":"nanotoolbox","msg":"incoming RPC request"}
{"level":"debug","module":"tboxcmds","msg":"sending hostname: ceis-worker-3"}
{"level":"debug","module":"tboxcmds","msg":"sending OS full name: Talos v1.7.4-cb3a8308"}
{"level":"debug","module":"tboxcmds","msg":"sending OS short name: Talos v1.7.4"}
{"level":"debug","module":"tboxcmds","msg":"GuestNicInfo: adding name=eth0 mac={mac} ip={ipv4}"}
{"level":"debug","module":"tboxcmds","msg":"GuestNicInfo: adding name=eth0 mac={mac} ip={ipv6}"}
{"handler_name":"Capabilities_Register","level":"debug","module":"nanotoolbox","msg":"incoming RPC request"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"handler_name":"Vix_1_Relayed_Command","level":"debug","module":"nanotoolbox","msg":"incoming RPC request"}
{"command":"vix","level":"debug","module":"nanotoolbox","msg":"sending tools state version=\"Talos v1.7.4-cb3a8308\" versionShort=\"Talos v1.7.4\" hostname=\"ceis-worker-3\""}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}
{"level":"debug","module":"vmware-guestinfo","msg":"No message to retrieve"}

We are using the vSphere CPI; I don't know if that matters.

robinelfrink commented 2 weeks ago

I'm seeing the same with talos-vmtoolsd as a system extension on vCloud. What's interesting is that after the reboot, talos-vmtoolsd properly reports IP addresses.

I'll try to find out what's going on here.

robinelfrink commented 1 week ago

A reboot works only right after power-on; subsequent requests throw a Java exception on the ESX host, with this error in the UI:

      "message": "Cannot complete operation because VMware Tools is not running in this virtual machine.\nFailed to reset the virtual machine: Cannot execute scripts.",
      "faultMessage": [
        {
          "_type": "com.vmware.vim.binding.impl.vmodl.LocalizableMessageImpl",
          "key": "msg.vigor.reset.fail",
          "arg": [
            {
              "_type": "com.vmware.vim.binding.impl.vmodl.KeyAnyValueImpl",
              "key": "1",
              "value": "msg.foundryErrMsgId.VIX_E_POWEROP_SCRIPTS_NOT_AVAILABLE"
            }
          ],
          "message": "Failed to reset the virtual machine: Cannot execute scripts."
        }
      ]

I'll keep trying to figure out what's going on, but don't expect a fix very soon. If you're automating the reboots, I suggest you try a shutdown followed by a power-on for now.

Here's the full response in the vSphere UI: stacktrace.json