Closed lukababu closed 7 months ago
I was able to get 4/5 nodes working after "Delete nodes and do a reinstall, re-clone branch and setup fresh". A possible culprit might have been resizing the partition on the nodes, or installing net-tools package.
Checking the log output the following lines appear right after:
2024-03-03 17:01:06,965 p=79858 u=luke n=ansible | Escalation succeeded
2024-03-03 17:01:06,965 p=79858 u=luke n=ansible | Escalation succeeded
2024-03-03 17:02:49,568 p=79858 u=luke n=ansible | <10.1.1.84> (0, b'\r\n{"name": "k3s-node", "changed": true, "status": {"Type": "notify", "Restart": "always", "NotifyAccess": "main", "RestartUSec": "5s", "TimeoutStartUSec": "infinity", "TimeoutStopUSec": "1min 30s", "TimeoutAbortUSec": "1min 30s", "TimeoutStartFailureMode": "terminate", "TimeoutStopFailureMode": "terminate", "RuntimeMaxUSec": "infinity", "WatchdogUSec": "infinity", "WatchdogTimestamp": "n/a", "WatchdogTimestampMonotonic": "0", "RootDirectoryStartOnly": "no", "RemainAfterExit": "no", "GuessMainPID": "yes", "MainPID": "0", "ControlPID": "0", "FileDescriptorStoreMax": "0", "NFileDescriptorStore": "0", "StatusErrno": "0", "Result": "success", "ReloadResult": "success", "CleanResult": "success", "UID": "[not set]", "GID": "[not set]", "NRestarts": "0", "OOMPolicy": "continue", "ExecMainStartTimestamp": "n/a", "ExecMainStartTimestampMonotonic": "0", "ExecMainExitTimestamp": "n/a", "ExecMainExitTimestampMonotonic": "0", "ExecMainPID": "0", "ExecMainCode": "0", "ExecMainStatus": "0", "ExecStartPre": "{ path=/sbin/modprobe ; argv[]=/sbin/modprobe overlay ; ignore_errors=yes ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }", "ExecStartPreEx": "{ path=/sbin/modprobe ; argv[]=/sbin/modprobe overlay ; flags=ignore-failure ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }", "ExecStart": "{ path=/usr/local/bin/k3s ; argv[]=/usr/local/bin/k3s agent --server https://10.1.1.222:6443 --token K10c713ed47480e52638ab99b6acd3f4d1f22f3ed66f65355ed09cc75deb347d5ee::server:unoEjdr9aXGRhWVczytGUKjyEF3tFDYz --flannel-iface=eth0 --node-ip=10.1.1.84 ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }", "ExecStartEx": "{ path=/usr/local/bin/k3s ; argv[]=/usr/local/bin/k3s agent --server https://10.1.1.222:6443 --token K10c713ed47480e52638ab99b6acd3f4d1f22f3ed66f65355ed09cc75deb347d5ee::server:unoEjdr9aXGRhWVczytGUKjyEF3tFDYz --flannel-iface=eth0 --node-ip=10.1.1.84 ; flags= ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }", "Slice": "system.slice", "MemoryCurrent": "[not set]", "MemoryAvailable": "infinity", "CPUUsageNSec": "[not set]", "TasksCurrent": "[not set]", "IPIngressBytes": "[no data]", "IPIngressPackets": "[no data]", "IPEgressBytes": "[no data]", "IPEgressPackets": "[no data]", "IOReadBytes": "18446744073709551615", "IOReadOperations": "18446744073709551615", "IOWriteBytes": "18446744073709551615", "IOWriteOperations": "18446744073709551615", "Delegate": "yes", "DelegateControllers": "cpu cpuacct cpuset io blkio memory devices pids bpf-firewall bpf-devices bpf-foreign bpf-socket-bind", "CPUAccounting": "yes", "CPUWeight": "[not set]", "StartupCPUWeight": "[not set]", "CPUShares": "[not set]", "StartupCPUShares": "[not set]", "CPUQuotaPerSecUSec": "infinity", "CPUQuotaPeriodUSec": "infinity", "IOAccounting": "no", "IOWeight": "[not set]", "StartupIOWeight": "[not set]", "BlockIOAccounting": "no", "BlockIOWeight": "[not set]", "StartupBlockIOWeight": "[not set]", "MemoryAccounting": "yes", "DefaultMemoryLow": "0", "DefaultMemoryMin": "0", "MemoryMin": "0", "MemoryLow": "0", "MemoryHigh": "infinity", "MemoryMax": "infinity", "MemorySwapMax": "infinity", "MemoryLimit": "infinity", "DevicePolicy": "auto", "TasksAccounting": "yes", "TasksMax": "infinity", "IPAccounting": "no", "ManagedOOMSwap": "auto", "ManagedOOMMemoryPressure": "auto", "ManagedOOMMemoryPressureLimit": "0", "ManagedOOMPreference": "none", "UMask": "0022", "LimitCPU": "infinity", "LimitCPUSoft": "infinity", "LimitFSIZE": "infinity", "LimitFSIZESoft": "infinity", "LimitDATA": "infinity", "LimitDATASoft": "infinity", "LimitSTACK": "infinity", "LimitSTACKSoft": "8388608", "LimitCORE": "infinity", "LimitCORESoft": "infinity", "LimitRSS": "infinity", "LimitRSSSoft": "infinity", "LimitNOFILE": "1048576", "LimitNOFILESoft": "1048576", "LimitAS": "infinity", "LimitASSoft": "infinity", "LimitNPROC": "infinity", "LimitNPROCSoft": "infinity", "LimitMEMLOCK": "65536", "LimitMEMLOCKSoft": "65536", "LimitLOCKS": "infinity", "LimitLOCKSSoft": "infinity", "LimitSIGPENDING": "15347", "LimitSIGPENDINGSoft": "15347", "LimitMSGQUEUE": "819200", "LimitMSGQUEUESoft": "819200", "LimitNICE": "0", "LimitNICESoft": "0", "LimitRTPRIO": "0", "LimitRTPRIOSoft": "0", "LimitRTTIME": "infinity", "LimitRTTIMESoft": "infinity", "OOMScoreAdjust": "0", "CoredumpFilter": "0x33", "Nice": "0", "IOSchedulingClass": "2", "IOSchedulingPriority": "4", "CPUSchedulingPolicy": "0", "CPUSchedulingPriority": "0", "CPUAffinityFromNUMA": "no", "NUMAPolicy": "n/a", "TimerSlackNSec": "50000", "CPUSchedulingResetOnFork": "no", "NonBlocking": "no", "StandardInput": "null", "StandardOutput": "journal", "StandardError": "inherit", "TTYReset": "no", "TTYVHangup": "no", "TTYVTDisallocate": "no", "SyslogPriority": "30", "SyslogLevelPrefix": "yes", "SyslogLevel": "6", "SyslogFacility": "3", "LogLevelMax": "-1", "LogRateLimitIntervalUSec": "0", "LogRateLimitBurst": "0", "SecureBits": "0", "CapabilityBoundingSet": "cap_chown cap_dac_override cap_dac_read_search cap_fowner cap_fsetid cap_kill cap_setgid cap_setuid cap_setpcap cap_linux_immutable cap_net_bind_service cap_net_broadcast cap_net_admin cap_net_raw cap_ipc_lock cap_ipc_owner cap_sys_module cap_sys_rawio cap_sys_chroot cap_sys_ptrace cap_sys_pacct cap_sys_admin cap_sys_boot cap_sys_nice cap_sys_resource cap_sys_time cap_sys_tty_config cap_mknod cap_lease cap_audit_write cap_audit_control cap_setfcap cap_mac_override cap_mac_admin cap_syslog cap_wake_alarm cap_block_suspend cap_audit_read cap_perfmon cap_bpf cap_checkpoint_restore", "DynamicUser": "no", "RemoveIPC": "no", "PrivateTmp": "no", "PrivateDevices": "no", "ProtectClock": "no", "ProtectKernelTunables": "no", "ProtectKernelModules": "no", "ProtectKernelLogs": "no", "ProtectControlGroups": "no", "PrivateNetwork": "no", "PrivateUsers": "no", "PrivateMounts": "no", "PrivateIPC": "no", "ProtectHome": "no", "ProtectSystem": "no", "SameProcessGroup": "no", "UtmpMode": "init", "IgnoreSIGPIPE": "yes", "NoNewPrivileges": "no", "SystemCallErrorNumber": "2147483646", "LockPersonality": "no", "RuntimeDirectoryPreserve": "no", "RuntimeDirectoryMode": "0755", "StateDirectoryMode": "0755", "CacheDirectoryMode": "0755", "LogsDirectoryMode": "0755", "ConfigurationDirectoryMode": "0755", "TimeoutCleanUSec": "infinity", "MemoryDenyWriteExecute": "no", "RestrictRealtime": "no", "RestrictSUIDSGID": "no", "RestrictNamespaces": "no", "MountAPIVFS": "no", "KeyringMode": "private", "ProtectProc": "default", "ProcSubset": "all", "ProtectHostname": "no", "KillMode": "process", "KillSignal": "15", "RestartKillSignal": "15", "FinalKillSignal": "9", "SendSIGKILL": "yes", "SendSIGHUP": "no", "WatchdogSignal": "6", "Id": "k3s-node.service", "Names": "k3s-node.service", "Requires": "sysinit.target system.slice", "Conflicts": "shutdown.target", "Before": "shutdown.target", "After": "network-online.target basic.target sysinit.target systemd-journald.socket system.slice", "Documentation": "https://k3s.io", "Description": "Lightweight Kubernetes", "LoadState": "loaded", "ActiveState": "inactive", "FreezerState": "running", "SubState": "dead", "FragmentPath": "/etc/systemd/system/k3s-node.service", "UnitFileState": "disabled", "UnitFilePreset": "enabled", "StateChangeTimestamp": "n/a", "StateChangeTimestampMonotonic": "0", "InactiveExitTimestamp": "n/a", "InactiveExitTimestampMonotonic": "0", "ActiveEnterTimestamp": "n/a", "ActiveEnterTimestampMonotonic": "0", "ActiveExitTimestamp": "n/a", "ActiveExitTimestampMonotonic": "0", "InactiveEnterTimestamp": "n/a", "InactiveEnterTimestampMonotonic": "0", "CanStart": "yes", "CanStop": "yes", "CanReload": "no", "CanIsolate": "no", "CanFreeze": "yes", "StopWhenUnneeded": "no", "RefuseManualStart": "no", "RefuseManualStop": "no", "AllowIsolate": "no", "DefaultDependencies": "yes", "OnSuccessJobMode": "fail", "OnFailureJobMode": "replace", "IgnoreOnIsolate": "no", "NeedDaemonReload": "no", "JobTimeoutUSec": "infinity", "JobRunningTimeoutUSec": "infinity", "JobTimeoutAction": "none", "ConditionResult": "no", "AssertResult": "no", "ConditionTimestamp": "n/a", "ConditionTimestampMonotonic": "0", "AssertTimestamp": "n/a", "AssertTimestampMonotonic": "0", "Transient": "no", "Perpetual": "no", "StartLimitIntervalUSec": "10s", "StartLimitBurst": "5", "StartLimitAction": "none", "FailureAction": "none", "SuccessAction": "none", "CollectMode": "inactive"}, "enabled": true, "state": "started", "invocation": {"module_args": {"name": "k3s-node", "daemon_reload": true, "state": "restarted", "enabled": true, "daemon_reexec": false, "scope": "system", "no_block": false, "force": null, "masked": null}}}\r\n', b"OpenSSH_9.0p1, LibreSSL 3.3.6\r\ndebug1: Reading configuration data /Users/luke/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files\r\ndebug1: /etc/ssh/ssh_config line 54: Applying options for *\r\ndebug2: resolve_canonicalize: hostname 10.1.1.84 is address\r\ndebug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/Users/luke/.ssh/known_hosts'\r\ndebug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/Users/luke/.ssh/known_hosts2'\r\ndebug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 81724\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\nShared connection to 10.1.1.84 closed.\r\n")
2024-03-03 17:02:49,572 p=79858 u=luke n=ansible | <10.1.1.84> ESTABLISH SSH CONNECTION FOR USER: serveradmin
2024-03-03 17:02:49,574 p=79858 u=luke n=ansible | <10.1.1.84> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="serveradmin"' -o ConnectTimeout=10 -o 'ControlPath="/Users/luke/.ansible/cp/ff535aba42"' 10.1.1.84 '/bin/sh -c '"'"'rm -f -r /home/serveradmin/.ansible/tmp/ansible-tmp-1709510466.4454558-81800-273274775614732/ > /dev/null 2>&1 && sleep 0'"'"''
2024-03-03 17:02:49,667 p=79858 u=luke n=ansible | <10.1.1.84> (0, b'', b"OpenSSH_9.0p1, LibreSSL 3.3.6\r\ndebug1: Reading configuration data /Users/luke/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files\r\ndebug1: /etc/ssh/ssh_config line 54: Applying options for *\r\ndebug2: resolve_canonicalize: hostname 10.1.1.84 is address\r\ndebug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/Users/luke/.ssh/known_hosts'\r\ndebug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/Users/luke/.ssh/known_hosts2'\r\ndebug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 81724\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 2\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n")
2024-03-03 17:02:49,681 p=79858 u=luke n=ansible | changed: [10.1.1.84] => {
"changed": true,
Let me know if I should attach the full log, in case that is helpful!
Not able to install the playbook
Expected Behavior
Installation to be successful
Current Behavior
It should not hang at:
and instead, it should tell me what the issue is at the very least.
The log output where it hangs give the following output:
Steps to Reproduce
ansible-playbook ./site.yml -i ./inventory/my-cluster/hosts.ini -bK
ansible-playbook ./reset.yml -i ./inventory/my-cluster/hosts.ini
ansible-playbook ./reboot.yml -i ./inventory/my-cluster/hosts.ini
ansible-playbook ./site.yml -i ./inventory/my-cluster/hosts.ini -bK
, same error occursContext (variables)
Running playbook from:
Operating system: Ventura 13.5.1 (22G90) Hardware: Macbook Pro M2 Max
Deploying in:
Operating system: Proxmox cloud-based Ubuntu VM clones 22.04 amd64 (Jammy) Hardware: 4 Core, 4 GB RAM, 64 GB NVMe Drive.
Variables Used
all.yml
Possible Solution