Closed: mueller-tobias closed this issue 1 year ago.
Hi, what is your node template configuration? You can view it as JSON by clicking the View in API button, but please make sure not to include your API token or your username and password. Also, are you using any custom Engine Options (such as a custom Docker install URL or a custom storage driver)?
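Before pasting a template into a comment, the credential fields can be masked programmatically. A minimal Python sketch, assuming the JSON from the View in API page has been saved to a local file. The set of sensitive keys is an assumption based on the fields that appear in this thread, not a complete list of what Rancher considers secret.

```python
import json

# Credential fields seen in this thread's node template. This set is an
# assumption, not an exhaustive list of Rancher's sensitive fields.
SENSITIVE_KEYS = {"token", "password", "imagePassword", "username"}


def redact(value, sensitive=SENSITIVE_KEYS):
    """Return a copy of value with every non-empty sensitive field masked."""
    if isinstance(value, dict):
        return {
            key: ("****" if key in sensitive and item else redact(item, sensitive))
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive) for item in value]
    return value  # scalars (str, int, bool, None) are returned unchanged


def redacted_json(path: str) -> str:
    """Load a saved node template and return it pretty-printed and redacted."""
    with open(path, encoding="utf-8") as handle:
        return json.dumps(redact(json.load(handle)), indent=2)
```

Usage: `print(redacted_json("node-template.json"))`, where the file name is hypothetical. Empty fields are left empty on purpose, so readers can still see which credentials were not set.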
Note that the ubuntu:22.04 image is currently not supported, because id_rsa SSH keys aren't supported on that OS.
I used the default ubuntu:20.04.
FYI, this is the template I used:
{
  "amazonec2Config": null,
  "annotations": {
    "ownerBindingsCreated": "true"
  },
  "baseType": "nodeTemplate",
  "cloudCredentialId": null,
  "created": "2022-12-14T13:37:41Z",
  "createdTS": 1671025061000,
  "creatorId": "user-8c6hp",
  "driver": "ionoscloud",
  "engineEnv": {},
  "engineInstallURL": "https://releases.rancher.com/install-docker/20.10.sh",
  "engineLabel": {},
  "engineOpt": {},
  "engineRegistryMirror": [],
  "id": "cattle-global-nt:nt-g6qjp",
  "ionoscloudConfig": {
    "cores": "4",
    "cpuFamily": "INTEL_SKYLAKE",
    "datacenterId": "",
    "diskSize": "50",
    "diskType": "HDD",
    "endpoint": "https://api.ionos.com/cloudapi/v6",
    "image": "ubuntu:20.04",
    "imagePassword": "abcde12345",
    "lanId": "",
    "location": "de/fra",
    "password": "",
    "ram": "2048",
    "serverAvailabilityZone": "AUTO",
    "sshUser": "root",
    "token": "****",
    "userData": "",
    "userDataB64": "",
    "username": "",
    "volumeAvailabilityZone": "AUTO"
  },
  "labels": {
    "cattle.io/creator": "norman"
  },
  "links": {
    "nodePools": "…/v3/nodePools?nodeTemplateId=cattle-global-nt%3Ant-g6qjp",
    "nodes": "…/v3/nodes?nodeTemplateId=cattle-global-nt%3Ant-g6qjp",
    "self": "…/v3/nodeTemplates/cattle-global-nt:nt-g6qjp",
    "update": "…/v3/nodeTemplates/cattle-global-nt:nt-g6qjp"
  },
  "logOpt": {},
  "name": "Test without Datacenter",
  "principalId": "local://user-8c6hp",
  "state": "active",
  "storageOpt": {},
  "transitioning": "no",
  "transitioningMessage": "",
  "type": "nodeTemplate",
  "useInternalIpAddress": true,
  "uuid": "fc50f8f3-85fd-497f-a98e-c2c6fb139239"
}
Hi Tobias, can you tell me where to get the debug output (i.e. the listing above)? Do I need to list the logs of a specific container in the RKE worker cluster, or in the master cluster? Do you know what is happening when the failure occurs, i.e. in which part does provisioning fail? Is the master trying to build an SSH tunnel to the worker cluster? Thanks
OK, I see that it's in the Provisioning Log in the Rancher GUI. At least in my case it seemed to work. In case it helps, here is a snippet from my log:
1:59:28 pm | [INFO ] Initiating Kubernetes cluster
1:59:28 pm | [INFO ] [dialer] Setup tunnel for host [157.97.110.225]
1:59:37 pm | [INFO ] [state] Successfully started [cluster-state-deployer] container on host [157.97.110.225]
1:59:38 pm | [INFO ] Successfully Deployed state file at [management-state/rke/rke-315640628/cluster.rkestate]
1:59:38 pm | [INFO ] Building Kubernetes cluster
1:59:38 pm | [INFO ] [dialer] Setup tunnel for host [157.97.110.225]
1:59:38 pm | [INFO ] [network] Deploying port listener containers
1:59:39 pm | [INFO ] [network] Successfully started [rke-etcd-port-listener] container on host [157.97.110.225]
1:59:39 pm | [INFO ] [network] Successfully started [rke-cp-port-listener] container on host [157.97.110.225]
1:59:40 pm | [INFO ] [network] Successfully started [rke-worker-port-listener] container on host [157.97.110.225]
...
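To answer "in which part does provisioning fail?", it can help to pull out the last step the provisioning log reached before it stalled. A minimal Python sketch, assuming the log has been copied from the Rancher GUI one entry per line in the `time | [LEVEL ] [component] message` shape of the snippet above. The regex is fitted to those lines only and may need adjusting for other Rancher versions.

```python
import re
from typing import Optional, Tuple

# Fitted to entries such as
# "1:59:38 pm | [INFO ] [dialer] Setup tunnel for host [157.97.110.225]".
# The "[component]" tag is optional, e.g. "[INFO ] Initiating Kubernetes cluster".
ENTRY = re.compile(
    r"(?P<time>\d{1,2}:\d{2}:\d{2} [ap]m) \| \[(?P<level>\w+)\s*\] "
    r"(?:\[(?P<component>[\w-]+)\] )?(?P<message>.*)"
)


def last_step(log_text: str) -> Optional[Tuple[str, Optional[str], str]]:
    """Return (time, component, message) of the last parsable entry, or None."""
    last = None
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:  # lines such as "..." are skipped
            last = (match["time"], match["component"], match["message"])
    return last
```

If the last step is a `[dialer] Setup tunnel` entry, that points at the SSH tunnel from Rancher to the node rather than at a later RKE stage.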
I have the Rancher master running on a K3s cluster in one data center, from which I provisioned the RKE cluster in another data center (public LAN). Could the error come from proxy or security settings in the network where your Rancher master is running? If you shell into the Rancher master pod and try to SSH manually from there into the worker node, what happens? Just throwing out some ideas in the hope that they help...
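The manual check suggested here can also be scripted. A minimal Python sketch, assuming Python 3 is available wherever you run it (the Rancher pod may not ship it; a host in the same network works too). It opens a TCP connection to the SSH port and reads the server's identification line, which under RFC 4253 normally starts with `SSH-2.0-`. It does not authenticate, so a successful result only rules out network-level problems (routing, NAT, firewall), not key problems.

```python
import socket
from typing import Optional


def probe_ssh(host: str, port: int = 22, timeout: float = 5.0) -> Optional[str]:
    """Return the SSH server's first line, or None if it cannot be reached.

    Only checks TCP reachability and the identification banner; no login
    is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(256)  # servers normally send "SSH-2.0-..." first
    except OSError:  # connection refused, host unreachable, or timed out
        return None
    return data.decode("ascii", errors="replace").splitlines()[0] if data else ""
```

Usage: `probe_ssh("<worker public IP>")`. `None` suggests the node is not reachable from where the check runs. A banner such as `SSH-2.0-OpenSSH_...` means the port is open and the next thing to look at is authentication.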
Hi Martin, in my tests the Rancher master was running on a K3s cluster and I tried to deploy the cluster in another data center. Rancher sits in a private LAN behind a NAT gateway. I'll run some tests to see whether I can connect from the Rancher VM to the downstream cluster VM via SSH.
For the SSH tests I tried to add my SSH key to the cloud-config. But when I add a cloud-config, the driver fails to create the server. The cloud-config is a working one from another cluster I used for some tests in our lab. That cluster also uses an Ubuntu 20.04 cloud image.
#cloud-config
ca-certs:
  trusted:
    - |
      -----BEGIN CERTIFICATE-----
      MIIBqTCCAU6gAwIBAgIRAIEufsGXTRyUC4tsIW398SMwCgYIKoZIzj0EAwIwMjET
      MBEGA1UEChMKRGV2T3BzIExhYjEbMBkGA1UEAxMSRGV2T3BzIExhYiBSb290IENB
      MB4XDTIyMDExNTA4MzAwMVoXDTMyMDExMzA4MzAwMVowMjETMBEGA1UEChMKRGV2
      T3BzIExhYjEbMBkGA1UEAxMSRGV2T3BzIExhYiBSb290IENBMFkwEwYHKoZIzj0C
      AQYIKoZIzj0DAQcDQgAE3CcGpgd5/jMDt42nOB98DVoppAdZ1vY0Us2WrtQ7nv5s
      iZenDiImG9TdceR3P7a2wvnhUAmiBiZzT0yx/mlcwqNFMEMwDgYDVR0PAQH/BAQD
      AgEGMBIGA1UdEwEB/wQIMAYBAf8CAQEwHQYDVR0OBBYEFFQ89/6jz4Qi4T59BHYC
      qJljaNTqMAoGCCqGSM49BAMCA0kAMEYCIQDI5Zsng3vQTJQm3TiNtFClS+xcIIYz
      BASuCGiG6LmZ7wIhAJXHwrPpXjEV8B4ML0QX3IwIh3cvA+iLXoHAtvolF5+0
      -----END CERTIFICATE-----
groups:
  - docker
manage_etc_hosts: true
runcmd:
  - - sysctl
    - '-p'
users:
  - groups: 'docker, sudo'
    name: ubuntu
    ssh-authorized-keys:
      - >
        ssh-rsa
        ******
        deployment-key
    sudo:
      - 'ALL=(ALL) NOPASSWD:ALL'
write_files:
  - content: |
      Acquire::ForceIPv4 "true";
    path: /etc/apt/apt.conf.d/99disable-ipv6
  - content: |
      Acquire::ForceIPv4 "true";
    path: /etc/apt/apt.conf.d/99disable-ipv6
  - content: |
      net.ipv6.conf.all.disable_ipv6 = 1
      net.ipv6.conf.default.disable_ipv6 = 1
      net.ipv6.conf.lo.disable_ipv6 = 1
    path: /etc/sysctl.d/99-sysctl.conf
packageUpdate: true
packages:
  - nfs-common
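Two details in this cloud-config are easy to miss. The `/etc/apt/apt.conf.d/99disable-ipv6` entry appears twice under `write_files`. Also, cloud-init's documented key for refreshing the package index is `package_update`, so `packageUpdate` is most likely ignored. Whether either of these contributed to the driver error is not established in this thread. A minimal Python sketch that flags both with a crude line-based scan (not a YAML parser). It also base64-encodes the document, assuming the template's `userDataB64` field expects standard base64 of the raw cloud-config. That is an assumption; the thread does not show how the driver uses that field.

```python
import base64
import re
from collections import Counter


def check_cloud_config(text: str) -> list:
    """Return warnings for a few easy-to-miss cloud-config mistakes.

    This is a crude line-based scan, not a YAML parser or cloud-init's own
    schema validation.
    """
    warnings = []
    # cloud-init only treats the payload as cloud-config with this header.
    if not text.startswith("#cloud-config"):
        warnings.append("first line must be '#cloud-config'")
    # The same write_files target listed more than once.
    paths = re.findall(r"^[ \t]*path:[ \t]*(\S+)", text, flags=re.MULTILINE)
    for path, count in Counter(paths).items():
        if count > 1:
            warnings.append(f"write_files path {path} appears {count} times")
    # cloud-init's documented top-level key is package_update (snake_case).
    if re.search(r"^packageUpdate:", text, flags=re.MULTILINE):
        warnings.append("use 'package_update', not 'packageUpdate'")
    return warnings


def to_user_data_b64(text: str) -> str:
    """Standard base64 of the raw document (assumed format for userDataB64)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")
```

Run against the config above, the scan would report the duplicated apt path and the `packageUpdate` key. It says nothing about whether the driver accepts the document.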
This is the log output from the Rancher container:
2022/12/21 06:47:01 [INFO] [node-controller] Provisioning node ionos-k8s1
2022/12/21 06:47:01 [INFO] [node-controller] Creating CA: /management-state/node/nodes/ionos-k8s1/certs/ca.pem
2022/12/21 06:47:02 [INFO] [node-controller] Creating client certificate: /management-state/node/nodes/ionos-k8s1/certs/cert.pem
2022/12/21 06:47:02 [INFO] [node-controller] Running pre-create checks...
2022/12/21 06:47:02 [INFO] [node-controller] (ionos-k8s1) IONOS Cloud Driver Version: 6.1.0-rc.1
2022/12/21 06:47:02 [INFO] [node-controller] (ionos-k8s1) SDK-GO Version: 6.1.3
2022/12/21 06:47:02 [INFO] [node-controller] Creating machine...
2022/12/21 06:47:03 [INFO] [node-controller] (ionos-k8s1) Creating SSH key...
2022/12/21 06:47:03 [INFO] [node-controller] (ionos-k8s1) Using user data: users:
2022/12/21 06:47:03 [INFO] [node-controller] (ionos-k8s1) - groups: 'docker, sudo'
2022/12/21 06:47:03 [INFO] [node-controller] (ionos-k8s1) name: tobias
2022/12/21 06:47:03 [INFO] [node-controller] (ionos-k8s1) ssh-authorized-keys:
2022/12/21 06:47:03 [INFO] [node-controller] (ionos-k8s1) - >
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) ssh-rsa
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) ******
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) deployment-key
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) sudo:
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) - 'ALL=(ALL) NOPASSWD:ALL'
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) write_files:
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) - content: |
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) Acquire::ForceIPv4 "true";
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) path: /etc/apt/apt.conf.d/99disable-ipv6
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) - content: |
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) Acquire::ForceIPv4 "true";
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) path: /etc/apt/apt.conf.d/99disable-ipv6
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) - content: |
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) net.ipv6.conf.all.disable_ipv6 = 1
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) net.ipv6.conf.default.disable_ipv6 = 1
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) net.ipv6.conf.lo.disable_ipv6 = 1
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) path: /etc/sysctl.d/99-sysctl.conf
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) packageUpdate: true
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) packages:
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) - nfs-common
2022/12/21 06:47:04 [INFO] [node-controller] (ionos-k8s1) DataCenter Created
2022/12/21 06:47:14 [INFO] [node-controller] (ionos-k8s1) LAN Created
2022/12/21 06:47:22 [INFO] [node-controller] (ionos-k8s2) NIC Deleted
2022/12/21 06:47:24 [INFO] [node-controller] (ionos-k8s1) Server Created
2022/12/21 06:47:35 [INFO] [node-controller] (ionos-k8s1) Image Alias: ubuntu:20.04
2022/12/21 06:47:35 [INFO] [node-controller] (ionos-k8s1) WARNING: Error creating machine. Rolling back...
2022/12/21 06:47:35 [INFO] [node-controller] (ionos-k8s1) NOTICE: Please check IONOS Cloud Console/CLI to ensure there are no leftover resources.
2022/12/21 06:47:35 [INFO] [node-controller] (ionos-k8s1) Starting deleting resources...
2022/12/21 06:47:42 [INFO] [node-controller] (ionos-k8s2) Volume Deleted
2022/12/21 06:47:45 [INFO] [node-controller] (ionos-k8s1) Server Deleted
2022/12/21 06:47:53 [INFO] [node-controller] (ionos-k8s2) Server Deleted
2022/12/21 06:47:56 [INFO] [node-controller] (ionos-k8s1) LAN Deleted
2022/12/21 06:48:03 [INFO] [node-controller] (ionos-k8s2) LAN Deleted
2022/12/21 06:48:06 [INFO] [node-controller] (ionos-k8s1) DataCenter Deleted
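This node-controller log interleaves messages from two machines (ionos-k8s1 and ionos-k8s2), which makes the failing sequence harder to follow. A minimal Python sketch that splits the lines per machine and reports the step logged just before the rollback. The regex is fitted to the line format in this excerpt and is an assumption for other Rancher versions.

```python
import re
from collections import defaultdict

# Fitted to lines such as
# "2022/12/21 06:47:22 [INFO] [node-controller] (ionos-k8s2) NIC Deleted"
LINE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[\w+\] "
    r"\[node-controller\] \((?P<machine>[^)]+)\) (?P<msg>.*)$"
)


def group_by_machine(log_text: str) -> dict:
    """Map machine name -> list of (timestamp, message) in log order."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match:  # lines without a "(machine)" tag are skipped
            groups[match["machine"]].append((match["ts"], match["msg"]))
    return dict(groups)


def step_before_rollback(messages):
    """Return the message logged just before "Error creating machine", if any."""
    for i, (_, msg) in enumerate(messages):
        if msg.startswith("WARNING: Error creating machine"):
            return messages[i - 1][1] if i > 0 else None
    return None
```

Applied to this excerpt, the ionos-k8s1 sequence ends with `Image Alias: ubuntu:20.04` right before the rollback, after the data center, LAN and server had been created.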
I used the SSH keys created by the driver to connect to the new VM. The SSH keys are not the problem, neither from the Rancher pod nor from my laptop. Docker, for example, is installed via SSH once cloud-init has finished. Here's the output of docker info
from the node created by the driver:
Client:
 Context: default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
  compose: Docker Compose (Docker Inc., v2.14.1)
  scan: Docker Scan (Docker Inc., v0.23.0)
Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 20.10.21
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 9ba4b250366a5ddde94bb7c9d1def331423aa323
 runc version: v1.1.4-0-g5fd4c4d
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-125-generic
 Operating System: Ubuntu 20.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 3.81GiB
 Name: ionos-k8s2
 ID: SLXH:VGFI:Z2OJ:4GVA:6UN4:NHDX:XG22:AMMH:MO74:EDMQ:CVED:IVCH
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
I had exactly the same problem with my cloud-config/user data (error creating machine...). This is apparently already fixed in the main branch, but @avirtopeanu-ionos, I think we need a new RC to get further.
Hi, https://github.com/ionos-cloud/docker-machine-driver/releases/tag/v6.1.0-rc.2 should fix the issue with the cloud-config/user data.
Hi Tobias, did you try the new release candidate? Did it fix the issue? Thanks, Martin
I was on vacation over the holidays. I'll do some tests later today or tomorrow.
Last week I was unavailable and couldn't do the tests as planned. I'll look at the bug and the fixes in rc.2 today and tomorrow.
The bug is fixed with v6.1.0-rc.2. I was able to create a working Kubernetes cluster.
Description
Trying to create a Kubernetes cluster in a public LAN with the default settings (SSH user = root) and no cloud-init fails. The data center and the VM are created successfully without errors. The VM has a public IP and the SSH port is open. When Rancher tries to connect to the Docker daemon, the workflow hangs with the errors below.
Expected behavior
Create a working single-node Kubernetes cluster.
Environment
Rancher Version:
Docker Machine Driver Ionos Cloud version:
How to Reproduce
Create a node template with no Datacenter ID, no LAN ID, and the default settings for a data center in Frankfurt.
Error and Debug Output