tinkerbell / playground

Example deployments of the Tinkerbell Stack for use as playground environments
Apache License 2.0

STATE_FAILED with docker-compose sandbox #177

Closed Echok3 closed 8 months ago

Echok3 commented 8 months ago

I prepared two Dell R330 servers, one as the TINKERBELL HOST and the other as the TINKERBELL CLIENT. eth0 on both servers is connected to the same switch (Cisco 9200L-48T), and both servers have Internet access through the switch.

I tried nearly 10 times, but the TINKERBELL CLIENT always gets stuck on the LinuxKit screen, and the TINKERBELL HOST terminal shows sandbox-workflow ubuntu-focal STATE_FAILED.

I need some help debugging this.

Environment

Cisco 9200L-48T Switch

TINKERBELL HOST OS: Ubuntu 22.04.3 LTS (16G RAM, 2T HDD disk)

TINKERBELL CLIENT: (8G RAM, 3T HDD disk)

I followed the steps in the docker-compose sandbox quickstart: https://github.com/tinkerbell/sandbox/blob/main/docs/quickstarts/COMPOSE.md

This is my .env file:

# Can be set to your own hook builds
vOSIE=v0.8.0
OSIE_DOWNLOAD_URLS=https://github.com/tinkerbell/hook/releases/download/${vOSIE}/hook_x86_64.tar.gz,https://github.com/tinkerbell/hook/releases/download/${vOSIE}/hook_aarch64.tar.gz

# This is the IP and MAC of the machine to be provisioned
# The IP should normally be in the same network as the IP used for the provisioner
TINKERBELL_CLIENT_IP=192.168.0.14
TINKERBELL_CLIENT_MAC=34:17:eb:ee:fd:fb

# These are the Gateway and DNS addresses the client should use, required for tink-worker to pull action images
TINKERBELL_CLIENT_GW=192.168.0.1
TINKERBELL_CLIENT_NAMESERVER_1=1.1.1.1
TINKERBELL_CLIENT_NAMESERVER_2=8.8.8.8

# This should be an IP that's on an interface where you will be provisioning machines
TINKERBELL_HOST_IP=192.168.0.13

# Images used by docker compose natively or in terraform/vagrant, update if necessary
BOOTS_IMAGE=quay.io/tinkerbell/boots:v0.7.0
HEGEL_IMAGE=quay.io/tinkerbell/hegel:v0.8.0
TINK_VERSION=v0.8.0
TINK_SERVER_IMAGE=quay.io/tinkerbell/tink:${TINK_VERSION}
TINK_CONTROLLER_IMAGE=quay.io/tinkerbell/tink-controller:${TINK_VERSION}
TINK_WORKER_IMAGE=quay.io/tinkerbell/tink-worker:${TINK_VERSION}
RUFIO_VERSION=v0.1.0
RUFIO_IMAGE=quay.io/tinkerbell/rufio:${RUFIO_VERSION}
K3S_IMAGE=rancher/k3s:v1.24.4-k3s1

# This is the boot/primary disk device and the device for its first partition 
# for the machine to be provisioned (as it would appear with lsblk)
DISK_DEVICE=/dev/sda
DISK_DEVICE_PARTITION_1=/dev/sda1
# Example for a device with an NVME SSD
#DISK_DEVICE=/dev/nvme0n1
#DISK_DEVICE_PARTITION_1=/dev/nvme0n1p1
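
The comments in the .env file say the client IP should normally be in the same network as the provisioner IP. A minimal sketch of that check follows, using the addresses above. The helper name same_subnet_24 is hypothetical, and the /24 prefix is an assumption, since the netmask is not stated in this issue.

```shell
#!/bin/sh
# Hypothetical helper: succeed if two IPv4 addresses share the same /24.
# The /24 prefix is an assumption; the issue does not state the netmask.
same_subnet_24() {
  # ${1%.*} strips the last octet, leaving the first three for comparison.
  [ "${1%.*}" = "${2%.*}" ]
}

# Values copied from the .env file above.
TINKERBELL_CLIENT_IP=192.168.0.14
TINKERBELL_HOST_IP=192.168.0.13
TINKERBELL_CLIENT_GW=192.168.0.1

for ip in "$TINKERBELL_HOST_IP" "$TINKERBELL_CLIENT_GW"; do
  if same_subnet_24 "$TINKERBELL_CLIENT_IP" "$ip"; then
    echo "$ip: same /24 as client"
  else
    echo "$ip: different /24 from client"
  fi
done
```

With these values both the host and the gateway fall in the client's /24, so under that assumption the addressing itself looks consistent.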

Current Behaviour

  1. Starting the provisioner on the TINKERBELL HOST produces no error output.

  2. Continuing with step 5, I ran the command below on the TINKERBELL HOST:

    KUBECONFIG=./state/kube/kubeconfig.yaml kubectl get -n default workflow sandbox-workflow --watch

    At first it showed: sandbox-workflow ubuntu-focal STATE_PENDING

    Then I powered on the TINKERBELL CLIENT server, and it booted normally through iPXE.

    The TINKERBELL HOST terminal then showed this:

    The TINKERBELL CLIENT screen showed this:

It always gets stuck on this screen. When I reboot the TINKERBELL CLIENT, Ubuntu seems to have been installed successfully, but I can't log in with the username tink and password tink, and I can't ping the client from the TINKERBELL HOST.
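
For narrowing down a STATE_FAILED workflow, one possible approach is to filter the workflow list for failures and then dump the full resource. The sketch below is not part of the quickstart. The function name failed_workflows is hypothetical, and it assumes three whitespace-separated columns (name, template, state), matching the output quoted in this issue.

```shell
#!/bin/sh
# Hypothetical filter: print the names of workflows whose third column is STATE_FAILED.
# Assumes "name template state" columns, as in the output quoted in this issue.
failed_workflows() {
  awk '$3 == "STATE_FAILED" { print $1 }'
}

# Usage against a live cluster (not run here):
#   KUBECONFIG=./state/kube/kubeconfig.yaml kubectl get -n default workflow --no-headers | failed_workflows
#   KUBECONFIG=./state/kube/kubeconfig.yaml kubectl get -n default workflow sandbox-workflow -o yaml

# Demonstration with the line reported above.
printf '%s\n' 'sandbox-workflow ubuntu-focal STATE_FAILED' | failed_workflows
```

The -o yaml dump should include the workflow's status section, which is a reasonable first place to look for the step that failed.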

Possible Solution

howard-yeh commented 8 months ago

What's the output of "cat /proc/cmdline" on your Tinkerbell client?

Echok3 commented 8 months ago

> What's the output of "cat /proc/cmdline" on your Tinkerbell client?

It includes tinkerbell_tls=false. Is that the problem?
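
To pull a single parameter out of the kernel command line instead of reading the whole line, a small helper along these lines may be useful. The name cmdline_value is hypothetical, and the sample line is illustrative only: apart from tinkerbell_tls=false, its contents are made up rather than taken from this client.

```shell
#!/bin/sh
# Hypothetical helper: print the value of a key=value kernel parameter read from stdin.
# Splits on spaces, so quoted values containing spaces are not handled.
cmdline_value() {
  tr ' ' '\n' | awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2) }'
}

# On the client: cmdline_value tinkerbell_tls < /proc/cmdline
# Sample line for illustration; only tinkerbell_tls=false comes from this thread.
echo 'console=ttyS0 tinkerbell_tls=false' | cmdline_value tinkerbell_tls
```

Run against the sample line, this prints false. Whether that setting is related to the failure is not established in this thread.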

chrisdoherty4 commented 8 months ago

The maintainers don't test a whole lot with the compose setup; typically we use k3d to create a lightweight Kubernetes cluster and deploy the Tinkerbell stack using Helm.

Support for compose is falling rapidly by the wayside and will probably be removed soon, so I suggest you switch over. If you still have problems, feel free to create a new issue.