kinvolk / racker

rack provisioning utility for Kinvolk projects
Apache License 2.0

dnsmasq config seems to be missing #96

Open TheApeMachine opened 3 years ago

TheApeMachine commented 3 years ago

Hi,

Pretty sure it's me not getting something (possibly simple), but I have been trying for a few days now. I get this error in the screenshot (sorry for screenshot, for some reason I can't get my ssh key to work either on the management node's PXE boot).

I am guessing it is telling me the IP of the BMC should be "" (empty) because the dnsmasq config is missing? Basically, when I run racker bootstrap, it just ends after the wizard. No message, it just quits. racker status --full shows this. It says it can reach the BMC, though.

Should you be wondering... Yes I only have one additional node configured in nodes.csv at the moment (I was doing this at night and this particular Dell server is extremely noisy because of a fan issue). I would expect it to just provision a control node, or at least fail at some other stage?

racker1

pothos commented 3 years ago

You need at least two nodes to provision a K8s cluster: one controller and one worker. With only one node you can set up plain Flatcar; for a quick run that skips the wizard, try racker bootstrap -onfailure exclude -provision flatcar. The expected output looks like what is printed in section 3.1.2. Bootstrap Stages of the manual. If the provisioning failed, you can try to find logs in the subfolder. If it didn't actually start, I would check what racker factory check reports, and make sure to follow the manual for setting up the management node and the rack metadata.

TheApeMachine commented 3 years ago

I tried provisioning Flatcar too, exactly as you say. racker factory check says all is well. However, could this have something to do with the nodes not being set up with UEFI BIOS? I just discovered it, so I am reconfiguring and trying again.

pothos commented 3 years ago

Can you paste the output of racker bootstrap ... and racker upgrade?

TheApeMachine commented 3 years ago

(Small edit: This is the same as well using the core user that is auto logged in (Edit 2: Sorry, no it is not, see screenshot)).

(Not sure if you meant actual ellipses or not, but the experience is the same no matter what you do: no params for racker bootstrap, or -onfailure exclude -provision flatcar.)

theapemachine@localhost /usr/share/oem $ cat nodes.csv     
Primary MAC Address, BMC MAC Address, SEcondary MAC Address, Node Type, Comments
bc:30:5b:d2:7c:da, bc:30:5b:d2:7c:e2, bc:30:5b:d2:7c:dc, small, mgmt
18:03:73:ff:86:7a, 18:03:73:ff:55:b4, 18:03:73:ff:86:7b, large, controller 
theapemachine@localhost /usr/share/oem $ racker factory check
✓ /usr/share/oem/ipmi_user and /usr/share/oem/ipmi_password look valid
✓ /usr/share/oem/nodes.csv looks valid
🛈 1 node types configured:
  1 "large"
theapemachine@localhost /usr/share/oem $ racker bootstrap ...
? Choose what to provision Flatcar Container Linux
? Choose how you want to assign the IP addresses Use DHCP
? Choose a subnet prefix for the rack-internal network, only change this if the default clashes with the external network 172.24.213
theapemachine@localhost /usr/share/oem $ racker upgrade
Getting quay.io/kinvolk/racker:latest
latest: Pulling from kinvolk/racker
Digest: sha256:a688cb04e2ecb7c2941dc28f9a28f5bb5a3bd6e5a16958a27d5c5fdf3f4419cc
Status: Image is up to date for quay.io/kinvolk/racker:latest
quay.io/kinvolk/racker:latest
Running quay.io/kinvolk/racker:latest
Installation complete, you may now run: racker
Getting quay.io/kinvolk/racker:0.3
0.3: Pulling from kinvolk/racker
Digest: sha256:a688cb04e2ecb7c2941dc28f9a28f5bb5a3bd6e5a16958a27d5c5fdf3f4419cc
Status: Image is up to date for quay.io/kinvolk/racker:0.3
quay.io/kinvolk/racker:0.3
theapemachine@localhost /usr/share/oem $

Hmm, but the error in status is new...

theapemachine@localhost /usr/share/oem $ racker status --full
Provisioned: Flatcar

MAC address        BMC reached  Power   OS provisioned  Joined cluster   Hostnames
18:03:73:ff:86:7a   ×           ×       ○   

To see details for a node, run "ipmi <MAC|DOMAIN> diag" or rerun this command with the parameter "--full" to see the details of all nodes.

docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/create: dial unix /var/run/docker.sock: connect: permission denied.
See 'docker run --help'.

Before, I was only consistently getting that dnsmasq config error, and the dnsmasq service was also dead (which it still is now).

Edit2: With the core user I still get the same dnsmasq error. However, I did just notice that my router is configured to provide static IPs only to the BMC cards of the servers here, and I seem to remember an explicit notice in the documentation that it needs to be DHCP, so I reconfigured.

racker2

pothos commented 3 years ago

I guess you hit a corner case that we didn't test. Can you add one more node to the nodes.csv file, even if it's not reachable? For Lokomotive we should print an error when there is only one node; for plain Flatcar it should work.
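
Until racker prints such an error itself, a pre-flight check along these lines could catch the single-node case. This is only a sketch: count_rack_nodes and check_rack_nodes are hypothetical helpers, not part of racker, and they assume the CSV layout of the nodes.csv pasted above, with the management node identified by passing its primary MAC address as the second argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of racker): count the nodes in a nodes.csv
# file, skipping the header row and the management node.
# $1 = path to nodes.csv, $2 = primary MAC address of the management node
count_rack_nodes() {
  awk -F',' -v mgmt="$2" '
    NR == 1 { next }                      # skip the header row
    {
      mac = $1
      gsub(/[ \t\r]/, "", mac)            # strip padding around the MAC
      if (mac != "" && tolower(mac) != tolower(mgmt)) n++
    }
    END { print n + 0 }
  ' "$1"
}

# Hypothetical wrapper: fail with a message if fewer than two nodes remain
# after excluding the management node.
check_rack_nodes() {
  count=$(count_rack_nodes "$1" "$2")
  if [ "$count" -lt 2 ]; then
    echo "only $count provisionable node(s) in $1; add at least one more" >&2
    return 1
  fi
}
```

For example, check_rack_nodes /usr/share/oem/nodes.csv bc:30:5b:d2:7c:da would fail for the file shown earlier in this thread, since only the "large" node remains once the management node is excluded.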

TheApeMachine commented 3 years ago

That seems to work... Looks like it's installing Lokomotive, if it does, this would be a very long day in the making :p

racker3

TheApeMachine commented 3 years ago

It failed on the BMC connectivity check, though. I could easily have gotten the numbers wrong, so I will check these again. Or actually, they are likely still configured as static with that ipmi tool; will check.

TheApeMachine commented 3 years ago

Maybe I could bother you with one more question. After that, I think we can close this ticket, as the original problem turned out to be user error. I believe something along the following lines happened: I added the second server (a real one) and things looked to start working. However, I still had my BMCs at ipsrc static for easy access, so the mgmt node failed to set the DHCP. The mgmt node now has its second NIC configured as 172.x.x.x while the rest of the network is 192.168.1.x, and I guess this is why subsequent attempts are failing?

I'd be happy to document the experience once I have it working if you are interested? I think it could be useful for people like me, who are really at the beginning of starting to understand all of this (especially the networking, there's a lot to keep track of there). I've been eyeing Flatcar and Lokomotive for about three months now :p (though Flatcar I got working in the end).
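
To make the mismatch I am describing concrete, here is a rough sketch. in_subnet_prefix is a made-up helper, not a racker command, and the two addresses in the usage lines are invented examples; only the 172.24.213 prefix comes from the wizard output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if IPv4 address $1 lies under the three-octet
# prefix $2 (the form the racker wizard asks for, e.g. 172.24.213).
in_subnet_prefix() {
  case "$1" in
    "$2".*) return 0 ;;   # address starts with "<prefix>."
    *)      return 1 ;;
  esac
}

# Usage with invented addresses:
in_subnet_prefix 172.24.213.10 172.24.213 && echo "inside the rack-internal network"
in_subnet_prefix 192.168.1.50 172.24.213 || echo "outside the rack-internal network"
```

If that reasoning holds, a BMC left on a static 192.168.1.x address is simply not on the network the mgmt node is serving.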

pothos commented 3 years ago

We can leave it open until the improvements are implemented.

Yes, DHCP is required for the internal network. The docs have this note on switching from static addressing to DHCP:

If IPMI static IP addressing was manually configured on the
BMCs you have to switch the BMCs back to DHCP (either manually or by
switching to the same subnet with racker bootstrap … -subnet-prefix a.b.c and
then running ipmi --all lan set 1 ipsrc dhcp).

It would be good if the docs stated the DHCP requirement explicitly earlier, and if this step in 2.3.4. Verifying the rack metadata were split out for visibility.
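
To spot a BMC that is still on static addressing before bootstrapping, something like the following might help. It is a sketch, not part of racker: bmc_ip_source and bmc_is_static are hypothetical helpers that read the text output of ipmitool lan print 1 on stdin, and they assume the usual "IP Address Source : Static Address" line format that ipmitool prints.

```shell
#!/bin/sh
# Hypothetical helper (not part of racker): read the text of
# "ipmitool lan print 1" on stdin and print the BMC's IP address source.
# Exits non-zero if no "IP Address Source" line is present.
bmc_ip_source() {
  awk -F':' '
    /^IP Address Source/ {
      value = $2
      gsub(/^[ \t]+|[ \t\r]+$/, "", value)   # trim surrounding whitespace
      print value
      found = 1
      exit
    }
    END { if (!found) exit 1 }
  '
}

# Hypothetical helper: succeed if the BMC reports static addressing,
# i.e. it still needs the switch back to DHCP described above.
bmc_is_static() {
  [ "$(bmc_ip_source)" = "Static Address" ]
}

# Example (host and credentials are placeholders):
#   ipmitool -I lanplus -H <bmc-ip> -U <user> -P <password> lan print 1 | bmc_is_static \
#     && echo "BMC is static, switch it back to DHCP"
```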