redhat-performance / JetSki

Deployment Automation for OpenShift on Baremetal in Red Hat's Shared Labs
Apache License 2.0

upstream rebase and fixed hostname issue #303

Open: mukrishn opened this pull request 6 months ago

mukrishn commented 6 months ago

Description

Rebased from upstream openshift-kni/baremetal-deploy

Fix to disable the hostname from the lab DHCP server and the public interface during first boot (see Slack thread).

Fixes # (issue)

Added a new nmstate config for day-1 installation (link).

Added a networkData secret to the BareMetalHost resource in the day-2 scaling playbook (config).


mukrishn commented 5 months ago

Tested on R650; hit some lab issues but was able to deploy all control plane nodes successfully.

$ oc get nodes
NAME                                        STATUS   ROLES                         AGE     VERSION
f04-h09-000-r640.rdu2.scalelab.redhat.com   Ready    control-plane,master,worker   7h18m   v1.27.10+28ed2d7
f04-h10-000-r640.rdu2.scalelab.redhat.com   Ready    control-plane,master,worker   7h18m   v1.27.10+28ed2d7
f04-h11-000-r640.rdu2.scalelab.redhat.com   Ready    control-plane,master,worker   7h18m   v1.27.10+28ed2d7

mukrishn commented 5 months ago

Tested the deployment with the updated config:

$ oc get nodes 
NAME                                        STATUS   ROLES                         AGE     VERSION
master-0                                    Ready    control-plane,master,worker   14h     v1.27.10+28ed2d7
master-1                                    Ready    control-plane,master,worker   14h     v1.27.10+28ed2d7
master-2                                    Ready    control-plane,master,worker   14h     v1.27.10+28ed2d7

Worker scale-up is only partially tested; I need a lab allocation to test it thoroughly.

mukrishn commented 5 months ago

Tested this on FC640s; thanks @wilsondav for the lab environment.

$ oc get nodes
NAME              STATUS   ROLES                  AGE     VERSION
master-0          Ready    control-plane,master   47m     v1.27.10+28ed2d7
master-1          Ready    control-plane,master   48m     v1.27.10+28ed2d7
master-2          Ready    control-plane,master   47m     v1.27.10+28ed2d7
worker000-fc640   Ready    worker                 11m     v1.27.10+28ed2d7
worker001-fc640   Ready    worker                 11m     v1.27.10+28ed2d7
worker002-fc640   Ready    worker                 11m     v1.27.10+28ed2d7
worker003-fc640   Ready    worker                 8m38s   v1.27.10+28ed2d7

mukrishn commented 5 months ago

@josecastillolema @wilsondav please review

josecastillolema commented 5 months ago

Thanks @mukrishn, I will validate the PR in the small VCP env.

josecastillolema commented 4 months ago

@wilsondav, can you please paste here the errors you hit with this PR in cloud18 and cloud26?

josecastillolema commented 4 months ago

Regarding the fixed hostname issue, it looks like fresh installs work fine but scale-ups lack the fix, e.g.:

e23-h24-b03-fc640.rdu2.scalelab.redhat.com   Ready    worker                 3h41m   v1.27.11+749fe1d
e23-h24-b04-fc640.rdu2.scalelab.redhat.com   Ready    worker                 3h40m   v1.27.11+749fe1d
master-0                                     Ready    control-plane,master   5h53m   v1.27.11+749fe1d
master-1                                     Ready    control-plane,master   5h53m   v1.27.11+749fe1d
master-2                                     Ready    control-plane,master   5h52m   v1.27.11+749fe1d
worker000-fc640                              Ready    worker                 5h16m   v1.27.11+749fe1d
worker001-fc640                              Ready    worker                 5h16m   v1.27.11+749fe1d
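The affected nodes in the output above can be picked out mechanically by scanning the NAME column. The sketch below is a hypothetical helper, not part of the PR, and it assumes that with the fix applied nodes get short names such as `master-0` or `worker000-fc640`, so any name containing a dot is treated as a lab-assigned FQDN. The `fqdn_nodes` function and `SAMPLE` string are illustrative names only; the sample reuses the node rows quoted above plus the standard `oc get nodes` header row.

```python
"""Hypothetical check: flag nodes whose name looks like a lab FQDN.

Assumption (not from the PR): nodes that received the hostname fix carry
short names (master-N, workerNNN-fc640), so a NAME containing a dot is
treated as a hostname picked up from the lab network.
"""

# Rows copied from the `oc get nodes` output quoted in this thread.
SAMPLE = """\
NAME                                         STATUS   ROLES                  AGE     VERSION
e23-h24-b03-fc640.rdu2.scalelab.redhat.com   Ready    worker                 3h41m   v1.27.11+749fe1d
e23-h24-b04-fc640.rdu2.scalelab.redhat.com   Ready    worker                 3h40m   v1.27.11+749fe1d
master-0                                     Ready    control-plane,master   5h53m   v1.27.11+749fe1d
master-1                                     Ready    control-plane,master   5h53m   v1.27.11+749fe1d
master-2                                     Ready    control-plane,master   5h52m   v1.27.11+749fe1d
worker000-fc640                              Ready    worker                 5h16m   v1.27.11+749fe1d
worker001-fc640                              Ready    worker                 5h16m   v1.27.11+749fe1d
"""


def fqdn_nodes(oc_output: str) -> list[str]:
    """Return node names from `oc get nodes` text that contain a dot."""
    flagged = []
    for line in oc_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        if "." in fields[0]:
            flagged.append(fields[0])
    return flagged


if __name__ == "__main__":
    for name in fqdn_nodes(SAMPLE):
        print(name)
```

On the sample, only the two scaled-up `e23-h24-…` workers are flagged, which matches the observation that fresh installs were fine while scale-ups lacked the fix.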

Could the PR be split into two: one for the upstream rebase and another for the fixed hostname issue?

Thanks

mukrishn commented 4 months ago

@josecastillolema PR #307 is the rebase.