Closed: joachimweyl closed this issue 11 months ago
@larsks please use this issue to track the request for 5 of the worker nodes from @aabaris. @computate This is where we are tracking the nodes needed for Observability Logging & metrics.
As these are the workers already set up for OpenShift, step 2 is also done.
@computate Once these 5 nodes are passed to you, please create a new issue to cover the OpenShift install and pull in whoever you need to help with the install.
@aabaris @larsks I understood that we would use 3 PowerEdge FC430s and 2 FC830s, but I'll let you two figure out what we will use.
@computate the initial plan is to use 5 FC430s and add more workers when needed.
@aabaris Do we have a list of 5 of the worker nodes waiting for OpenShift that are working and can be used for spinning up a new Observability cluster?
@joachimweyl I should be able to find 5 nodes for this purpose sometime this week. In order to make them available, I will need to know which network they will be deployed in. (Nodes don't have to be on prod network to access the prod cluster).
@joachimweyl if we don't have an answer for the choice of networks, I think we should consult with @jtriley
@larsks & @computate do we know what network we want these on?
@aabaris I don't think we have a particular preference for the network. In my ideal world, we would be deploying each cluster on a dedicated private network, but I understand that the way things stand right now that's not necessarily possible.
Since this cluster is going to be configured in a similar fashion to the production cluster (e.g., including public access to the console and API), I would default to having it live on the same network unless @jtriley has an alternative preference.
I attempted to allocate nodes from 3 separate chassis to provide some failure-domain resilience; however, I ran into numerous hardware problems (to be addressed in separate issues). I propose that we use cmc-8, the one FX chassis that we don't anticipate will be disrupted by repair work.
I confirmed the ability to netboot the following nodes:
wrk-64 wrk-65 wrk-66 wrk-67 wrk-68 wrk-69
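For reference, a netboot check like this can be scripted against each node's BMC with ipmitool. The following is only a minimal sketch: the `-obm` BMC hostnames are assumed from the naming used later in this thread, and the credential environment variables are placeholders, not anything defined here.

```python
#!/usr/bin/env python3
"""Hedged sketch: force PXE on next boot and power-cycle each node via its BMC.

Assumes ipmitool is installed, that BMC hostnames follow the wrk-NN-obm
naming seen elsewhere in this thread, and that BMC_USER / BMC_PASS are set
in the environment (placeholders, not taken from this issue).
"""
import os
import subprocess

NODES = ["wrk-64", "wrk-65", "wrk-66", "wrk-67", "wrk-68", "wrk-69"]
USER = os.environ["BMC_USER"]
PASSWORD = os.environ["BMC_PASS"]

def ipmi(bmc_host: str, *args: str) -> None:
    # Run a single ipmitool command against one BMC over the LAN interface.
    subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", bmc_host,
         "-U", USER, "-P", PASSWORD, *args],
        check=True,
    )

for node in NODES:
    bmc = f"{node}-obm"                        # assumed OBM naming convention
    ipmi(bmc, "chassis", "bootdev", "pxe")     # netboot on the next power cycle
    ipmi(bmc, "chassis", "power", "cycle")
    print(f"{node}: set to PXE boot and power-cycled")
```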
I need some additional information to finalize the PR for DHCP config.
@jtriley how should we assign VLANs and DNS for these hosts?
@larsks:
1) how many public IPs will be needed?
2) What version of OpenShift will be installed?
3) Do you have a discovery ISO that I can load into our boot server?
Next steps to getting this done:
[ ] We need to move wrk-64 through wrk-69 to the nerc-ocp-infra VLANs: admin = 2176, storage = 2177. I believe this is a request for either Christian or Nick. @jtriley, could you assist or give me some guidance on how to engage them for help? I don't know which ports these are on, but here's a list of MAC addresses:
wrk-64 A8:99:69:4B:9E:41 A8:99:69:4B:9E:44
wrk-65 A8:99:69:4B:9F:2F A8:99:69:4B:9F:32
wrk-66 A8:99:69:4B:9E:A9 A8:99:69:4B:9E:AC
wrk-67 A8:99:69:4B:A4:E7 A8:99:69:4B:A4:EA
wrk-68 A8:99:69:4B:9E:5B A8:99:69:4B:9E:5E
wrk-69 A8:99:69:4B:A0:9D A8:99:69:4B:A0:A0
[ ] We also need to update their IP and DNS. I could propose the allocation below (see the sketch after this list for how matching DHCP and DNS entries could be generated from it), but perhaps @larsks has some preferences or requirements?
ctl-0.nerc-ocp-observability.rc.fas.harvard.edu 10.30.9.20
ctl-1.nerc-ocp-observability.rc.fas.harvard.edu 10.30.9.21
ctl-2.nerc-ocp-observability.rc.fas.harvard.edu 10.30.9.22
wrk-0.nerc-ocp-observability.rc.fas.harvard.edu 10.30.9.23
wrk-1.nerc-ocp-observability.rc.fas.harvard.edu 10.30.9.24
wrk-2.nerc-ocp-observability.rc.fas.harvard.edu 10.30.9.25
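To illustrate how the MAC list above could be tied to this proposed allocation, here is a hedged sketch that prints DHCP reservations plus matching A records. dnsmasq syntax and BIND-style records are assumed purely for illustration (the thread doesn't name the actual DHCP/DNS servers), the pairing of wrk-64..69 with ctl-0..2 / wrk-0..2 simply follows the order of the two lists, and treating the first MAC of each pair as the admin-network interface is also an assumption.

```python
#!/usr/bin/env python3
"""Hedged sketch: emit DHCP reservations and A records for the proposed allocation.

Assumptions (not confirmed in this thread): dnsmasq dhcp-host syntax, BIND
zone-file style for the forward records, first listed MAC = admin interface,
and the host ordering used to pair old names with new roles.
"""

# (admin-interface MAC, proposed hostname, proposed IP)
ALLOCATION = [
    ("A8:99:69:4B:9E:41", "ctl-0.nerc-ocp-observability.rc.fas.harvard.edu", "10.30.9.20"),
    ("A8:99:69:4B:9F:2F", "ctl-1.nerc-ocp-observability.rc.fas.harvard.edu", "10.30.9.21"),
    ("A8:99:69:4B:9E:A9", "ctl-2.nerc-ocp-observability.rc.fas.harvard.edu", "10.30.9.22"),
    ("A8:99:69:4B:A4:E7", "wrk-0.nerc-ocp-observability.rc.fas.harvard.edu", "10.30.9.23"),
    ("A8:99:69:4B:9E:5B", "wrk-1.nerc-ocp-observability.rc.fas.harvard.edu", "10.30.9.24"),
    ("A8:99:69:4B:A0:9D", "wrk-2.nerc-ocp-observability.rc.fas.harvard.edu", "10.30.9.25"),
]

for mac, fqdn, ip in ALLOCATION:
    short = fqdn.split(".")[0]
    # DHCP reservation (dnsmasq dhcp-host syntax, assumed for illustration)
    print(f"dhcp-host={mac.lower()},{short},{ip}")
    # Matching forward record in BIND zone-file style
    print(f"{fqdn}.\tIN\tA\t{ip}")
```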
@aabaris I put in a request with networking to get these flipped. Will let you know when it's ready for testing today/tomorrow. I can also take care of the DNS configuration.
Thank you @jtriley
I tested the nodes and can confirm that the network changes for both admin and storage vlans were successful.
Are we also going to update their OBM DNS entries? Here's a list of the current OBM IPs:
ctl-0-obm 10.30.0.83 (currently wrk-64-obm.nerc-ocp-prod)
ctl-1-obm 10.30.0.84 (currently wrk-65-obm.nerc-ocp-prod)
ctl-2-obm 10.30.0.85 (currently wrk-66-obm.nerc-ocp-prod)
wrk-0-obm 10.30.0.86 (currently wrk-67-obm.nerc-ocp-prod)
wrk-1-obm 10.30.0.87 (currently wrk-68-obm.nerc-ocp-prod)
wrk-2-obm 10.30.0.88 (currently wrk-69-obm.nerc-ocp-prod)
Yes, I'll update DNS using those names with the subdomain/cluster name nerc-ocp-obs.
@aabaris DNS has been updated. I ended up using the following IPs for the cluster:
api.nerc-ocp-obs 10.30.9.15
api-int.nerc-ocp-obs 10.30.9.15
lb.nerc-ocp-obs 10.30.9.16
*.apps.nerc-ocp-obs 10.30.9.16
ctl-0.nerc-ocp-obs 10.30.9.17
ctl-1.nerc-ocp-obs 10.30.9.18
ctl-2.nerc-ocp-obs 10.30.9.19
wrk-0.nerc-ocp-obs 10.30.9.20
wrk-1.nerc-ocp-obs 10.30.9.21
wrk-2.nerc-ocp-obs 10.30.9.22
ctl-0-jumbo.nerc-ocp-obs 10.30.13.17
ctl-1-jumbo.nerc-ocp-obs 10.30.13.18
ctl-2-jumbo.nerc-ocp-obs 10.30.13.19
wrk-0-jumbo.nerc-ocp-obs 10.30.13.20
wrk-1-jumbo.nerc-ocp-obs 10.30.13.21
wrk-2-jumbo.nerc-ocp-obs 10.30.13.22
ctl-0-obm.nerc-ocp-obs 10.30.0.83
ctl-1-obm.nerc-ocp-obs 10.30.0.84
ctl-2-obm.nerc-ocp-obs 10.30.0.85
wrk-0-obm.nerc-ocp-obs 10.30.0.86
wrk-1-obm.nerc-ocp-obs 10.30.0.87
wrk-2-obm.nerc-ocp-obs 10.30.0.88
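As a quick sanity check, records like these can be verified from any host that uses the internal resolver. The sketch below is hedged: the zone suffix appended to the short names is an assumption carried over from the earlier allocation proposal and may not match the zone actually used.

```python
#!/usr/bin/env python3
"""Hedged sketch: confirm the new records resolve to the expected addresses.

The ZONE suffix is an assumption (taken from the earlier proposal in this
thread); adjust it to the real zone before running. Only a subset of the
records listed above is checked here to keep the sketch short.
"""
import socket

ZONE = "rc.fas.harvard.edu"  # assumed suffix, not confirmed in the thread

EXPECTED = {
    "api.nerc-ocp-obs": "10.30.9.15",
    "api-int.nerc-ocp-obs": "10.30.9.15",
    "lb.nerc-ocp-obs": "10.30.9.16",
    "ctl-0.nerc-ocp-obs": "10.30.9.17",
    "ctl-1.nerc-ocp-obs": "10.30.9.18",
    "ctl-2.nerc-ocp-obs": "10.30.9.19",
    "wrk-0.nerc-ocp-obs": "10.30.9.20",
    "wrk-1.nerc-ocp-obs": "10.30.9.21",
    "wrk-2.nerc-ocp-obs": "10.30.9.22",
    "ctl-0-obm.nerc-ocp-obs": "10.30.0.83",
    "wrk-2-obm.nerc-ocp-obs": "10.30.0.88",
}

for name, want in EXPECTED.items():
    fqdn = f"{name}.{ZONE}"
    try:
        got = socket.gethostbyname(fqdn)
    except socket.gaierror:
        print(f"{fqdn}: MISSING (expected {want})")
        continue
    status = "OK" if got == want else f"MISMATCH (got {got})"
    print(f"{fqdn}: {status}, expected {want}")
```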
@jtriley In order to start the OpenShift install, I need to know the externally visible domain names for this cluster. For nerc-ocp-prod we're using the base domain shift.nerc.mghpcc.org. How about obs.nerc.mghpcc.org?
@larsks That sounds good to me; however, that DNS record doesn't currently exist. Once I meet with the network ops folks and get this set up, we'll add the DNS records.
I don't think the record needs to exist for the purposes of running the install.
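For context, the OpenShift installer generally derives the externally visible names from the cluster name and base domain (api.&lt;cluster&gt;.&lt;base&gt; and *.apps.&lt;cluster&gt;.&lt;base&gt;). The tiny sketch below shows the names that convention would produce; whether this cluster uses nerc-ocp-obs as its install-time cluster name is an assumption and not confirmed in this thread.

```python
#!/usr/bin/env python3
"""Hedged sketch: external names the standard OpenShift naming convention
would produce for a given cluster name and base domain. The cluster name
"nerc-ocp-obs" is assumed here, not confirmed above.
"""

def external_names(cluster_name: str, base_domain: str) -> list[str]:
    # Conventional OpenShift endpoints: API, internal API, and the apps wildcard.
    return [
        f"api.{cluster_name}.{base_domain}",
        f"api-int.{cluster_name}.{base_domain}",
        f"*.apps.{cluster_name}.{base_domain}",
    ]

for name in external_names("nerc-ocp-obs", "obs.nerc.mghpcc.org"):
    print(name)
```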
@jtriley it looks like there are some discrepancies between the IP addresses you have listed and what is actually configured. After booting the nodes, this is what I have:
Host | MAC | IP | BMC IP | Notes |
---|---|---|---|---|
ctl-0 | a8:99:69:4b:9e:41 | 10.30.9.20 | 10.30.0.83 | |
ctl-1 | a8:99:69:4b:9f:2f | 10.30.9.21 | 10.30.0.84 | Identifies as wrk-65 |
ctl-2 | a8:99:69:4b:9e:a9 | 10.30.9.22 | 10.30.0.85 | |
wrk-0 | a8:99:69:4b:a4:e7 | 10.30.9.23 | 10.30.0.86 | |
wrk-1 | a8:99:69:4b:9e:5b | 10.30.9.24 | 10.30.0.87 | |
wrk-2 | a8:99:69:4b:a0:9d | 10.30.9.25 | 10.30.0.88 | |
I'm concerned about the address mismatches. Let me know if it's safe to proceed with an install.
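A hedged sketch of the kind of cross-check that surfaces these mismatches, comparing the addresses the nodes actually received against the earlier DNS allocation. Both tables are transcribed from the comments above; nothing is queried live.

```python
#!/usr/bin/env python3
"""Hedged sketch: flag hosts whose DHCP-assigned address differs from the
address allocated in DNS. Data is copied from this thread, not queried.
"""

# From the DNS allocation posted earlier (10.30.9.17-22)
ALLOCATED = {
    "ctl-0": "10.30.9.17", "ctl-1": "10.30.9.18", "ctl-2": "10.30.9.19",
    "wrk-0": "10.30.9.20", "wrk-1": "10.30.9.21", "wrk-2": "10.30.9.22",
}

# From the addresses observed after booting the nodes (10.30.9.20-25)
OBSERVED = {
    "ctl-0": "10.30.9.20", "ctl-1": "10.30.9.21", "ctl-2": "10.30.9.22",
    "wrk-0": "10.30.9.23", "wrk-1": "10.30.9.24", "wrk-2": "10.30.9.25",
}

for host, dns_ip in ALLOCATED.items():
    dhcp_ip = OBSERVED.get(host)
    if dhcp_ip != dns_ip:
        print(f"{host}: DNS says {dns_ip}, DHCP handed out {dhcp_ip}")
```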
@larsks I just updated DHCP to match what was allocated in DNS. Should be all set now.