nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

Gather 5-6 Nodes to use for our Logging Metrics & Observability Cluster #176

Closed: joachimweyl closed this issue 11 months ago

joachimweyl commented 1 year ago

@larsks please use this issue to track the request for 5 of the worker nodes from @aabaris. @computate This is where we are tracking the nodes needed for Observability Logging & Metrics.

joachimweyl commented 1 year ago

As these are the workers already set up for OpenShift, step 2 is also done.

joachimweyl commented 1 year ago

@computate Once these 5 nodes are passed to you, please create a new issue to cover the OpenShift install and pull in whoever you need to help with the install.

computate commented 12 months ago

@aabaris @larsks I understood that we would use 3 PowerEdge FC430s and 2 FC830s, but I'll let you two figure out what we will use.

joachimweyl commented 12 months ago

@computate the initial plan is to use 5 FC430s and add more workers when needed.

joachimweyl commented 12 months ago

@aabaris Do we have a list of 5 of the worker nodes waiting for OpenShift that are working and can be used for spinning up a new Observability cluster?

aabaris commented 12 months ago

> @aabaris Do we have a list of 5 of the worker nodes waiting for OpenShift that are working and can be used for spinning up a new Observability cluster?

@joachimweyl I should be able to find 5 nodes for this purpose sometime this week. In order to make them available, I will need to know which network they will be deployed on. (Nodes don't have to be on the prod network to access the prod cluster.)

aabaris commented 12 months ago

@joachimweyl if we don't have an answer for the choice of networks, I think we should consult with @jtriley

joachimweyl commented 12 months ago

@larsks & @computate do we know what network we want these on?

larsks commented 11 months ago

@aabaris I don't think we have a particular preference for the network. In my ideal world, we would be deploying each cluster on a dedicated private network, but I understand that the way things stand right now that's not necessarily possible.

Since this cluster is going to be configured in a similar fashion to the production cluster (e.g., including public access to the console and API), I would default to having it live on the same network unless @jtriley has an alternative preference.

aabaris commented 11 months ago

I attempted to allocate nodes from 3 separate chassis to provide some failure-domain resilience; however, I ran into numerous hardware problems (to be addressed in separate issues). I propose that we use cmc-8, the one FX chassis that we don't anticipate being disrupted by repair work.

I confirmed the ability to netboot the following nodes:

wrk-64
wrk-65
wrk-66
wrk-67
wrk-68
wrk-69
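
Checks like this can be scripted against the nodes' OBM interfaces. A minimal sketch in Python, assuming `ipmitool` is installed and the OBM network is reachable; the credentials are placeholders, and the OBM addresses are the ones listed later in this thread:

```python
import subprocess

# OBM addresses for wrk-64 through wrk-69 (10.30.0.83-88, per the list
# further down in this thread); the IPMI credentials are placeholders.
OBM_HOSTS = [f"10.30.0.{i}" for i in range(83, 89)]
IPMI_USER = "admin"      # placeholder
IPMI_PASS = "changeme"   # placeholder

def netboot(host: str) -> None:
    """Set the next boot device to PXE, then power-cycle the node."""
    base = ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", IPMI_USER, "-P", IPMI_PASS]
    subprocess.run(base + ["chassis", "bootdev", "pxe"], check=True)
    subprocess.run(base + ["power", "cycle"], check=True)

for obm in OBM_HOSTS:
    netboot(obm)
```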

I need some additional information to finalize the PR for DHCP config.

@jtriley how should we assign VLANs and DNS for these hosts?

@larsks:

1. How many public IPs will be needed?
2. What version of OpenShift will be installed?
3. Do you have a discovery ISO that I can load into our boot server?
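
For context, the DHCP side of that PR is typically a set of ISC dhcpd host reservations. A sketch of generating them; the MAC/IP pairs are placeholders, since the real values depend on the VLAN and DNS answers above:

```python
# Placeholder MAC/IP pairs; the actual mapping depends on the VLAN and
# DNS decisions requested in this comment.
NODES = {
    "wrk-64": ("aa:bb:cc:dd:ee:01", "10.30.9.17"),
    "wrk-65": ("aa:bb:cc:dd:ee:02", "10.30.9.18"),
    # ...remaining nodes follow the same pattern
}

def dhcpd_stanza(name: str, mac: str, ip: str) -> str:
    """Render one ISC dhcpd host block."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f'  option host-name "{name}";\n'
        f"}}\n"
    )

for name, (mac, ip) in NODES.items():
    print(dhcpd_stanza(name, mac, ip))
```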

aabaris commented 11 months ago

Next steps to get this done:

jtriley commented 11 months ago

@aabaris I put in a request with networking to get these flipped. Will let you know when it's ready for testing today/tomorrow. I can also take care of the DNS configuration.

aabaris commented 11 months ago

Thank you @jtriley

I tested the nodes and can confirm that the network changes for both the admin and storage VLANs were successful.

Are we also going to update their OBM DNS entries? Here's a list of the current OBM IPs:

ctl-0-obm  10.30.0.83  (currently wrk-64-obm.nerc-ocp-prod)
ctl-1-obm  10.30.0.84  (currently wrk-65-obm.nerc-ocp-prod)
ctl-2-obm  10.30.0.85  (currently wrk-66-obm.nerc-ocp-prod)
wrk-0-obm  10.30.0.86  (currently wrk-67-obm.nerc-ocp-prod)
wrk-1-obm  10.30.0.87  (currently wrk-68-obm.nerc-ocp-prod)
wrk-2-obm  10.30.0.88  (currently wrk-69-obm.nerc-ocp-prod)
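
One way to check whether the rename has landed is a reverse (PTR) lookup on each OBM address. A sketch, assuming the target names move under the nerc-ocp-obs subdomain as proposed:

```python
import socket

# OBM IPs from the list above, mapped to the names they should carry
# after the rename (assumed subdomain: nerc-ocp-obs).
OBM = {
    "10.30.0.83": "ctl-0-obm.nerc-ocp-obs",
    "10.30.0.84": "ctl-1-obm.nerc-ocp-obs",
    "10.30.0.85": "ctl-2-obm.nerc-ocp-obs",
    "10.30.0.86": "wrk-0-obm.nerc-ocp-obs",
    "10.30.0.87": "wrk-1-obm.nerc-ocp-obs",
    "10.30.0.88": "wrk-2-obm.nerc-ocp-obs",
}

for ip, wanted in OBM.items():
    try:
        current = socket.gethostbyaddr(ip)[0]  # PTR lookup
    except socket.herror:
        current = "<no PTR record>"
    print(f"{ip}: currently {current}, target {wanted}")
```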

jtriley commented 11 months ago

Yes, I'll update DNS using those names with the subdomain/cluster name nerc-ocp-obs.

jtriley commented 11 months ago

@aabaris DNS has been updated. I ended up using the following IPs for the cluster:

api.nerc-ocp-obs          10.30.9.15
api-int.nerc-ocp-obs      10.30.9.15
lb.nerc-ocp-obs           10.30.9.16
*.apps.nerc-ocp-obs       10.30.9.16
ctl-0.nerc-ocp-obs        10.30.9.17
ctl-1.nerc-ocp-obs        10.30.9.18
ctl-2.nerc-ocp-obs        10.30.9.19
wrk-0.nerc-ocp-obs        10.30.9.20
wrk-1.nerc-ocp-obs        10.30.9.21
wrk-2.nerc-ocp-obs        10.30.9.22
ctl-0-jumbo.nerc-ocp-obs  10.30.13.17
ctl-1-jumbo.nerc-ocp-obs  10.30.13.18
ctl-2-jumbo.nerc-ocp-obs  10.30.13.19
wrk-0-jumbo.nerc-ocp-obs  10.30.13.20
wrk-1-jumbo.nerc-ocp-obs  10.30.13.21
wrk-2-jumbo.nerc-ocp-obs  10.30.13.22
ctl-0-obm.nerc-ocp-obs    10.30.0.83
ctl-1-obm.nerc-ocp-obs    10.30.0.84
ctl-2-obm.nerc-ocp-obs    10.30.0.85
wrk-0-obm.nerc-ocp-obs    10.30.0.86
wrk-1-obm.nerc-ocp-obs    10.30.0.87
wrk-2-obm.nerc-ocp-obs    10.30.0.88
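
These records can be spot-checked from any host that resolves the internal zone. A small sketch; the random label exercises the *.apps wildcard, and the short names assume the resolver's search domain covers this zone:

```python
import socket
import uuid

# Expected records from the list above; the random label under apps.
# exercises the *.apps wildcard entry.
checks = {
    "api.nerc-ocp-obs": "10.30.9.15",
    "lb.nerc-ocp-obs": "10.30.9.16",
    f"{uuid.uuid4().hex[:8]}.apps.nerc-ocp-obs": "10.30.9.16",
}

for name, want in checks.items():
    got = socket.gethostbyname(name)
    print(f"{name}: {'OK' if got == want else f'MISMATCH (got {got})'}")
```
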
larsks commented 11 months ago

@jtriley In order to start the OpenShift install, I need to know the externally visible domain names for this cluster. For nerc-ocp-prod we're using the base domain shift.nerc.mghpcc.org.

How about obs.nerc.mghpcc.org?
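
Under the usual OpenShift naming convention of api.<clusterName>.<baseDomain> and *.apps.<clusterName>.<baseDomain>, that choice would imply names like the following sketch; the cluster name nerc-ocp-obs is assumed to carry over from the internal records:

```python
base_domain = "obs.nerc.mghpcc.org"  # proposed above; record does not exist yet
cluster = "nerc-ocp-obs"             # assumption: matches the internal zone name

print(f"api.{cluster}.{base_domain}")     # externally visible API endpoint
print(f"*.apps.{cluster}.{base_domain}")  # ingress wildcard (console, routes)
```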

jtriley commented 11 months ago

@larsks That sounds good to me; however, that DNS record doesn't currently exist. Once I meet with the network ops folks and get this set up, we'll add the DNS records.

larsks commented 11 months ago

I don't think the record needs to exist for the purposes of running the install.

larsks commented 11 months ago

@jtriley it looks like there are some discrepancies between the IP addresses you have listed and what is actually configured. After booting the nodes, this is what I have:

Host   MAC                IP          BMC IP      Notes
ctl-0  a8:99:69:4b:9e:41  10.30.9.20  10.30.0.83
ctl-1  a8:99:69:4b:9f:2f  10.30.9.21  10.30.0.84  Identifies as wrk-65
ctl-2  a8:99:69:4b:9e:a9  10.30.9.22  10.30.0.85
wrk-0  a8:99:69:4b:a4:e7  10.30.9.23  10.30.0.86
wrk-1  a8:99:69:4b:9e:5b  10.30.9.24  10.30.0.87
wrk-2  a8:99:69:4b:a0:9d  10.30.9.25  10.30.0.88
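
A mismatch like this can be surfaced mechanically by diffing the observed addresses against what DNS allocates. A sketch, assuming the internal zone is resolvable from wherever it runs:

```python
import socket

# Observed addresses from the table above.
observed = {
    "ctl-0": "10.30.9.20", "ctl-1": "10.30.9.21", "ctl-2": "10.30.9.22",
    "wrk-0": "10.30.9.23", "wrk-1": "10.30.9.24", "wrk-2": "10.30.9.25",
}

# Compare each against the address allocated in the nerc-ocp-obs zone.
for host, seen in observed.items():
    allocated = socket.gethostbyname(f"{host}.nerc-ocp-obs")
    if allocated != seen:
        print(f"{host}: DNS allocates {allocated}, node booted with {seen}")
```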

I'm concerned about the address mismatches. Let me know if it's safe to proceed with an install.

jtriley commented 11 months ago

@larsks I just updated DHCP to match what was allocated in DNS. Should be all set now.