nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

Gather 5-6 Nodes to use for our Logging Metrics & Observability Cluster #176

Closed: joachimweyl closed this issue 11 months ago

joachimweyl commented 1 year ago

@larsks please use this issue to track the request for 5 of the worker nodes from @aabaris. @computate This is where we are tracking the nodes needed for Observability Logging & Metrics.

joachimweyl commented 1 year ago

As these are the workers already set up for OpenShift, step 2 is also done.

joachimweyl commented 1 year ago

@computate Once these 5 nodes are passed to you, please create a new issue to cover the OpenShift install and pull in whoever you need to help with the install.

computate commented 12 months ago

@aabaris @larsks I understood that we would use 3 PowerEdge FC430s and 2 FC830s, but I'll let you two figure out what we will use.

joachimweyl commented 12 months ago

@computate the initial plan is to use 5 FC430s and add more workers when needed.

joachimweyl commented 12 months ago

@aabaris Do we have a list of 5 of the worker nodes waiting for OpenShift that are working and can be used for spinning up a new Observability cluster?

aabaris commented 12 months ago

> @aabaris Do we have a list of 5 of the worker nodes waiting for OpenShift that are working and can be used for spinning up a new Observability cluster?

@joachimweyl I should be able to find 5 nodes for this purpose sometime this week. In order to make them available, I will need to know which network they will be deployed on. (Nodes don't have to be on the prod network to access the prod cluster.)

aabaris commented 12 months ago

@joachimweyl if we don't have an answer for the choice of networks, I think we should consult with @jtriley

joachimweyl commented 12 months ago

@larsks & @computate do we know what network we want these on?

larsks commented 11 months ago

@aabaris I don't think we have a particular preference for the network. In my ideal world, we would be deploying each cluster on a dedicated private network, but I understand that the way things stand right now that's not necessarily possible.

Since this cluster is going to be configured in a similar fashion to the production cluster (e.g., including public access to the console and API), I would default to having it live on the same network unless @jtriley has an alternative preference.

aabaris commented 11 months ago

I attempted to allocate nodes from 3 separate chassis to provide some failure-domain resilience; however, I ran into numerous hardware problems (to be addressed in separate issues). I propose that we use cmc-8, the one FX chassis that we don't anticipate being disrupted by repair work.

I confirmed the ability to netboot the following nodes:

wrk-64
wrk-65
wrk-66
wrk-67
wrk-68
wrk-69
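
Checks like this can be scripted against the nodes' OBM interfaces. A minimal sketch in Python, assuming `ipmitool` is installed and the OBM network is reachable; the credentials are placeholders, and the OBM addresses are the ones listed later in this thread:

```python
import subprocess

# OBM addresses for wrk-64 through wrk-69 (10.30.0.83-88, per the list
# further down in this thread); the IPMI credentials are placeholders.
OBM_HOSTS = [f"10.30.0.{i}" for i in range(83, 89)]
IPMI_USER = "admin"      # placeholder
IPMI_PASS = "changeme"   # placeholder

def netboot(host: str) -> None:
    """Set the next boot device to PXE, then power-cycle the node."""
    base = ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", IPMI_USER, "-P", IPMI_PASS]
    subprocess.run(base + ["chassis", "bootdev", "pxe"], check=True)
    subprocess.run(base + ["power", "cycle"], check=True)

for obm in OBM_HOSTS:
    netboot(obm)
```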

I need some additional information to finalize the PR for DHCP config.

@jtriley how should we assign VLANs and DNS for these hosts?

@larsks:

1. How many public IPs will be needed?
2. What version of OpenShift will be installed?
3. Do you have a discovery ISO that I can load into our boot server?
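
For context, the DHCP side of that PR is typically a set of ISC dhcpd host reservations. A sketch of generating them; the MAC/IP pairs are placeholders, since the real values depend on the VLAN and DNS answers above:

```python
# Placeholder MAC/IP pairs; the actual mapping depends on the VLAN and
# DNS decisions requested in this comment.
NODES = {
    "wrk-64": ("aa:bb:cc:dd:ee:01", "10.30.9.17"),
    "wrk-65": ("aa:bb:cc:dd:ee:02", "10.30.9.18"),
    # ...remaining nodes follow the same pattern
}

def dhcpd_stanza(name: str, mac: str, ip: str) -> str:
    """Render one ISC dhcpd host block."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f'  option host-name "{name}";\n'
        f"}}\n"
    )

for name, (mac, ip) in NODES.items():
    print(dhcpd_stanza(name, mac, ip))
```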

aabaris commented 11 months ago

Next steps to get this done:

jtriley commented 11 months ago

@aabaris I put in a request with networking to get these flipped. Will let you know when it's ready for testing today/tomorrow. I can also take care of the DNS configuration.

aabaris commented 11 months ago

Thank you @jtriley

I tested the nodes and can confirm that the network changes for both the admin and storage VLANs were successful.

Are we also going to update their OBM DNS entries? Here's a list of the current OBM IPs:

ctl-0-obm  10.30.0.83  (currently wrk-64-obm.nerc-ocp-prod)
ctl-1-obm  10.30.0.84  (currently wrk-65-obm.nerc-ocp-prod)
ctl-2-obm  10.30.0.85  (currently wrk-66-obm.nerc-ocp-prod)
wrk-0-obm  10.30.0.86  (currently wrk-67-obm.nerc-ocp-prod)
wrk-1-obm  10.30.0.87  (currently wrk-68-obm.nerc-ocp-prod)
wrk-2-obm  10.30.0.88  (currently wrk-69-obm.nerc-ocp-prod)
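
One way to check whether the rename has landed is a reverse (PTR) lookup on each OBM address. A sketch, assuming the target names move under the nerc-ocp-obs subdomain as proposed:

```python
import socket

# OBM IPs from the list above, mapped to the names they should carry
# after the rename (assumed subdomain: nerc-ocp-obs).
OBM = {
    "10.30.0.83": "ctl-0-obm.nerc-ocp-obs",
    "10.30.0.84": "ctl-1-obm.nerc-ocp-obs",
    "10.30.0.85": "ctl-2-obm.nerc-ocp-obs",
    "10.30.0.86": "wrk-0-obm.nerc-ocp-obs",
    "10.30.0.87": "wrk-1-obm.nerc-ocp-obs",
    "10.30.0.88": "wrk-2-obm.nerc-ocp-obs",
}

for ip, wanted in OBM.items():
    try:
        current = socket.gethostbyaddr(ip)[0]  # PTR lookup
    except socket.herror:
        current = "<no PTR record>"
    print(f"{ip}: currently {current}, target {wanted}")
```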

jtriley commented 11 months ago

Yes, I'll update DNS using those names with the subdomain/cluster name nerc-ocp-obs.

jtriley commented 11 months ago

@aabaris DNS has been updated. I ended up using the following IPs for the cluster:

api.nerc-ocp-obs          10.30.9.15
api-int.nerc-ocp-obs      10.30.9.15
lb.nerc-ocp-obs           10.30.9.16
*.apps.nerc-ocp-obs       10.30.9.16
ctl-0.nerc-ocp-obs        10.30.9.17
ctl-1.nerc-ocp-obs        10.30.9.18
ctl-2.nerc-ocp-obs        10.30.9.19
wrk-0.nerc-ocp-obs        10.30.9.20
wrk-1.nerc-ocp-obs        10.30.9.21
wrk-2.nerc-ocp-obs        10.30.9.22
ctl-0-jumbo.nerc-ocp-obs  10.30.13.17
ctl-1-jumbo.nerc-ocp-obs  10.30.13.18
ctl-2-jumbo.nerc-ocp-obs  10.30.13.19
wrk-0-jumbo.nerc-ocp-obs  10.30.13.20
wrk-1-jumbo.nerc-ocp-obs  10.30.13.21
wrk-2-jumbo.nerc-ocp-obs  10.30.13.22
ctl-0-obm.nerc-ocp-obs    10.30.0.83
ctl-1-obm.nerc-ocp-obs    10.30.0.84
ctl-2-obm.nerc-ocp-obs    10.30.0.85
wrk-0-obm.nerc-ocp-obs    10.30.0.86
wrk-1-obm.nerc-ocp-obs    10.30.0.87
wrk-2-obm.nerc-ocp-obs    10.30.0.88
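
These records can be spot-checked from any host that resolves the internal zone. A small sketch; the random label exercises the *.apps wildcard, and the short names assume the resolver's search domain covers this zone:

```python
import socket
import uuid

# Expected records from the list above; the random label under apps.
# exercises the *.apps wildcard entry.
checks = {
    "api.nerc-ocp-obs": "10.30.9.15",
    "lb.nerc-ocp-obs": "10.30.9.16",
    f"{uuid.uuid4().hex[:8]}.apps.nerc-ocp-obs": "10.30.9.16",
}

for name, want in checks.items():
    got = socket.gethostbyname(name)
    print(f"{name}: {'OK' if got == want else f'MISMATCH (got {got})'}")
```
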
larsks commented 11 months ago

@jtriley In order to start the OpenShift install, I need to know the externally visible domain names for this cluster. For nerc-ocp-prod we're using the base domain shift.nerc.mghpcc.org.

How about obs.nerc.mghpcc.org?
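
Under the usual OpenShift naming convention of api.<clusterName>.<baseDomain> and *.apps.<clusterName>.<baseDomain>, that choice would imply names like the following sketch; the cluster name nerc-ocp-obs is assumed to carry over from the internal records:

```python
base_domain = "obs.nerc.mghpcc.org"  # proposed above; record does not exist yet
cluster = "nerc-ocp-obs"             # assumption: matches the internal zone name

print(f"api.{cluster}.{base_domain}")     # externally visible API endpoint
print(f"*.apps.{cluster}.{base_domain}")  # ingress wildcard (console, routes)
```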

jtriley commented 11 months ago

@larsks That sounds good to me; however, that DNS record doesn't currently exist. Once I meet with the network ops folks and get this set up, we'll add the DNS records.

larsks commented 11 months ago

I don't think the record needs to exist for the purposes of running the install.

larsks commented 11 months ago

@jtriley it looks like there are some discrepancies between the IP addresses you have listed and what is actually configured. After booting the nodes, this is what I have:

Host   MAC                IP          BMC IP      Notes
ctl-0  a8:99:69:4b:9e:41  10.30.9.20  10.30.0.83
ctl-1  a8:99:69:4b:9f:2f  10.30.9.21  10.30.0.84  Identifies as wrk-65
ctl-2  a8:99:69:4b:9e:a9  10.30.9.22  10.30.0.85
wrk-0  a8:99:69:4b:a4:e7  10.30.9.23  10.30.0.86
wrk-1  a8:99:69:4b:9e:5b  10.30.9.24  10.30.0.87
wrk-2  a8:99:69:4b:a0:9d  10.30.9.25  10.30.0.88
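
A mismatch like this can be surfaced mechanically by diffing the observed addresses against what DNS allocates. A sketch, assuming the internal zone is resolvable from wherever it runs:

```python
import socket

# Observed addresses from the table above.
observed = {
    "ctl-0": "10.30.9.20", "ctl-1": "10.30.9.21", "ctl-2": "10.30.9.22",
    "wrk-0": "10.30.9.23", "wrk-1": "10.30.9.24", "wrk-2": "10.30.9.25",
}

# Compare each against the address allocated in the nerc-ocp-obs zone.
for host, seen in observed.items():
    allocated = socket.gethostbyname(f"{host}.nerc-ocp-obs")
    if allocated != seen:
        print(f"{host}: DNS allocates {allocated}, node booted with {seen}")
```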

I'm concerned about the address mismatches. Let me know if it's safe to proceed with an install.

jtriley commented 11 months ago

@larsks I just updated DHCP to match what was allocated in DNS. Should be all set now.