nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

No link on several nodes #220

Open larsks opened 1 year ago

larsks commented 1 year ago

The following nodes appear to have no link on either interface:

jtriley commented 1 year ago

I can confirm there's no link seen on the NICs for any of those hosts from CMC/OBM. This is likely an issue with that entire chassis. @aabaris will be on site Monday for the planned maintenance, and we will try to take a look at that chassis after we finish the maintenance tasks. Worst case, I believe we're done with the test cluster hardware for now, so perhaps we could use it temporarily for testing HyperShift if needed, although we'd need to double-check with @dystewart.

dystewart commented 1 year ago

At the moment, Isaiah and I are doing some work and testing on the test cluster (Curator, and the OPE notebook being used in the ROSA environment needs testing in NERC). @larsks, I'm not sure what your requirements are, but in the worst case, would it make sense to take a few nodes from the test cluster rather than all of them?

joachimweyl commented 1 year ago

@aabaris Do we have 3 other nodes we can swap for these so we don't have to wait for these to get fixed?

aabaris commented 1 year ago


@joachimweyl,

First of all, I'm at the data center now and there's only one of me. I can try to help out tomorrow and identify some nodes, but @jtriley usually interfaces with the HU network engineers for VLAN mappings and such.

Second, where are these nodes?
According to https://github.com/OCP-on-NERC/nerc-ansible/blob/e4b5b43c5a4ceb52744494d72bd1f7ef20e6a1d2/inventory/00-static/ocp-prod.yaml, they should be at:

              location:
                row: 1
                pod: A
                cage: 8
                unit: 27

That whole pod has been emptied out.

aabaris commented 1 year ago

I found out from Justin that the nodes were moved to 1-C-20. However, I don't have access to that pod.

Could we:

  1. Get @larsks unblocked by making 3 nodes available per his request
    https://github.com/nerc-project/operations/issues/233 (I'll work with @jtriley on this tomorrow)

  2. Ask @hakasapl, who has access to that pod, to investigate the cabling for the chassis holding these nodes the next time he is at MGHPCC and has time.

joachimweyl commented 1 year ago

@aabaris Thank you. Next time I will be clearer about the desired turnaround time; I did not intend for you to resolve this while at the MGHPCC. Thank you so much for responding while you were there.

joachimweyl commented 1 year ago

@aabaris, @hakasapl says that the same key that worked for 1-A-8 should work in 1-C-20. Did you get a chance to try that?

aabaris commented 1 year ago

@joachimweyl I had access to the NERC and Harvard keys. Neither of them gave me access to 1-C-20. I did not try those keys on 1-A-8; those cabinets were empty and the pod side doors were missing.

joachimweyl commented 1 year ago

@hakasapl is this something we can have techsquared check for us?

hakasapl commented 1 year ago

Longer term, we need to swap the key in 1-C-20 from UMass to NERC, which would also require adding the UMass pod key to the NERC keyring, although this may be superseded by a potential combined MOC/NERC key.

I will add this to the list of items techsquare can look at, and I will reach out after they are done with it.