nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

connectivity between BM ESI nodes and NERC infra ingress/api #726

Closed tssala23 closed 1 month ago

tssala23 commented 2 months ago

Motivation

Currently, clusters created with ESI nodes cannot reach the NERC network. This means they are unable to access vault or be attached to ACM.

curl to infra from pod on ocp-beta-cluster:

/ # curl https://api.nerc-ocp-infra.rc.fas.harvard.edu:6443
curl: (6) Could not resolve host: api.nerc-ocp-infra.rc.fas.harvard.edu

/ # curl --connect-timeout 10 https://10.30.9.5:6443 -k
curl: (28) Failed to connect to 10.30.9.5 port 6443 after 10002 ms: Timeout was reached

curl to infra from pod on prod cluster:

/ # curl https://api.nerc-ocp-infra.rc.fas.harvard.edu:6443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}
/ # exit

/ # curl https://10.30.9.5:6443 -k
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {},
  "code": 403
}
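The two curl checks above separate a name-resolution failure from a TCP connection failure. A minimal Python sketch of the same two-step check follows; the function name `probe` and its return labels are illustrative and not part of any NERC tooling.

```python
import socket


def probe(host: str, port: int, timeout: float = 10.0) -> str:
    """Classify reachability of host:port in two steps: DNS, then TCP connect."""
    try:
        # Step 1: name resolution. A failure here is what curl reports as exit code 6.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"

    last = "unreachable"
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            # Step 2: TCP connect. A timeout here is what curl reports as exit code 28.
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            return "reachable"
        except socket.timeout:
            last = "timeout"
        except ConnectionRefusedError:
            last = "refused"
        except OSError:
            last = "unreachable"
    return last
```

Calling `probe("api.nerc-ocp-infra.rc.fas.harvard.edu", 6443)` and `probe("10.30.9.5", 6443)` from a pod on each cluster should mirror the curl results, assuming the pod image has Python available.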

Completion Criteria

Connection between BM ESI nodes and NERC

Description

Completion dates

Desired - 20YY-MM-DD Required - TBD

hakasapl commented 2 months ago

ESI nodes can currently attach to NERC networks but we restricted those networks to the nerc project only. We definitely want to keep those networks locked down. Am I misunderstanding something?

larsks commented 2 months ago

@hakasapl We want to be able to manage ESI-hosted clusters using ACM/argocd on nerc-ocp-infra and we want to be able to access the vault (also on nerc-ocp-infra) from these same clusters.

hakasapl commented 2 months ago

It looks like the network you want to attach to on ESI then is named nerc-openshift-infra-frontend, which is 10.30.9.0/24. That network is only available on the nerc project in ESI though.

hakasapl commented 2 months ago

Are you trying to allocate on that network, or do you just want a route to get there? I'm not sure how the existing clusters are set up.

hakasapl commented 2 months ago

The goal is to add a network to ESI that assigns IPs using DHCP. ESI should then also provide static routes to the NERC network defined in this issue. A similar working network is already defined for NESE storage. In the meeting there was a preference to self-host this router to reduce reliance on FASRC.

We should also split networks up so that the network created from this issue is used for a specific purpose (i.e., I don't want to route to every NERC network on this network, only the ones that are needed).

So, to do this we need a few bits of info.

  1. Which NERC VLANs or subnets do we need access to? (I have 10.30.9.0/24 so far; anything else?)
    1. @jtriley for each of these networks, what IPs can the MOC-side router allocate (one on each)?
  2. Are network ACLs a requirement here, or can we just pass everything through?
  3. Which projects in ESI should have access to this router?
    1. The network will be owned by the nerc-admins project and shared with whatever other projects need access.

Everything here is software-level; all the hardware and cabling for this to work is already in place. Does this sound reasonable?
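To illustrate how a static route to one NERC subnet coexists with a host's default route, here is a minimal longest-prefix-match sketch in Python. Only the 10.30.9.0/24 subnet comes from this thread; the next-hop labels `default-gateway` and `esi-nerc-router` are hypothetical placeholders, and a real host's routing table is managed by the kernel, not by code like this.

```python
from ipaddress import ip_address, ip_network

# Hypothetical routing table for a node on the proposed ESI network.
ROUTES = [
    (ip_network("0.0.0.0/0"), "default-gateway"),     # hypothetical default route
    (ip_network("10.30.9.0/24"), "esi-nerc-router"),  # static route; next-hop name is hypothetical
]


def next_hop(dst, routes=ROUTES):
    """Return the next hop of the most specific route that contains dst, or None."""
    addr = ip_address(dst)
    matches = [(net, hop) for net, hop in routes if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the route with the largest prefix length wins.
    return max(matches, key=lambda match: match[0].prefixlen)[1]
```

Under these assumptions, traffic to 10.30.9.5 follows the static route while all other traffic keeps using the default route, which is why only the needed NERC subnets have to be listed.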

hpdempsey commented 2 months ago

We also need the network to allow connections to test clusters for monitoring them from ACM. Some part of the Observability system may also need access to allow us to trigger GPU monitoring software in the clusters. @schwesig can you please provide more detail on any relevant hosts and VLANs or addresses?

hpdempsey commented 2 months ago

Lars also mentioned making sure that DNS was adjusted to include the machines in ESI. Not sure if that needs to be on this issue.

As a lower priority, this router would need to be added to the network statistics you are planning to collect from the IT monitor so that they can be added to the Observability metrics. There is another issue already created for this as it applies to existing switches, but that would affect this router/switch too (if it isn't already included). It is fine to deploy without this ability initially if needed.

hakasapl commented 2 months ago

> We also need the network to allow connections to test clusters for monitoring them from ACM. Some part of the Observability system may also need access to allow us to trigger GPU monitoring software in the clusters. @schwesig can you please provide more detail on any relevant hosts and VLANs or addresses?

In this case, the NERC side can use the IP the ESI router allocates on the NERC end as a router for traffic in the other direction, which I think should work.

joachimweyl commented 1 month ago

@hakasapl what are the next steps for this issue?

hakasapl commented 1 month ago

There are open questions in my previous reply that need to be answered before we can do anything. Probably Justin for the NERC-side questions and Taj for the ESI-side questions.

tssala23 commented 1 month ago

@hakasapl The projects that would need access are the ones hosting clusters. ESI project names:

joachimweyl commented 1 month ago

@jtriley please see the questions above.

joachimweyl commented 1 month ago

@hakasapl what are the next steps for this?

hakasapl commented 1 month ago

I believe @jtriley said he would fill in info about what IP I can allocate on the NESE side for the virtual router and also what network ACLs are needed. We'll need this info to continue.

jtriley commented 1 month ago

@hakasapl apologies for the delay. The VLAN we need for this is 2176. You can use 10.30.9.100 for the virtual router IP; I'll get that added to DNS today. If possible, it'd be good to restrict access on this net to tcp/80, tcp/443, and tcp/6443 from approved ESI projects. Locking it down further to specific IPs on that net would be preferred: the IPs needed to access vault and the OpenShift API are the ingress IP 10.30.9.6 and the API IP 10.30.9.5. The DNS will be tricky given that it's currently locked to Harvard nets. Ideally we reprovision the infra cluster at some point to use the nerc.mghpcc.org domain, which is public. I might still be able to open that up within FASRC DNS but will need to look into that.
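The requested restriction can be expressed as a small allow-list check. The Python sketch below uses the destination IPs and TCP ports from the comment above; the source range for approved ESI projects is not stated there, so `approved_sources` is a caller-supplied parameter and the 192.0.2.0/24 documentation range in the usage note is purely a placeholder. This models the policy only and is not the router's actual ACL configuration.

```python
from ipaddress import ip_address, ip_network

# From the comment above: API IP and ingress IP on the NERC infra net.
ALLOWED_DESTINATIONS = {ip_address("10.30.9.5"), ip_address("10.30.9.6")}
# From the comment above: tcp/80, tcp/443, and tcp/6443.
ALLOWED_TCP_PORTS = {80, 443, 6443}


def is_permitted(src, dst, port, approved_sources):
    """Return True if a TCP flow from src to dst:port matches the proposed policy.

    approved_sources is an iterable of ip_network objects for approved ESI
    projects; the real ranges are an assumption left to the caller.
    """
    src_ip = ip_address(src)
    return (
        any(src_ip in net for net in approved_sources)
        and ip_address(dst) in ALLOWED_DESTINATIONS
        and port in ALLOWED_TCP_PORTS
    )
```

For example, with `approved_sources = [ip_network("192.0.2.0/24")]`, a flow from 192.0.2.10 to 10.30.9.5 on port 6443 is permitted, while the same flow on port 22 or to 10.30.9.100 is not.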

hakasapl commented 1 month ago

Thanks @jtriley I am planning on working on this on Monday

hakasapl commented 1 month ago

Sorry for the delay, I was out sick Monday and partially Tuesday. I will likely be able to implement this before Friday this week.

hakasapl commented 1 month ago

I've made a PR for this: https://github.com/CCI-MOC/esi-pilot/pull/77

The network side of this is complete, including all required ACLs. The next step is figuring out how we will manage this in ESI.

hakasapl commented 1 month ago

@tssala23 the nerc-infra-routed network is now available in the 3 projects you listed. When I get the name for Dylan's project I can add that too. When you get a chance, could you verify? You'll get a 10.85 IP with a static route that goes to NERC infra.

tssala23 commented 1 month ago

@hakasapl yes it shows up in my list of networks now

[tsalawu@tsalawu-thinkpadx1nanogen2 mocesi]$ openstack network list
+--------------------------------------+-------------------+--------------------------------------+
| ID                                   | Name              | Subnets                              |
+--------------------------------------+-------------------+--------------------------------------+
| 0a4d300e-8df6-4ec5-965e-de0e71bd13f1 | nerc-infra-routed | 2394a762-3453-4db3-ac13-681ed52a0e72 |
| 0a8b48e2-d758-4f47-a424-ec57421abad2 | ocp-beta          | f986f6ce-cac5-4c76-89ba-3da4b8111334 |
| 34e429da-d586-4deb-817e-b49a102f1a9b | storage           | 65af7164-7e87-4bdf-8eba-276338b01f9c |
| 71bdf502-a09f-4f5f-aba2-203fe61189dc | external          | e1d5a5ca-947e-4e3e-9f7a-cee4619dc5c4 |
| 953571cb-9e91-4676-a8e4-03d0a4923aa9 | nese-storage      | 43e94fe3-b516-4a89-bb09-cdc516fce812 |
| 96263add-4cbc-485e-91cc-292965783867 | provisioning      | 54a3bce5-654c-4362-b53f-4c137114c317 |
+--------------------------------------+-------------------+--------------------------------------+

hakasapl commented 1 month ago

This is in place now, so I am closing this issue. There is a follow-up item related to this network in this issue: https://github.com/CCI-MOC/ops-issues/issues/1413

dystewart commented 1 month ago

@hakasapl Reopening this till we have networking set up for ESI project "ope"

hakasapl commented 1 month ago

@dystewart I've added that network to the "ope" project - let me know if you see it.

hakasapl commented 1 month ago

@dystewart are we all set with this?

dystewart commented 1 month ago

@hakasapl I realized I'll need this set up for ESI project orran_cloud_computing as well. Once we have that we're all set!

hakasapl commented 1 month ago

@dystewart I've added the network to that project as well.