nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

Request NESE storage for Hypershift cluster #486

Closed larsks closed 5 months ago

larsks commented 6 months ago

The hypershift environment was originally provisioned with local storage using the longhorn (https://longhorn.io/) storage operator. This isn't appropriate for production use (primarily because the nodes do not have sufficient local disk space), so we will need to request a NESE storage allocation for the hypershift cluster.

joachimweyl commented 6 months ago

@larsks do we have an estimate from NESE of when they will have this ready for us?

larsks commented 6 months ago

I have opened request 002719 with the NESE helpdesk.

joachimweyl commented 6 months ago

@msdisme can you provide the contact information for NESE so I can help with this?

joachimweyl commented 5 months ago

@larsks it sounds like the storage is now available, can you confirm?

larsks commented 5 months ago

@joachimweyl I'm the one who opened the help ticket with NESE, and as of right now there hasn't been any reply to that request posted back on 3/22.

joachimweyl commented 5 months ago

@waygil my understanding was that Milan has provided the data, is there a next step we need him to do?

waygil commented 5 months ago

I assume he will close the ticket once the request has been fulfilled. @larsks is there a ticket for this request?

Wayne

joachimweyl commented 5 months ago

Yes, the ticket is 002719.

joachimweyl commented 5 months ago

@Milstein do we have access to the MGHPCC OSticket?

Milstein commented 5 months ago

I have access to this ticket and have assigned it to @Milan.

waygil commented 5 months ago

@lars Milan replied to you today on this request. A pool called nerc-ocp-obs already exists (along with nerc_ocp_test and nerc_ocp_prod). @jtriley mentioned in today's meeting that this may be a typo. Can you confirm that this should be an additional pool and, if so, what we want to call it?

naved001 commented 5 months ago

@larsks ^^

larsks commented 5 months ago

@naved001 thanks for the name fix

@waygil we handled that via email last week.

hpdempsey commented 5 months ago

@waygil still no storage allocated as of this morning. Please escalate. In addition to blocking the Hypershift work, this is now also blocking the new OpenShift AI versions work and the work to bring over InstructLab.

waygil commented 5 months ago

@larsks could you forward me the latest correspondence you mentioned last week so I can follow up and see where this is at?

waygil commented 5 months ago

Last email that Lars received:

Sure. On 4/15, I received the following email:

Date: Tue, 16 Apr 2024 03:02:10 +0000
Message-ID: [B9KRjuy-nAosO-AAAAAGxSAADzHwAATUQhd7bx-help@nese.mghpcc.org](mailto:B9KRjuy-nAosO-AAAAAGxSAADzHwAATUQhd7bx-help@nese.mghpcc.org)
From: NESE Helpdesk [help@nese.mghpcc.org](mailto:help@nese.mghpcc.org)
Subject: Re: 10TB volume for nerc-ocp-obs cluster [#002719]

Dear Lars,

Could you please confirm that you want a triple replicated pool named:

nerc_ocp_hypershift_1_rbd

with three access keys:

healthchecker-nerc-ocp-hypershift-1-rbd
node-nerc-ocp-hypershift-1-rbd
provisioner-nerc-ocp-hypershift-1-rbd

the pool will be accessed from following networks:

10.30.13.0/24

I replied with an update on the network range, and Justin replied with a correction (on 4/17):

From: "Riley, Justin" [justinriley@g.harvard.edu](mailto:justinriley@g.harvard.edu)
Date: Wed, 17 Apr 2024 13:44:54 -0400
Message-ID: [CALcGvj1Cj-=xCmWE3iTAhB3sE3HjnqFKFm80mecRBHSsYLUS+g@mail.gmail.com](mailto:CALcGvj1Cj-=xCmWE3iTAhB3sE3HjnqFKFm80mecRBHSsYLUS+g@mail.gmail.com)
Subject: Re: 10TB volume for nerc-ocp-obs cluster [#002719]

Hey Folks,

This cluster will be accessing NESE via the NERC test storage VLAN 2175:

10.30.12.0/24

Milan - please use that for ACL instead ^

That's the last I heard on this ticket.

-- Lars Kellogg-Stedman [lars@redhat.com](mailto:lars@redhat.com)

waygil commented 5 months ago

@larsks

Justin just got a ping from Milan that this is ready. Can you confirm?

joachimweyl commented 5 months ago

@jtriley responded in OSTicket: "Hey Folks, This cluster will be accessing NESE via the NERC test storage VLAN 2175: 10.30.12.0/24 Milan - please use that for ACL instead ^"

waygil commented 5 months ago

@jtriley just let me know that he still needs to put the credentials in the vault.

larsks commented 5 months ago

@waygil @jtriley I haven't seen any action on the ticket yet, nor have I received any credentials myself.

jtriley commented 5 months ago

@larsks the credentials are in vault: https://vault-ui-vault.apps.nerc-ocp-infra.rc.fas.harvard.edu/ui/vault/secrets/nerc/kv/list/hypershift1/openshift-storage

Closing this issue as we should be all set now.