uabrc / devops-docs

https://docs.rc.uab.edu/devops-docs/
Apache License 2.0

How to: triage shared space requests. #29

Open wwarriner opened 1 year ago

wwarriner commented 1 year ago

User story

As an ops person, I find that SNow (ServiceNow) shared space requests can be tedious to navigate, and I have to learn all of the information Facilitation has already gathered, which duplicates their effort. A summary of the shared space execution plan would decrease turnaround time.

Proposed solution

For a shared space request ticket, accumulate necessary changes as part of an execution plan in the "work notes" field. Each time the requester makes a statement like "add this person" or "give it this name", create a new work note which incorporates the proposed changes. The work note should be like a "running total" of the execution plan so far.

Then, when the conversation is finalized, create a TASK item on the RITM and send the URL of the TASK to @rc-ops in Slack in the #tickets channel.
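The "running total" idea can be sketched as a small record that each requester statement updates, with the work note always rendered from the whole record rather than from the latest change alone. This is a minimal Python sketch, not an existing tool: `ExecutionPlan`, its methods, and the BlazerIDs `jsmith`/`adoe` are hypothetical, and it only covers the name, owner, and access list from the checklist below.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ExecutionPlan:
    """Hypothetical running total of changes for one shared space RITM."""

    name: Optional[str] = None
    owner: Optional[str] = None
    members: Set[str] = field(default_factory=set)

    def set_name(self, name: str) -> None:
        self.name = name

    def set_owner(self, blazerid: str) -> None:
        # The owner must also be on the top-level access list.
        self.owner = blazerid
        self.members.add(blazerid)

    def add_member(self, blazerid: str) -> None:
        self.members.add(blazerid)

    def remove_member(self, blazerid: str) -> None:
        if blazerid == self.owner:
            raise ValueError("cannot remove the owner from the access list")
        self.members.discard(blazerid)

    def work_note(self) -> str:
        """Render the entire plan so far, suitable for pasting as a new work note."""
        todo = "(not yet decided)"
        return "\n".join([
            "Shared space execution plan (running total)",
            f"Name: {self.name or todo}",
            f"Owner: {self.owner or todo}",
            "Access: " + (", ".join(sorted(self.members)) or todo),
        ])


if __name__ == "__main__":
    plan = ExecutionPlan()
    plan.set_name("smith_lab")   # "give it this name"
    plan.set_owner("jsmith")     # hypothetical PI BlazerID
    plan.add_member("adoe")      # "add this person"
    print(plan.work_note())
```

Re-rendering the full note on every change keeps the latest work note self-sufficient, so whoever picks up the ticket only has to read one note.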

Example

We can add some redacted screenshots to demonstrate what this looks like in practice.

Procedure for TASK creation and forwarding to ops team

  1. Gather info that ops will need
    • Name of space
    • Path of space (for /data/project/ only; this makes it clear it isn't a Sloss space, which we should not be creating more of)
    • BlazerID of the owner (generally the requester's PI, but you may have to ask some questions to figure this out)
    • BlazerIDs of everyone with access to the top-level directory, including the owner. Mike has a command where he can copy-paste a list of BlazerIDs and quickly see who does and does not have an account.
  2. Create the TASK in ServiceNow
  3. Fill in the short description and full description. "Assignment Group" is "Research Computing".
  4. Post the TASK to #tickets and follow up as usual
  5. Once the TASK is complete, continue the conversation in the RITM as needed (sharing relevant docs, office hours links, handling questions and concerns, etc.)
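We don't know exactly what Mike's command does, but the account check in step 1 could look something like this minimal Python sketch. It assumes cluster accounts resolve through the node's normal user lookup (so `pwd.getpwnam` sees them) and that BlazerIDs are lowercase login names; the function names are hypothetical. It also checks that the requested path is under /data/project/.

```python
import pwd
import re
import sys
from typing import List, Tuple


def parse_blazerids(pasted: str) -> List[str]:
    """Split a pasted list (commas, semicolons, spaces, newlines) into unique IDs."""
    ids: List[str] = []
    for token in re.split(r"[\s,;]+", pasted.strip()):
        token = token.lower()  # assumption: login names are lowercase BlazerIDs
        if token and token not in ids:
            ids.append(token)
    return ids


def has_account(blazerid: str) -> bool:
    """True if the name resolves to a user account on this node."""
    try:
        pwd.getpwnam(blazerid)
    except KeyError:
        return False
    return True


def split_by_account(pasted: str) -> Tuple[List[str], List[str]]:
    """Return (has an account, no account yet), preserving the pasted order."""
    ids = parse_blazerids(pasted)
    return [b for b in ids if has_account(b)], [b for b in ids if not has_account(b)]


def check_project_path(path: str) -> bool:
    """New spaces go under /data/project/ only (no new Sloss spaces)."""
    prefix = "/data/project/"
    return path.startswith(prefix) and len(path) > len(prefix)


if __name__ == "__main__":
    found, missing = split_by_account(sys.stdin.read())
    print("has account:", ", ".join(found) or "(none)")
    print("no account: ", ", ".join(missing) or "(none)")
```

Anyone listed under "no account" needs an account before they can be added to the space, which is worth resolving with the requester before the TASK goes to ops.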
wwarriner commented 1 year ago

We will also want to capture metadata about the requester's group and intended use case when the request is made, so we can attach that information to the shared space.

  1. Some statement from the requester about their intended use case.
  2. What type of entity owns the space: a Research Lab Group, a Core Facility, a University Admin Group (e.g., UAB IT IDM, UAB Financial Affairs), or something new?
    • If it is a Core Facility, get the parent organization as well. For example, the U-BDS (UAB Biological Data Science Core) lives under the UAB IRCP (Institutional Research Core Program).
    • If it is an Admin Group, get the parent department/division as well. For example, IDM lives under UAB IT.
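To make the metadata consistent across requests, it could be captured as a small record that refuses to be created without a parent organization when one is required. This is a minimal Python sketch; `OwnerType` and `SpaceMetadata` are hypothetical names, and the enum values simply mirror the categories listed above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OwnerType(Enum):
    RESEARCH_LAB_GROUP = "Research Lab Group"
    CORE_FACILITY = "Core Facility"
    UNIVERSITY_ADMIN_GROUP = "University Admin Group"
    OTHER = "Other"  # "something new"; describe it in use_case


@dataclass
class SpaceMetadata:
    """Hypothetical record of who owns a shared space and why it exists."""

    use_case: str                     # requester's own statement of intended use
    owner_type: OwnerType
    owner_name: str                   # e.g. "U-BDS" or "IDM"
    parent_org: Optional[str] = None  # e.g. "UAB IRCP" or "UAB IT"

    def __post_init__(self) -> None:
        # Core Facilities and Admin Groups must record their parent organization.
        needs_parent = self.owner_type in (
            OwnerType.CORE_FACILITY,
            OwnerType.UNIVERSITY_ADMIN_GROUP,
        )
        if needs_parent and not self.parent_org:
            raise ValueError(f"{self.owner_type.value} requires a parent organization")


if __name__ == "__main__":
    core = SpaceMetadata(
        use_case="Shared storage for core sequencing projects",
        owner_type=OwnerType.CORE_FACILITY,
        owner_name="U-BDS",
        parent_org="UAB IRCP",
    )
    print(core)
```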
wwarriner commented 1 year ago

For more information on the Researcher-Facing aspect of this see https://github.com/uabrc/uabrc.github.io/issues/561

wwarriner commented 1 year ago

Boilerplate for what to say after a user's group membership changes:

Group membership is not dynamic; it is picked up by the shell when you log in. Sessions that were already running will show errors when trying to access the shared storage. This includes terminals, jobs, and the File Browser in the Open OnDemand (OOD) web portal (https://rc.uab.edu).

To pick up the change, log out of each session and log back in. The OOD web portal can be reset by clicking "Help" in the top right, then clicking "Restart Web Server".
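One way to confirm that a session is stale is to compare the groups the current process carries with the groups the account database now grants; it is the same comparison as `id -Gn` versus `id -Gn $USER` on the command line. This is a minimal Python sketch using `os.getgroups()` and `os.getgrouplist()`, assuming group membership resolves through the node's normal user/group lookup. The function names are hypothetical.

```python
import grp
import os
import pwd
from typing import Iterable, Set


def missing_groups(session_gids: Iterable[int], database_gids: Iterable[int]) -> Set[int]:
    """GIDs granted in the account database that this session does not have yet."""
    return set(database_gids) - set(session_gids)


def stale_group_names() -> Set[str]:
    """Names of groups this login session has not picked up yet."""
    user = pwd.getpwuid(os.getuid())
    # Groups the running process actually has (fixed when the session started).
    session = set(os.getgroups()) | {os.getgid()}
    # Groups the account database grants right now.
    database = os.getgrouplist(user.pw_name, user.pw_gid)
    names = set()
    for gid in missing_groups(session, database):
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            names.add(str(gid))  # group has no name entry; show the number
    return names


if __name__ == "__main__":
    stale = stale_group_names()
    if stale:
        print("Log out and back in to pick up:", ", ".join(sorted(stale)))
    else:
        print("This session already has all of its groups.")
```

An empty result in a fresh login (or after an OOD web server restart) is the expected outcome.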

Older information

""" $name will need to log out of any open terminals to pick up the change in group membership and have their new permissions work as expected. If you have recently used our Open OnDemand (OOD) web portal at https://rc.uab.edu, please visit the portal, click "Help" at the right side of the top navigation pane, and then click "Restart Web Server" """

If we must remove someone from a group, we should consider forcibly terminating their open terminals to revoke access immediately, by whatever method turns out to be appropriate. This needs to happen on every login* node (to catch OOD on login005).
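If we do go the forced-termination route, the first step is finding which processes still carry the old membership. This is a minimal Python sketch, assuming Linux `/proc`, that lists a user's processes whose group set still includes a given GID; it would need to run on each login* node. It only identifies processes; how (or whether) to terminate them is still the open question above. The function names are hypothetical, and if `/proc` is mounted with `hidepid`, it needs to run as root.

```python
import grp
import os
import pwd
import sys
from typing import Dict, List


def parse_status(text: str) -> Dict[str, str]:
    """Parse /proc/<pid>/status 'Key:\tvalue' lines into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def holds_group(status: Dict[str, str], uid: int, gid: int) -> bool:
    """True if the process's real UID is `uid` and `gid` is still in its group set."""
    uid_fields = status.get("Uid", "").split()
    if not uid_fields or int(uid_fields[0]) != uid:
        return False
    gids = {int(g) for g in status.get("Gid", "").split()}        # real/effective/saved/fs
    groups = {int(g) for g in status.get("Groups", "").split()}   # supplementary groups
    return gid in gids or gid in groups


def pids_holding_group(uid: int, gid: int) -> List[int]:
    """PIDs on this node owned by `uid` that still carry `gid`."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as fh:
                status = parse_status(fh.read())
        except OSError:  # process exited, or status is not readable
            continue
        if holds_group(status, uid, gid):
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    # Usage: python3 find_holders.py <blazerid> <groupname>
    uid = pwd.getpwnam(sys.argv[1]).pw_uid
    gid = grp.getgrnam(sys.argv[2]).gr_gid
    for pid in pids_holding_group(uid, gid):
        print(pid)
```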

Edit from a conversation between JP and me. This could be useful for folks who have interactive jobs they don't want to close but that need the new group membership. This may modify .bashrc, so consider it carefully.

Something to know: technically, you can use the newgrp command to activate a new group without logging out and back in, at least within an existing shell. The command spawns a subshell with the new group active, which changes the group ID of that process. This can be handy for users who have to switch between their own default group and the groups of other labs they work with. It may also be useful when combined with allocations and Slurm.

Oh, that's cool! I think that may not work with the OOD File Browser, but it's a good solution for, e.g., an interactive job someone doesn't want to close just yet.

Boilerplate for replying to tickets:

I can see on our system you have been added to the group.

To fix what you are seeing, please log in to https://rc.uab.edu. In the green navigation bar at the top, look for the "Help" drop-down menu. Click that, then click "Restart Web Server". Once that has completed, please try again.

To explain: our OOD web server runs on our login005 node. Whenever you open the File Browser, Terminal, Job Composer, or any Interactive App in OOD, the server starts an nginx process under your username that lets the web interface you see communicate with login005. That process keeps the group membership it had when it started, and it stays running until the OOD server is restarted on our end or you restart the web server yourself using the steps described above.

wwarriner commented 1 year ago

When putting shared space creation/changes in front of ops, create a new TASK in ServiceNow under the RITM (the button is near the bottom of the page). Include all, and only, the details ops needs to make it happen.