[x] Define a process for monitoring and notification of out-of-policy resource usage. What will we look at? How often? Will it alarm? Who will monitor it? How will we respond to resource over-utilization?
This is not a one-and-done process, and it will not be close to finished at 1.0. For the latest updates, search on the following labels: logging-and-monitoring, metrics, ResourcesQuatasBilling
Example:
Show a count of jobs in Datadog (hmdc/sid#792)
Auto-scaling
[ ] Allocate enough total resources for the environment to grow automatically to support up to 100 jobs (1 CPU / 4 GB each)
The PR https://github.com/hmdc/sid-tf/pull/26, which set up the ASG, creates a maximum of 7 m5a.xlarge nodes (3 jobs each) plus 3 m5a.large nodes (1 job each), for a total of 24 jobs.
The PR https://github.com/hmdc/sid-tf/pull/28 will amend this to a maximum of 33 m5a.xlarge nodes (3 jobs each) plus 3 m5a.large nodes (1 job each), for a total of 102 jobs, which meets the 100-job target.
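The capacity figures above are simple arithmetic (max nodes × jobs per node, summed over node types). A minimal sketch of that check follows, assuming the per-node job slots quoted in the two PRs; the names `JOBS_PER_NODE`, `TARGET_JOBS`, and `max_jobs` are hypothetical and do not come from the sid-tf repository.

```python
# Hypothetical capacity check: job slots per node type are taken from the
# descriptions of hmdc/sid-tf PRs #26 and #28, not read from Terraform.
JOBS_PER_NODE = {"m5a.xlarge": 3, "m5a.large": 1}

# Goal from the checklist: up to 100 jobs at 1 CPU / 4 GB each.
TARGET_JOBS = 100


def max_jobs(node_counts: dict[str, int]) -> int:
    """Return total job slots when every node group is at its maximum size."""
    return sum(JOBS_PER_NODE[node_type] * count for node_type, count in node_counts.items())


pr26 = {"m5a.xlarge": 7, "m5a.large": 3}   # 7*3 + 3*1
pr28 = {"m5a.xlarge": 33, "m5a.large": 3}  # 33*3 + 3*1

for name, nodes in (("PR 26", pr26), ("PR 28", pr28)):
    jobs = max_jobs(nodes)
    print(f"{name}: {jobs} jobs, meets target: {jobs >= TARGET_JOBS}")
```

Running it prints 24 jobs (target not met) for PR 26 and 102 jobs (target met) for PR 28. For reference, an m5a.xlarge has 4 vCPU / 16 GiB and an m5a.large has 2 vCPU / 8 GiB; the slot counts of 3 and 1 presumably leave headroom for system overhead, but that reasoning is an assumption and is not stated in the PRs.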
AAA
[x] Authentication/authorization using HarvardKey/Grouper.
End User Support
[ ] Get guidance and buy-in from Soner on this section
[ ] Define a New User Onboarding process (see hmdc/sid#505 for a start)
[ ] Define a User lifecycle process
[ ] Define how we will interact with early adopters and gather usability feedback from them.
The first thought here is to use Slack (Sid-users). Phil mentioned that Dataverse has a way of doing this. The Sid team does not
[ ] User support process
We will need to capture many things in the documentation about using the environment: prerequisites, gotchas, how-tos, and FAQs.
Story Description
This ticket documents the definition of done for our first production environment.
Helpful Context, Background
We will open Sid in production for the first time on Oct 30, 2019.
How do we test this feature so that we know that it is done?
Turn on the Production Environment
End User Documentation Hosting
End User Documentation
Resource Enforcement
Auto-scaling
AAA
End User Support
For a follow-on release
Related To: