hmdc / PlanningBoardExperiments


Charge Back Support #39

Open mreekie opened 4 years ago

mreekie commented 4 years ago

Is feature related to a problem?

There is currently no way for us to measure, let alone enforce, resource usage.

Describe the Proposed Feature

This topic is expansive. In the context of Sid, chargeback becomes possible only once several other features are in place. The steps work like this:

Set a quota of X hours of wall clock time on the system for a given group. The group could be a project, a sponsor, or an organization. The group members can each individually draw down on this total. This quota exists as a number in an administrative database. Quotas are sometimes also set against memory (though I’m not clear on how this was allocated or measured) and disk usage.

Monitor system resources.

In addition to monitoring the things that affect quotas, everyone also tracked and reported an efficiency metric. Exactly how they defined it seemed to vary, and I didn’t get deep into a discussion with anyone on it. The efficiency metric is used to give feedback to a person or a team, along the lines of: “You asked for X CPUs and Y memory, but you only used 1 CPU because your code doesn’t make use of the others, and you only used Z% of the memory you reserved.” Or: “You reserved 10 hours on the system but your project only ran for 1 hour.”

Account for system resources by user and group. At this point you’ve set your quotas and you are monitoring system resources. Now you can compare the quotas you set against the data you’ve collected.

Bill for usage.
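
The monitoring and accounting steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any of the sites we spoke with implement it: the record fields (`user`, `group`, `wall_hours`, `cpus_requested`, `cpu_hours_used`), the quota table, and the efficiency definition (CPU hours used divided by CPU hours reserved) are all made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    # Hypothetical fields; a real site would pull these from its
    # scheduler's accounting data.
    user: str
    group: str
    wall_hours: float       # wall clock time the job actually ran
    cpus_requested: int     # CPUs reserved for the job
    cpu_hours_used: float   # CPU time the job actually consumed


def cpu_efficiency(job: JobRecord) -> float:
    """One possible efficiency metric: CPU hours used / CPU hours reserved."""
    reserved = job.wall_hours * job.cpus_requested
    return job.cpu_hours_used / reserved if reserved > 0 else 0.0


def account(jobs, quotas):
    """Sum wall clock hours per group and compare against each group's quota."""
    used = defaultdict(float)
    for job in jobs:
        used[job.group] += job.wall_hours
    report = {}
    for group, quota_hours in quotas.items():
        report[group] = {
            "quota_hours": quota_hours,
            "used_hours": used[group],
            "remaining_hours": quota_hours - used[group],
            "over_quota": used[group] > quota_hours,
        }
    return report


jobs = [
    JobRecord("alice", "lab_a", wall_hours=1.0, cpus_requested=8, cpu_hours_used=1.0),
    JobRecord("bob", "lab_a", wall_hours=4.0, cpus_requested=2, cpu_hours_used=8.0),
]
quotas = {"lab_a": 10.0}  # hypothetical quota table: group -> wall clock hours

report = account(jobs, quotas)
```

Here alice's job reserved 8 CPUs but used the equivalent of one, so its efficiency comes out at 0.125, which is the kind of feedback described above. The group as a whole has used 5 of its 10 hours.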

Describe alternatives or features you've seen elsewhere

Some schools are using the NSF-funded ColdFront from the University at Buffalo for managing user account quotas.

Some schools are using XDMoD from the University at Buffalo as the central way to view system resources. As with other system monitoring and reporting systems, the tool is only as useful as the data you populate into its database. XDMoD has integrations with Slurm that make this process easier.

Links to Other products with similar feature

Helpful Context, Background


List any other Features this is potentially related to


mreekie commented 4 years ago

Started populating this at SC19.

mreekie commented 4 years ago

For the teams we met with at SC19, chargeback support for traditional HPC follows a common pattern. There are users who belong to groups. A group might represent a research team, a department, or another organizational unit. Limits are set at the group level, and individual authorizations draw down those group-level authorizations.
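
As a rough sketch of that pattern, assuming limits are tracked in wall clock hours and that each user also has an individual cap within the group (the class and field names are hypothetical, not taken from any of the systems we saw):

```python
from dataclasses import dataclass, field


@dataclass
class GroupAllocation:
    # Hypothetical model: one limit for the group, plus per-user authorizations.
    name: str
    limit_hours: float
    user_limits: dict = field(default_factory=dict)  # user -> authorized hours
    user_used: dict = field(default_factory=dict)    # user -> hours drawn so far

    @property
    def used_hours(self) -> float:
        return sum(self.user_used.values())

    def draw(self, user: str, hours: float) -> bool:
        """Draw hours for a user; refuse if the user or the group would exceed its limit."""
        if user not in self.user_limits:
            return False  # user has no authorization in this group
        user_total = self.user_used.get(user, 0.0) + hours
        if user_total > self.user_limits[user]:
            return False
        if self.used_hours + hours > self.limit_hours:
            return False
        self.user_used[user] = user_total
        return True


team = GroupAllocation("research_team", limit_hours=100.0,
                       user_limits={"alice": 80.0, "bob": 80.0})
team.draw("alice", 60.0)  # accepted
team.draw("bob", 60.0)    # refused: the group would reach 120 of its 100 hours
```

Each draw is checked against both the user's own authorization and the group total, so two users who are each within their own limit still cannot overspend the group.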

User and group authorizations are set up.

An example of how these authorizations are set up was demoed with the University at Buffalo's Cloud
