Open wwarriner opened 1 year ago
Partition | Time Limit in Hours | Nodes (Limit/Partition) | Cores/Node (Limit/Person) | Mem GB/Node (Limit/Person) | GPU/Node (Limit/Person) |
---|---|---|---|---|---|
express | 2.0 | 51 (~) | 48 (264) | 754 (3072) | |
short | 12.0 | 51 (44) | 48 (264) | 754 (3072) | |
medium | 50.0 | 51 (44) | 48 (264) | 754 (3072) | |
long | 150.0 | 51 (5) | 48 (264) | 754 (3072) | |
largemem | 50.0 | 13 (10) | 24 (290) | 755 (7168) | |
largemem-long | 150.0 | 5 (10) | 24 (290) | 755 (7168) | |
pascalnodes | 12.0 | 18 (~) | 28 (56) | 252 (500) | 4 (8) |
pascalnodes-medium | 48.0 | 7 (~) | 28 (56) | 252 (500) | 4 (8) |
amperenodes | 12.0 | 20 (TBD) | 32 (64) | 189 (384) | 2 (4) |
amperenodes-medium | 48.0 | 20 (TBD) | 32 (64) | 189 (384) | 2 (4) |
amd-hdr100 | 150.0 | 34 (5) | 128 (264) | 504 (3072) | |
interactive | | | | | |

Intel DCB
CPU GPU Generation | Compute Type | Die Name | Gpu Name | Gb Mem Node | Gpu Mem Gb | Gpu Per Node | Total Nodes | Total Gpus | Total Cores | Total Memory Gb | Cores Per Node | Cores Per Die | Dies Per Node | Die Frequency Ghz |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | cpu: amd | AMD Opteron 242 | | 16 | | | 64 | | 128 | 1024 | 2 | 1 | 2 | 1.6
10 | cpu: amd | AMD Epyc 7713 Milan | | 512 | | | 34 | | 4352 | 17408 | 128 | 64 | 2 | 2
1 | cpu: intel | Intel Xeon Gold 6248R | | 192 | | | 5 | | 240 | 960 | 48 | 12 | 4 | 3
1 | cpu: intel | Intel Xeon Gold 6248R | | 192 | | | 3 | | 144 | 576 | 48 | 12 | 4 | 3
8 | cpu: intel | Intel Xeon E5-2680 v4 | | 192 | | | 21 | | 504 | 4032 | 24 | 12 | 2 | 2.5
2 | cpu: intel | Intel Xeon E5450 | | 48 | | | 24 | | 192 | 1152 | 8 | 4 | 2 | 3
3 | cpu: intel | Intel Xeon X5650 | | 48 | | | 32 | | 384 | 1536 | 12 | 6 | 2 | 2.66
3 | cpu: intel | Intel Xeon X5650 | | 96 | | | 16 | | 192 | 1536 | 12 | 6 | 2 | 2.66
4 | cpu: intel | Intel Xeon X5650 | | 384 | | | 3 | | 48 | 1152 | 16 | 8 | 2 | 2.7
5 | cpu: intel | Intel Xeon E2650 | | 96 | | | 12 | | 192 | 1152 | 16 | 8 | 2 | 2
6 | cpu: intel | Intel Xeon E5-2680 v3 | | 384 | | | 14 | | 336 | 5376 | 24 | 12 | 2 | 2.5
6 | cpu: intel | Intel Xeon E5-2680 v3 | | 256 | | | 38 | | 912 | 9728 | 24 | 12 | 2 | 2.5
6 | cpu: intel | Intel Xeon E5-2680 v3 | | 128 | | | 44 | | 1056 | 5632 | 24 | 12 | 2 | 2.5
9 | cpu: intel | Intel Xeon Gold 6248R | | 768 | | | 52 | | 2496 | 39936 | 48 | 24 | 2 | 3
8 | mem: large | Intel Xeon E5-2680 v4 | | 768 | | | 10 | | 240 | 7680 | 24 | 12 | 2 | 2.5
8 | mem: large | Intel Xeon E5-2680 v4 | | 1536 | | | 4 | | 96 | 6144 | 24 | 12 | 2 | 2.5
7 | gpu: pascal | Intel Xeon E5-2680 v4 | NVIDIA Tesla P100 | 256 | 16.0 | 4.0 | 18 | 72.0 | 504 | 4608 | 28 | 14 | 2 | 2.4 |
11 | gpu: ampere | AMD Epyc 7742 Rome | NVIDIA A100 | 1024 | 40.0 | 8.0 | 4 | 32.0 | 512 | 4096 | 128 | 64 | 2 | 2.25 |
11 | gpu: ampere | AMD Epyc 7742 Rome | NVIDIA A100 | 1024 | 40.0 | 8.0 | 4 | 32.0 | 512 | 4096 | 128 | 64 | 2 | 2.25 |
11 | gpu: ampere | AMD Epyc 7763 Milan | NVIDIA A100 | 512 | 80.0 | 2.0 | 20 | 40.0 | 2560 | 10240 | 128 | 64 | 2 | 2.45 |
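As a quick sanity check on the table, the "Total" columns are just products of the per-node columns: total cores = total nodes × cores per node, and total memory GB = total nodes × GB mem per node. A few spot checks with shell arithmetic:

```shell
# Spot-check "Total Cores" and "Total Memory Gb" against the per-node columns:
echo $(( 52 * 48 ))    # Gen 9 Xeon Gold 6248R nodes: 2496 total cores
echo $(( 52 * 768 ))   # same nodes: 39936 total memory GB
echo $(( 20 * 128 ))   # Gen 11 Milan A100 nodes: 2560 total cores
```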
Sorry, I didn't see this ticket before I did this, but I updated the tables to make them more readable, putting the relevant information closer to the left and combining some columns. I also wrote some copy for the JupyterLab page to help people fill out the form.
This form launches a JupyterLab server in a High-Performance Computing (HPC) environment. Here's how to fill out each section:
Environment Setup: This section allows you to specify any additional software modules or specific versions of Anaconda that you'd like to load. If you need a particular Python package or software tool, you can list it here. Use the format `module load example_module/VERSION`.
Extra JupyterLab arguments: If you have any additional command-line arguments to pass to JupyterLab, you can enter them here.
Number of hours: Enter the number of hours you expect to need the JupyterLab server. Make sure this aligns with the partition's time limit you'll select later.
Partition: Use the dropdown to select the type of computational resources you need. Choose based on your job's requirements and the time it may take to complete. You can use the information on the Hardware page in the docs to help make your decision.
Number of GPUs: If your job will be using GPU resources, specify the number of GPUs required here.
Number of CPUs: Enter the number of CPU cores you need for your computation.
Memory per CPU (GB): Specify the amount of RAM you need per CPU core in gigabytes.
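For anyone who prefers batch submission to the form, the same fields map naturally onto standard Slurm directives. This is only a sketch — the form may construct its job differently behind the scenes, and the module names shown are just illustrative:

```shell
#!/bin/bash
#SBATCH --partition=medium        # "Partition" dropdown
#SBATCH --time=04:00:00           # "Number of hours"
#SBATCH --cpus-per-task=4         # "Number of CPUs"
#SBATCH --mem-per-cpu=8G          # "Memory per CPU (GB)"
# For GPU partitions, add a line like: #SBATCH --gres=gpu:2

module load example_module/VERSION     # "Environment Setup"
jupyter lab --no-browser               # "Extra JupyterLab arguments" go on this line
```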
- Environment Setup: `module load scipy/1.5.0 numpy/1.19.0`
- Extra JupyterLab arguments: `--no-browser`
- Number of hours: `4`
- Partition: `medium`
- Number of GPUs: `0`
- Number of CPUs: `4`
- Memory per CPU (GB): `8`
This configuration is suitable for medium-scale data processing tasks that require specific versions of SciPy and NumPy but do not need GPU acceleration. The job is expected to complete within 4 hours, and it will use 4 CPU cores, each with 8 GB of RAM.
- Environment Setup: `module load tensorflow-gpu/2.4.0`
- Extra JupyterLab arguments: `--no-browser`
- Number of hours: `6`
- Partition: `amperenodes`
- Number of GPUs: `2`
- Number of CPUs: `8`
- Memory per CPU (GB): `16`
This configuration is aimed at machine learning tasks that require TensorFlow with GPU support. The job is expected to run for up to 6 hours. It will use 2 GPUs from the Ampere architecture and 8 CPU cores, each with 16 GB of RAM.
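This second configuration could also be requested interactively with `srun`. The flags below are standard Slurm options, shown only as a rough equivalent; Cheaha's exact submission setup may differ:

```shell
# Interactive session roughly matching the GPU configuration above (sketch only):
srun --partition=amperenodes --gres=gpu:2 --cpus-per-task=8 \
     --mem-per-cpu=16G --time=06:00:00 --pty /bin/bash
```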
By filling out the form with settings appropriate to your computational needs, you can efficiently utilize the HPC resources available to you.
- `medium`: similar to the `short` partition, but with a longer time allowance.
- `largemem-long`: similar to `largemem`, designed for jobs that both require a lot of memory and take a long time to complete.
- `pascalnodes-medium`: similar to `pascalnodes`, but designed for jobs that may need more time to complete.
- `amperenodes-medium`: similar to `amperenodes`, designed for jobs that require more time but use Ampere architecture GPUs.

Pascal Architecture: Introduced in 2016 and built on a 16nm FinFET process, Pascal GPUs such as the Tesla P100 were aimed at general-purpose computing, gaming, and early machine learning applications. While they offered significant improvements over their predecessor, Maxwell, they generally lack specialized Tensor Cores for AI and do not support real-time ray tracing.
Ampere Architecture: Launched in 2020 on an 8nm process, Ampere GPUs such as the A100 Tensor Core GPU are designed for modern computational needs, offering substantial gains in performance and power efficiency. They feature faster GDDR6 or HBM2 memory, specialized Tensor Cores for AI tasks, and native support for real-time ray tracing, making them more versatile for current and future applications.
Node: A standalone unit within a larger computer cluster, equipped with its own memory and processing capabilities. Nodes perform individual tasks and can communicate with other nodes in the network.
Core: A sub-unit within a CPU or GPU that handles specific tasks. Multiple cores can operate simultaneously to execute various tasks, improving overall performance.
Die: The physical piece of silicon that serves as the base for components like cores, memory caches, and other internal structures. It's essentially the platform that holds the computational elements of a CPU or GPU.
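These three terms tie directly back to the hardware table: "Cores Per Node" is simply dies per node times cores per die. For instance, for the Gen 9 Xeon Gold 6248R nodes:

```shell
# 2 dies per node * 24 cores per die = 48 cores per node
echo $(( 2 * 24 ))
```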
To allocate specific resources, for example an A100 GPU with 80 GB VRAM, 125 GB RAM, and 8 vCPUs for a JupyterLab AI project on Cheaha, you'd typically follow these steps:

1. Select `amperenodes` or `amperenodes-medium` from the "Partition" dropdown.
2. Enter `1` in the "Number of GPUs" field, as you want a single A100 GPU.
3. Enter `8` in the "Number of CPUs" field, to request 8 vCPUs.
4. Enter `15.625` (or the closest allowable value) in the "Memory per CPU (GB)" field. Since you want 125 GB RAM in total and you have 8 vCPUs, each CPU should ideally have 125/8 = 15.625 GB.
5. In "Environment Setup", enter `module load tensorflow-gpu/2.4.0`.
6. In "Extra JupyterLab arguments", enter `--port=8889`.

Here's how you might fill out the form:

- Environment Setup: `module load tensorflow-gpu/2.4.0`
- Extra JupyterLab arguments: `--port=8889`
- Number of hours: `6` (or as needed)
- Partition: `amperenodes`
- Number of GPUs: `1`
- Number of CPUs: `8`
- Memory per CPU (GB): `15.625`
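The per-CPU memory value is just the total RAM divided by the vCPU count, which you can check quickly:

```shell
# 125 GB total RAM split evenly across 8 vCPUs:
awk 'BEGIN { printf "%.3f\n", 125 / 8 }'   # prints 15.625
```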
Once submitted, Cheaha will allocate the resources based on availability and queue priority, and your Jupyter Lab session should start with the resources you requested.
In high-performance computing (HPC) environments like Cheaha, efficient resource allocation is crucial. To help manage this, Cheaha employs a resource management system known as Slurm. Slurm essentially acts as a traffic cop, directing jobs to various computational resources based on a set of rules and policies. These policies are encapsulated in what is referred to as Quality of Service or QoS Restrictions.
The concept of Quality of Service (QoS) Restrictions is vital for maintaining a harmonious multi-user environment. In simple terms, QoS Restrictions set the boundaries for resource utilization—be it cores, memory, or GPUs—by each job submitted to the cluster. These restrictions ensure that resources are allocated fairly among all users, preventing any single job from monopolizing the system. But bear in mind that QoS limits are not a reservation of resources; they are more like guidelines that govern the maximum usage per user or job, helping to keep the system both available and equitable.
Now, let's delve into the table that follows. It outlines the various computational resources on Cheaha and the corresponding QoS Restrictions. The table also categorizes resources into Slurm partitions—a Slurm partition being essentially a collection of nodes with similar characteristics and constraints. Partitions have their own QoS limits on aspects like cores, memory, and GPUs, and these limits are applied to each partition independently. Additionally, each researcher is individually subject to these limits, offering a level playing field for all.
With this background, you should find it easier to navigate Cheaha's computational landscape. Below are some practical examples to further elucidate how to interpret and make the most of the resource table.
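For readers who want to inspect these limits directly on the cluster, standard Slurm tooling exposes them. The commands below are generic Slurm (run from a login node); the partition name is just an example, and the exact fields configured on Cheaha may vary:

```shell
# Per-partition settings: time limit, node counts, and any attached QoS.
scontrol show partition medium

# Per-user resource caps (cores, memory, GPUs) attached to each QoS.
sacctmgr show qos format=Name,MaxTRESPU
```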
What would you like to see added?
`~` character usage. At https://docs.rc.uab.edu/cheaha/hardware/