uabrc / uabrc.github.io

UAB Research Computing Documentation
https://docs.rc.uab.edu

Hardware table improvement #592

Open wwarriner opened 1 year ago

wwarriner commented 1 year ago

What would you like to see added?

At https://docs.rc.uab.edu/cheaha/hardware/

jgordini commented 10 months ago

Cheaha HPC Cluster

| Partition | Time Limit (Hours) | Nodes (Limit/Partition) | Cores/Node (Limit/Person) | Mem GB/Node (Limit/Person) | GPU/Node (Limit/Person) |
| --- | --- | --- | --- | --- | --- |
| express | 2.0 | 51 (~) | 48 (264) | 754 (3072) | |
| short | 12.0 | 51 (44) | 48 (264) | 754 (3072) | |
| medium | 50.0 | 51 (44) | 48 (264) | 754 (3072) | |
| long | 150.0 | 51 (5) | 48 (264) | 754 (3072) | |
| largemem | 50.0 | 13 (10) | 24 (290) | 755 (7168) | |
| largemem-long | 150.0 | 5 (10) | 24 (290) | 755 (7168) | |
| pascalnodes | 12.0 | 18 (~) | 28 (56) | 252 (500) | 4 (8) |
| pascalnodes-medium | 48.0 | 7 (~) | 28 (56) | 252 (500) | 4 (8) |
| amperenodes | 12.0 | 20 (TBD) | 32 (64) | 189 (384) | 2 (4) |
| amperenodes-medium | 48.0 | 20 (TBD) | 32 (64) | 189 (384) | 2 (4) |
| amd-hdr100 | 150.0 | 34 (5) | 128 (264) | 504 (3072) | |
| Interactive | | | | | |
| Intel DCB | | | | | |
jgordini commented 10 months ago

Detailed Hardware Overview

| Generation | Compute Type | Die Name | GPU Name | Mem/Node (GB) | GPU Mem (GB) | GPUs/Node | Total Nodes | Total GPUs | Total Cores | Total Memory (GB) | Cores/Node | Cores/Die | Dies/Node | Die Frequency (GHz) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | cpu: amd | AMD Opteron 242 | | 16 | | | 64 | | 128 | 1024 | 2 | 1 | 2 | 1.6 |
| 10 | cpu: amd | AMD Epyc 7713 Milan | | 512 | | | 34 | | 4352 | 17408 | 128 | 64 | 2 | 2 |
| 1 | cpu: intel | Intel Xeon Gold 6248R | | 192 | | | 5 | | 240 | 960 | 48 | 12 | 4 | 3 |
| 1 | cpu: intel | Intel Xeon Gold 6248R | | 192 | | | 3 | | 144 | 576 | 48 | 12 | 4 | 3 |
| 8 | cpu: intel | Intel Xeon E5-2680 v4 | | 192 | | | 21 | | 504 | 4032 | 24 | 12 | 2 | 2.5 |
| 2 | cpu: intel | Intel Xeon E5450 | | 48 | | | 24 | | 192 | 1152 | 8 | 4 | 2 | 3 |
| 3 | cpu: intel | Intel Xeon X5650 | | 48 | | | 32 | | 384 | 1536 | 12 | 6 | 2 | 2.66 |
| 3 | cpu: intel | Intel Xeon X5650 | | 96 | | | 16 | | 192 | 1536 | 12 | 6 | 2 | 2.66 |
| 4 | cpu: intel | Intel Xeon X5650 | | 384 | | | 3 | | 48 | 1152 | 16 | 8 | 2 | 2.7 |
| 5 | cpu: intel | Intel Xeon E5-2650 | | 96 | | | 12 | | 192 | 1152 | 16 | 8 | 2 | 2 |
| 6 | cpu: intel | Intel Xeon E5-2680 v3 | | 384 | | | 14 | | 336 | 5376 | 24 | 12 | 2 | 2.5 |
| 6 | cpu: intel | Intel Xeon E5-2680 v3 | | 256 | | | 38 | | 912 | 9728 | 24 | 12 | 2 | 2.5 |
| 6 | cpu: intel | Intel Xeon E5-2680 v3 | | 128 | | | 44 | | 1056 | 5632 | 24 | 12 | 2 | 2.5 |
| 9 | cpu: intel | Intel Xeon Gold 6248R | | 768 | | | 52 | | 2496 | 39936 | 48 | 24 | 2 | 3 |
| 8 | mem: large | Intel Xeon E5-2680 v4 | | 768 | | | 10 | | 240 | 7680 | 24 | 12 | 2 | 2.5 |
| 8 | mem: large | Intel Xeon E5-2680 v4 | | 1536 | | | 4 | | 96 | 6144 | 24 | 12 | 2 | 2.5 |
| 7 | gpu: pascal | Intel Xeon E5-2680 v4 | NVIDIA Tesla P100 | 256 | 16 | 4 | 18 | 72 | 504 | 4608 | 28 | 14 | 2 | 2.4 |
| 11 | gpu: ampere | AMD Epyc 7742 Rome | NVIDIA A100 | 1024 | 40 | 8 | 4 | 32 | 512 | 4096 | 128 | 64 | 2 | 2.25 |
| 11 | gpu: ampere | AMD Epyc 7742 Rome | NVIDIA A100 | 1024 | 40 | 8 | 4 | 32 | 512 | 4096 | 128 | 64 | 2 | 2.25 |
| 11 | gpu: ampere | AMD Epyc 7763 Milan | NVIDIA A100 | 512 | 80 | 2 | 20 | 40 | 2560 | 10240 | 128 | 64 | 2 | 2.45 |
jgordini commented 10 months ago

Sorry, I didn't see this ticket before I did this, but I updated the tables to make them more readable, putting the relevant information closer to the left and combining some columns. I also wrote some copy for the JupyterLab page to help people fill out the form.

jgordini commented 10 months ago

This section walks through launching a JupyterLab server in a high-performance computing (HPC) environment. Here's how to fill out each section of the form (a short example of the Environment Setup field follows the list):

  1. Environment Setup: This section allows you to specify any additional software modules or specific versions of Anaconda that you'd like to load. If you need a particular Python package or software tool, you can list it here. Use the format module load example_module/VERSION.

  2. Extra JupyterLab arguments: If you have any additional command-line arguments to pass to JupyterLab, you can enter them here.

  3. Number of hours: Enter the number of hours you expect to need the JupyterLab server. Make sure this fits within the time limit of the partition you'll select later.

  4. Partition: Use the dropdown to select the type of computational resources you need. Choose based on your job's requirements and the time it may take to complete. You can use the information on the Hardware page in the docs to help make your decision.

  5. Number of GPUs: If your job will be using GPU resources, specify the number of GPUs required here.

  6. Number of CPUs: Enter the number of CPU cores you need for your computation.

  7. Memory per CPU (GB): Specify the amount of RAM you need per CPU core in gigabytes.
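
As a concrete illustration of the Environment Setup field, the entries might look like the sketch below; the module names and versions are placeholders, so check `module avail` on Cheaha for the real ones. An extra argument such as `--port=8889` would go in the "Extra JupyterLab arguments" field.

```bash
# Environment Setup field: one module command per line.
# Module names/versions are placeholders -- verify with `module avail`.
module load Anaconda3
module load example_module/VERSION
```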

Example Use Cases

Use Case 1: CPU-Only

This configuration is suitable for medium-scale data processing tasks that require specific versions of SciPy and NumPy but do not need GPU acceleration. The job is expected to complete within 4 hours, and it will use 4 CPU cores, each with 8 GB of RAM.
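
For comparison, a roughly equivalent batch request from the command line might look like the sketch below. The partition choice (short), module name, and script name are assumptions, not prescribed values.

```bash
#!/bin/bash
#SBATCH --job-name=cpu-only-example
#SBATCH --partition=short        # 12-hour limit comfortably covers a ~4-hour job (assumption)
#SBATCH --time=04:00:00          # expected runtime
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4        # 4 CPU cores
#SBATCH --mem-per-cpu=8G         # 8 GB of RAM per core

module load Anaconda3            # placeholder module providing SciPy/NumPy
python my_analysis.py            # hypothetical analysis script
```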

Use Case 2: CPU and GPU

This configuration is aimed at machine learning tasks that require TensorFlow with GPU support. The job is expected to run for up to 6 hours. It will use 2 GPUs from the Ampere architecture and 8 CPU cores, each with 16 GB of RAM.
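
A roughly equivalent batch request is sketched below. The partition choice (amperenodes), the exact module version, and the script name are assumptions.

```bash
#!/bin/bash
#SBATCH --job-name=gpu-example
#SBATCH --partition=amperenodes  # Ampere A100 partition; 12-hour limit covers a 6-hour job
#SBATCH --time=06:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8        # 8 CPU cores
#SBATCH --mem-per-cpu=16G        # 16 GB of RAM per core
#SBATCH --gres=gpu:2             # 2 Ampere GPUs

module load tensorflow-gpu/2.4.0 # placeholder version -- verify with `module avail`
python train_model.py            # hypothetical training script
```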

By filling out the form with settings appropriate to your computational needs, you can efficiently utilize the HPC resources available to you.

jgordini commented 10 months ago

Computational Partitions

  1. express: Designed for quick, small-scale jobs. The time limit is short (2 hours), but jobs are scheduled quickly.
  2. short: Meant for jobs expected to complete in a relatively short time (up to 12 hours), with moderate resource limits.
  3. medium: Designed for jobs that need more time to complete (up to 50 hours) but not as much as long-running jobs. Resource limits are similar to the short partition.
  4. long: For long-running jobs that may need up to several days to complete (up to 150 hours). Resource limits are similar to medium, but with a longer time allowance.
  5. largemem: Specialized for jobs that require a large amount of memory (RAM). Time and core limits vary, but the focus is on providing more memory per node.
  6. largemem-long: A version of largemem for jobs that both require a lot of memory and take a long time to complete (see the sketch after this list for how partition and walltime are requested together).
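
As a minimal sketch of how partition and walltime go together (assuming the limits in the table above; `my_job.sh` is a hypothetical batch script):

```bash
# A ~30-hour job fits the medium partition (50-hour limit);
# shorter jobs could use express (2 hours) or short (12 hours) instead.
sbatch --partition=medium --time=30:00:00 my_job.sh
```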

GPU Partitions

  1. pascalnodes: This partition is specialized for jobs that require Pascal-architecture GPUs; its QoS limits focus on GPU availability.
  2. pascalnodes-medium: Similar to pascalnodes, but designed for jobs that need more time to complete.
  3. amperenodes: This partition is specialized for jobs requiring Ampere-architecture GPUs.
  4. amperenodes-medium: An extension of amperenodes, designed for jobs that require more time but use Ampere-architecture GPUs (a request sketch follows this list).
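
A minimal sketch of requesting these GPU partitions, assuming the standard Slurm `--gres` syntax; `my_gpu_job.sh` is a hypothetical batch script.

```bash
# Interactive session with one Pascal P100 (pascalnodes, 12-hour limit)
srun --partition=pascalnodes --time=08:00:00 --gres=gpu:1 --pty /bin/bash

# Batch job with two Ampere A100s on amperenodes-medium (48-hour limit)
sbatch --partition=amperenodes-medium --time=24:00:00 --gres=gpu:2 my_gpu_job.sh
```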

Specialized Hardware

  1. amd-hdr100: These nodes use AMD Epyc processors connected by HDR100 InfiniBand. HDR100 provides high-speed, low-latency communication between compute nodes, which benefits increasingly data-intensive HPC and AI workloads such as computational fluid dynamics (CFD), molecular dynamics, and machine learning.

Interactive and Miscellaneous

  1. Interactive: A partition designed for interactive sessions rather than batch jobs. Useful for development, debugging, or data analysis in real-time.
  2. Intel DCB: This could refer to a partition optimized for Intel's Data Center Blocks (DCB), which are fully-validated server systems that can help accelerate time to market with reliable, pre-configured server solutions.

Pascal and Ampere Architecture

Pascal Architecture: Introduced in 2016 and built on a 16nm FinFET process, Pascal GPUs such as the Tesla P100 were aimed at general-purpose computing, gaming, and early machine learning applications. While they offered significant improvements over their predecessor, Maxwell, they lack the specialized Tensor Cores used for AI and do not support real-time ray tracing.

Ampere Architecture: Launched in 2020 on 7nm/8nm processes, Ampere GPUs such as the A100 are designed for modern computational needs, offering substantial gains in performance and power efficiency. They feature faster GDDR6 or HBM2 memory, specialized Tensor Cores for AI tasks, and (on consumer models) native support for real-time ray tracing, making them more versatile for current and future applications.

jgordini commented 10 months ago

Node: A standalone unit within a larger computer cluster, equipped with its own memory and processing capabilities. Nodes perform individual tasks and can communicate with other nodes in the network.

Core: A sub-unit within a CPU or GPU that handles specific tasks. Multiple cores can operate simultaneously to execute various tasks, improving overall performance.

Die: The physical piece of silicon that serves as the base for components like cores, memory caches, and other internal structures. It's essentially the platform that holds the computational elements of a CPU or GPU.
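
To see how these terms map onto the cluster, Slurm can report node, core, and memory figures per partition; a small sketch using standard `sinfo` format fields:

```bash
# Per partition: name, node count, sockets:cores:threads per node,
# CPUs per node, and memory (MB) per node
sinfo --format="%P %D %z %c %m"
```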

jgordini commented 10 months ago

To allocate specific resources, for example an A100 GPU with 80 GB VRAM, 125 GB RAM, and 8 vCPUs for a JupyterLab AI project on Cheaha, you'd typically follow these steps:

  1. Partition Selection: Choose a partition that offers A100 GPUs and meets your memory and CPU requirements. In the JupyterLab form, select amperenodes or amperenodes-medium from the "Partition" dropdown.
  2. Number of GPUs: Enter 1 in the "Number of GPUs" field, as you want a single A100 GPU.
  3. Number of CPUs: Enter 8 in the "Number of CPUs" field, to request 8 vCPUs.
  4. Memory per CPU: Enter 15.625 (or the closest allowable value) in the "Memory per CPU (GB)" field. Since you want 125 GB RAM in total and you have 8 vCPUs, each CPU should ideally have 125/8 = 15.625 GB.
  5. Number of Hours: Enter the estimated time you think your Jupyter Lab session will need to complete your AI project.
  6. Environment Setup: If your project needs specific Python libraries or environments, specify them under "Environment Setup". For AI projects, you might load a module like module load tensorflow-gpu/2.4.0.
  7. Extra JupyterLab Arguments: If you have any special JupyterLab settings, specify them here. For example, you can specify a custom port using --port=8889.
  8. Submit: Finally, submit the form to allocate the resources and start your JupyterLab session.

Here's how you might fill out the form:
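
As a rough sketch, the same request can also be expressed as a Slurm command line; the 12-hour walltime is a placeholder estimate, and the field-to-flag mapping is an assumption rather than a documented equivalence.

```bash
# Form field                  ->  Slurm flag
# Partition: amperenodes      ->  --partition=amperenodes
# Number of GPUs: 1           ->  --gres=gpu:1
# Number of CPUs: 8           ->  --cpus-per-task=8
# Memory per CPU (GB): 16     ->  --mem-per-cpu=16G   (closest allowable value to 15.625)
# Number of hours: 12         ->  --time=12:00:00     (placeholder estimate)
srun --partition=amperenodes --gres=gpu:1 --cpus-per-task=8 \
     --mem-per-cpu=16G --time=12:00:00 --pty /bin/bash
```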

Once submitted, Cheaha will allocate the resources based on availability and queue priority, and your JupyterLab session should start with the resources you requested.

jgordini commented 10 months ago

Introduction to the Hardware Section

In high-performance computing (HPC) environments like Cheaha, efficient resource allocation is crucial. To help manage this, Cheaha employs a resource management system known as Slurm. Slurm essentially acts as a traffic cop, directing jobs to various computational resources based on a set of rules and policies. These policies are encapsulated in what is referred to as Quality of Service or QoS Restrictions.

The concept of Quality of Service (QoS) Restrictions is vital for maintaining a harmonious multi-user environment. In simple terms, QoS Restrictions set the boundaries for resource utilization—be it cores, memory, or GPUs—by each job submitted to the cluster. These restrictions ensure that resources are allocated fairly among all users, preventing any single job from monopolizing the system. But bear in mind that QoS limits are not a reservation of resources; they are more like guidelines that govern the maximum usage per user or job, helping to keep the system both available and equitable.

Now, let's delve into the table that follows. It outlines the various computational resources on Cheaha and the corresponding QoS Restrictions. The table also categorizes resources into Slurm partitions—a Slurm partition being essentially a collection of nodes with similar characteristics and constraints. Partitions have their own QoS limits on aspects like cores, memory, and GPUs, and these limits are applied to each partition independently. Additionally, each researcher is individually subject to these limits, offering a level playing field for all.

With this background, you should find it easier to navigate Cheaha's computational landscape. Below are some practical examples to further elucidate how to interpret and make the most of the resource table.
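
To inspect these partition and QoS limits directly on the cluster, standard Slurm commands can be used; the sketch below assumes typical Slurm tooling, and the exact fields shown will depend on Cheaha's configuration.

```bash
# Time limit, node count, and other settings for a single partition
scontrol show partition amperenodes

# QoS-level limits, including maximum trackable resources (cores, memory, GPUs) per user
sacctmgr show qos format=Name,MaxWall,MaxTRESPerUser
```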