uabrc / uabrc.github.io

UAB Research Computing Documentation
https://docs.rc.uab.edu

Hardware table improvement #592

Open wwarriner opened 1 year ago

wwarriner commented 1 year ago

What would you like to see added?

At https://docs.rc.uab.edu/cheaha/hardware/

jgordini commented 10 months ago

Cheaha HPC Cluster

| Partition | Time Limit (Hours) | Nodes (Limit/Partition) | Cores/Node (Limit/Person) | Mem GB/Node (Limit/Person) | GPU/Node (Limit/Person) |
| --- | --- | --- | --- | --- | --- |
| express | 2.0 | 51 (~) | 48 (264) | 754 (3072) | |
| short | 12.0 | 51 (44) | 48 (264) | 754 (3072) | |
| medium | 50.0 | 51 (44) | 48 (264) | 754 (3072) | |
| long | 150.0 | 51 (5) | 48 (264) | 754 (3072) | |
| largemem | 50.0 | 13 (10) | 24 (290) | 755 (7168) | |
| largemem-long | 150.0 | 5 (10) | 24 (290) | 755 (7168) | |
| pascalnodes | 12.0 | 18 (~) | 28 (56) | 252 (500) | 4 (8) |
| pascalnodes-medium | 48.0 | 7 (~) | 28 (56) | 252 (500) | 4 (8) |
| amperenodes | 12.0 | 20 (TBD) | 32 (64) | 189 (384) | 2 (4) |
| amperenodes-medium | 48.0 | 20 (TBD) | 32 (64) | 189 (384) | 2 (4) |
| amd-hdr100 | 150.0 | 34 (5) | 128 (264) | 504 (3072) | |
| Interactive | | | | | |
| Intel DCB | | | | | |
jgordini commented 10 months ago

Detailed Hardware Overview

| Generation | Compute Type | Die Name | GPU Name | Mem/Node (GB) | GPU Mem (GB) | GPUs/Node | Total Nodes | Total GPUs | Total Cores | Total Memory (GB) | Cores/Node | Cores/Die | Dies/Node | Die Frequency (GHz) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | cpu: amd | AMD Opteron 242 | | 16 | | | 64 | | 128 | 1024 | 2 | 1 | 2 | 1.6 |
| 10 | cpu: amd | AMD Epyc 7713 Milan | | 512 | | | 34 | | 4352 | 17408 | 128 | 64 | 2 | 2 |
| 1 | cpu: intel | Intel Xeon Gold 6248R | | 192 | | | 5 | | 240 | 960 | 48 | 12 | 4 | 3 |
| 1 | cpu: intel | Intel Xeon Gold 6248R | | 192 | | | 3 | | 144 | 576 | 48 | 12 | 4 | 3 |
| 8 | cpu: intel | Intel Xeon E5-2680 v4 | | 192 | | | 21 | | 504 | 4032 | 24 | 12 | 2 | 2.5 |
| 2 | cpu: intel | Intel Xeon E5450 | | 48 | | | 24 | | 192 | 1152 | 8 | 4 | 2 | 3 |
| 3 | cpu: intel | Intel Xeon X5650 | | 48 | | | 32 | | 384 | 1536 | 12 | 6 | 2 | 2.66 |
| 3 | cpu: intel | Intel Xeon X5650 | | 96 | | | 16 | | 192 | 1536 | 12 | 6 | 2 | 2.66 |
| 4 | cpu: intel | Intel Xeon X5650 | | 384 | | | 3 | | 48 | 1152 | 16 | 8 | 2 | 2.7 |
| 5 | cpu: intel | Intel Xeon E5-2650 | | 96 | | | 12 | | 192 | 1152 | 16 | 8 | 2 | 2 |
| 6 | cpu: intel | Intel Xeon E5-2680 v3 | | 384 | | | 14 | | 336 | 5376 | 24 | 12 | 2 | 2.5 |
| 6 | cpu: intel | Intel Xeon E5-2680 v3 | | 256 | | | 38 | | 912 | 9728 | 24 | 12 | 2 | 2.5 |
| 6 | cpu: intel | Intel Xeon E5-2680 v3 | | 128 | | | 44 | | 1056 | 5632 | 24 | 12 | 2 | 2.5 |
| 9 | cpu: intel | Intel Xeon Gold 6248R | | 768 | | | 52 | | 2496 | 39936 | 48 | 24 | 2 | 3 |
| 8 | mem: large | Intel Xeon E5-2680 v4 | | 768 | | | 10 | | 240 | 7680 | 24 | 12 | 2 | 2.5 |
| 8 | mem: large | Intel Xeon E5-2680 v4 | | 1536 | | | 4 | | 96 | 6144 | 24 | 12 | 2 | 2.5 |
| 7 | gpu: pascal | Intel Xeon E5-2680 v4 | NVIDIA Tesla P100 | 256 | 16 | 4 | 18 | 72 | 504 | 4608 | 28 | 14 | 2 | 2.4 |
| 11 | gpu: ampere | AMD Epyc 7742 Rome | NVIDIA A100 | 1024 | 40 | 8 | 4 | 32 | 512 | 4096 | 128 | 64 | 2 | 2.25 |
| 11 | gpu: ampere | AMD Epyc 7742 Rome | NVIDIA A100 | 1024 | 40 | 8 | 4 | 32 | 512 | 4096 | 128 | 64 | 2 | 2.25 |
| 11 | gpu: ampere | AMD Epyc 7763 Milan | NVIDIA A100 | 512 | 80 | 2 | 20 | 40 | 2560 | 10240 | 128 | 64 | 2 | 2.45 |
jgordini commented 10 months ago

Sorry, I didn't see this ticket before I did this, but I updated the tables to make them more readable, putting the relevant information closer to the left and combining some columns. I also wrote some copy for the JupyterLab page to help people fill out the form.

jgordini commented 10 months ago

This section walks through launching a JupyterLab server in a high-performance computing (HPC) environment. Here's how to fill out each section of the form (a short example of the Environment Setup field follows the list):

  1. Environment Setup: This section allows you to specify any additional software modules or specific versions of Anaconda that you'd like to load. If you need a particular Python package or software tool, you can list it here. Use the format module load example_module/VERSION.

  2. Extra JupyterLab arguments: If you have any additional command-line arguments to pass to JupyterLab, you can enter them here.

  3. Number of hours: Enter the number of hours you expect to need the JupyterLab server. Make sure this fits within the time limit of the partition you'll select later.

  4. Partition: Use the dropdown to select the type of computational resources you need. Choose based on your job's requirements and the time it may take to complete. You can use the information on the Hardware page in the docs to help make your decision.

  5. Number of GPUs: If your job will be using GPU resources, specify the number of GPUs required here.

  6. Number of CPUs: Enter the number of CPU cores you need for your computation.

  7. Memory per CPU (GB): Specify the amount of RAM you need per CPU core in gigabytes.
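
As a concrete illustration of the Environment Setup field, the entries might look like the sketch below; the module names and versions are placeholders, so check `module avail` on Cheaha for the real ones. An extra argument such as `--port=8889` would go in the "Extra JupyterLab arguments" field.

```bash
# Environment Setup field: one module command per line.
# Module names/versions are placeholders -- verify with `module avail`.
module load Anaconda3
module load example_module/VERSION
```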

Example Use Cases

Use Case 1: CPU-Only

This configuration is suitable for medium-scale data processing tasks that require specific versions of SciPy and NumPy but do not need GPU acceleration. The job is expected to complete within 4 hours, and it will use 4 CPU cores, each with 8 GB of RAM.
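
For comparison, a roughly equivalent batch request from the command line might look like the sketch below. The partition choice (short), module name, and script name are assumptions, not prescribed values.

```bash
#!/bin/bash
#SBATCH --job-name=cpu-only-example
#SBATCH --partition=short        # 12-hour limit comfortably covers a ~4-hour job (assumption)
#SBATCH --time=04:00:00          # expected runtime
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4        # 4 CPU cores
#SBATCH --mem-per-cpu=8G         # 8 GB of RAM per core

module load Anaconda3            # placeholder module providing SciPy/NumPy
python my_analysis.py            # hypothetical analysis script
```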

Use Case 2: CPU and GPU

This configuration is aimed at machine learning tasks that require TensorFlow with GPU support. The job is expected to run for up to 6 hours. It will use 2 GPUs from the Ampere architecture and 8 CPU cores, each with 16 GB of RAM.
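
A roughly equivalent batch request is sketched below. The partition choice (amperenodes), the exact module version, and the script name are assumptions.

```bash
#!/bin/bash
#SBATCH --job-name=gpu-example
#SBATCH --partition=amperenodes  # Ampere A100 partition; 12-hour limit covers a 6-hour job
#SBATCH --time=06:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8        # 8 CPU cores
#SBATCH --mem-per-cpu=16G        # 16 GB of RAM per core
#SBATCH --gres=gpu:2             # 2 Ampere GPUs

module load tensorflow-gpu/2.4.0 # placeholder version -- verify with `module avail`
python train_model.py            # hypothetical training script
```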

By filling out the form with settings appropriate to your computational needs, you can efficiently utilize the HPC resources available to you.

jgordini commented 10 months ago

Computational Partitions

  1. express: Designed for quick, small-scale jobs. The time limit is short (2 hours), but jobs are scheduled quickly.
  2. short: Meant for jobs expected to complete in a relatively short time (up to 12 hours), with moderate resource limits.
  3. medium: Designed for jobs that need more time to complete (up to 50 hours) but not as much as long-running jobs. Resource limits are similar to the short partition.
  4. long: For long-running jobs that may need up to several days to complete (up to 150 hours). Resource limits are similar to medium, but with a longer time allowance.
  5. largemem: Specialized for jobs that require a large amount of memory (RAM). Time and core limits vary, but the focus is on providing more memory per node.
  6. largemem-long: A version of largemem for jobs that both require a lot of memory and take a long time to complete (see the sketch after this list for how partition and walltime are requested together).
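
As a minimal sketch of how partition and walltime go together (assuming the limits in the table above; `my_job.sh` is a hypothetical batch script):

```bash
# A ~30-hour job fits the medium partition (50-hour limit);
# shorter jobs could use express (2 hours) or short (12 hours) instead.
sbatch --partition=medium --time=30:00:00 my_job.sh
```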

GPU Partitions

  1. pascalnodes: This partition is specialized for jobs that require Pascal-architecture GPUs; its QoS limits focus on GPU availability.
  2. pascalnodes-medium: Similar to pascalnodes, but designed for jobs that need more time to complete.
  3. amperenodes: This partition is specialized for jobs requiring Ampere-architecture GPUs.
  4. amperenodes-medium: An extension of amperenodes, designed for jobs that require more time but use Ampere-architecture GPUs (a request sketch follows this list).
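
A minimal sketch of requesting these GPU partitions, assuming the standard Slurm `--gres` syntax; `my_gpu_job.sh` is a hypothetical batch script.

```bash
# Interactive session with one Pascal P100 (pascalnodes, 12-hour limit)
srun --partition=pascalnodes --time=08:00:00 --gres=gpu:1 --pty /bin/bash

# Batch job with two Ampere A100s on amperenodes-medium (48-hour limit)
sbatch --partition=amperenodes-medium --time=24:00:00 --gres=gpu:2 my_gpu_job.sh
```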

Specialized Hardware

  1. amd-hdr100: These nodes use AMD Epyc processors connected by HDR100 InfiniBand. HDR100 provides high-speed, low-latency communication between compute nodes, which benefits increasingly data-intensive HPC and AI workloads such as computational fluid dynamics (CFD), molecular dynamics, and machine learning.

Interactive and Miscellaneous

  1. Interactive: A partition designed for interactive sessions rather than batch jobs. Useful for development, debugging, or data analysis in real-time.
  2. Intel DCB: This could refer to a partition optimized for Intel's Data Center Blocks (DCB), which are fully-validated server systems that can help accelerate time to market with reliable, pre-configured server solutions.

Pascal and Ampere Architecture

Pascal Architecture: Introduced in 2016 and built on a 16nm FinFET process, Pascal GPUs such as the Tesla P100 were aimed at general-purpose computing, gaming, and early machine learning applications. While they offered significant improvements over their predecessor, Maxwell, they lack the specialized Tensor Cores used for AI and do not support real-time ray tracing.

Ampere Architecture: Launched in 2020 on 7nm/8nm processes, Ampere GPUs such as the A100 are designed for modern computational needs, offering substantial gains in performance and power efficiency. They feature faster GDDR6 or HBM2 memory, specialized Tensor Cores for AI tasks, and (on consumer models) native support for real-time ray tracing, making them more versatile for current and future applications.

jgordini commented 10 months ago

Node: A standalone unit within a larger computer cluster, equipped with its own memory and processing capabilities. Nodes perform individual tasks and can communicate with other nodes in the network.

Core: A sub-unit within a CPU or GPU that handles specific tasks. Multiple cores can operate simultaneously to execute various tasks, improving overall performance.

Die: The physical piece of silicon that serves as the base for components like cores, memory caches, and other internal structures. It's essentially the platform that holds the computational elements of a CPU or GPU.
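
To see how these terms map onto the cluster, Slurm can report node, core, and memory figures per partition; a small sketch using standard `sinfo` format fields:

```bash
# Per partition: name, node count, sockets:cores:threads per node,
# CPUs per node, and memory (MB) per node
sinfo --format="%P %D %z %c %m"
```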

jgordini commented 10 months ago

To allocate specific resources, for example an A100 GPU with 80 GB VRAM, 125 GB RAM, and 8 vCPUs for a JupyterLab AI project on Cheaha, you'd typically follow these steps:

  1. Partition Selection: Choose a partition that offers A100 GPUs and meets your memory and CPU requirements. In the JupyterLab form, select amperenodes or amperenodes-medium from the "Partition" dropdown.
  2. Number of GPUs: Enter 1 in the "Number of GPUs" field, as you want a single A100 GPU.
  3. Number of CPUs: Enter 8 in the "Number of CPUs" field, to request 8 vCPUs.
  4. Memory per CPU: Enter 15.625 (or the closest allowable value) in the "Memory per CPU (GB)" field. Since you want 125 GB RAM in total and you have 8 vCPUs, each CPU should ideally have 125/8 = 15.625 GB.
  5. Number of Hours: Enter the estimated time you think your Jupyter Lab session will need to complete your AI project.
  6. Environment Setup: If your project needs specific Python libraries or environments, specify them under "Environment Setup". For AI projects, you might load a module like module load tensorflow-gpu/2.4.0.
  7. Extra JupyterLab Arguments: If you have any special JupyterLab settings, specify them here. For example, you can specify a custom port using --port=8889.
  8. Submit: Finally, submit the form to allocate the resources and start your JupyterLab session.

Here's how you might fill out the form:
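
As a rough sketch, the same request can also be expressed as a Slurm command line; the 12-hour walltime is a placeholder estimate, and the field-to-flag mapping is an assumption rather than a documented equivalence.

```bash
# Form field                  ->  Slurm flag
# Partition: amperenodes      ->  --partition=amperenodes
# Number of GPUs: 1           ->  --gres=gpu:1
# Number of CPUs: 8           ->  --cpus-per-task=8
# Memory per CPU (GB): 16     ->  --mem-per-cpu=16G   (closest allowable value to 15.625)
# Number of hours: 12         ->  --time=12:00:00     (placeholder estimate)
srun --partition=amperenodes --gres=gpu:1 --cpus-per-task=8 \
     --mem-per-cpu=16G --time=12:00:00 --pty /bin/bash
```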

Once submitted, Cheaha will allocate the resources based on availability and queue priority, and your JupyterLab session should start with the resources you requested.

jgordini commented 10 months ago

Introduction to the Hardware Section

In high-performance computing (HPC) environments like Cheaha, efficient resource allocation is crucial. To help manage this, Cheaha employs a resource management system known as Slurm. Slurm essentially acts as a traffic cop, directing jobs to various computational resources based on a set of rules and policies. These policies are encapsulated in what is referred to as Quality of Service or QoS Restrictions.

The concept of Quality of Service (QoS) Restrictions is vital for maintaining a harmonious multi-user environment. In simple terms, QoS Restrictions set the boundaries for resource utilization—be it cores, memory, or GPUs—by each job submitted to the cluster. These restrictions ensure that resources are allocated fairly among all users, preventing any single job from monopolizing the system. But bear in mind that QoS limits are not a reservation of resources; they are more like guidelines that govern the maximum usage per user or job, helping to keep the system both available and equitable.

Now, let's delve into the table that follows. It outlines the various computational resources on Cheaha and the corresponding QoS Restrictions. The table also categorizes resources into Slurm partitions—a Slurm partition being essentially a collection of nodes with similar characteristics and constraints. Partitions have their own QoS limits on aspects like cores, memory, and GPUs, and these limits are applied to each partition independently. Additionally, each researcher is individually subject to these limits, offering a level playing field for all.

With this background, you should find it easier to navigate Cheaha's computational landscape. Below are some practical examples to further elucidate how to interpret and make the most of the resource table.
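
To inspect these partition and QoS limits directly on the cluster, standard Slurm commands can be used; the sketch below assumes typical Slurm tooling, and the exact fields shown will depend on Cheaha's configuration.

```bash
# Time limit, node count, and other settings for a single partition
scontrol show partition amperenodes

# QoS-level limits, including maximum trackable resources (cores, memory, GPUs) per user
sacctmgr show qos format=Name,MaxWall,MaxTRESPerUser
```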