uabrc / uabrc.github.io

UAB Research Computing Documentation
https://docs.rc.uab.edu

`facilities.txt` grant boilerplate is out of date #356

Closed wwarriner closed 1 year ago

wwarriner commented 2 years ago

The file `docs/grants/res/facilities.txt` is out of date: it mentions the DDN SFA12KX, which has been removed.

wwarriner commented 2 years ago

What Ralph uses also needs updating. Just putting it here until we can find a more complete home.

Description of Research Computing System Resources

High Performance Computing: Cheaha is a campus compute resource dedicated to enhancing research computing productivity at UAB. Cheaha supports both high performance computing (HPC) and high throughput computing (HTC) paradigms. It provides users with a web-based interface, via Open OnDemand, and a traditional command-line interactive environment, via SSH. These interfaces provide access to many scientific tools that leverage a dedicated pool of local compute resources via the Slurm batch scheduler. Cheaha was validated for HIPAA alignment by a third-party external audit. Overall, Cheaha has 8192 cores, 70 TB of memory, and 72 P100 GPUs. UAB researchers also have access to regional and national HPC resources such as the Alabama Supercomputer Authority (ASA), XSEDE, and the Open Science Grid (OSG).
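As a rough illustration, submitting work to a Slurm batch scheduler such as Cheaha's typically looks like the sketch below. This is a minimal, hedged example in Python; the partition name, resource requests, and job contents are hypothetical placeholders, not documented Cheaha values.

```python
import subprocess
import textwrap

# Minimal sketch of submitting a batch job to a Slurm scheduler.
# The partition name and resource requests are hypothetical placeholders.
batch_script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=16G
    #SBATCH --time=01:00:00
    #SBATCH --partition=example-partition

    echo "Running on $(hostname)"
""")

# sbatch reads the job script from stdin when no file argument is given.
result = subprocess.run(
    ["sbatch"],
    input=batch_script,
    text=True,
    capture_output=True,
    check=True,
)
print(result.stdout.strip())  # e.g. "Submitted batch job 123456"
```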

UAB Research Cloud Resources: Research Computing has operated a production OpenStack cloud resource since 2019. The fabric is composed of five Dell R640 compute nodes, each with 48 cores and 192 GB of RAM, for a total of 240 cores and 960 GB of standard cloud compute resources. In addition, the fabric will feature four NVIDIA DGX A100 nodes, each with 8 A100 GPUs and 1 TB of RAM. These resources are available to the research community for provisioning on demand via OpenStack services (Ussuri release). This resource supports researchers in making their hosted services available beyond campus while adhering to standard campus network security practices.
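For context, on-demand provisioning against an OpenStack cloud can be scripted with the openstacksdk Python client, as in the sketch below. The cloud entry, image, flavor, and network names are hypothetical placeholders, not values from the UAB fabric.

```python
import openstack

# Minimal sketch of on-demand provisioning via openstacksdk.
# The cloud entry, image, flavor, and network names are hypothetical.
conn = openstack.connect(cloud="uab-example")  # credentials from clouds.yaml

image = conn.compute.find_image("Ubuntu-22.04")
flavor = conn.compute.find_flavor("m1.medium")
network = conn.network.find_network("example-net")

server = conn.compute.create_server(
    name="research-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)  # block until ACTIVE
print(server.status, server.id)
```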

Storage: The compute nodes on Cheaha are backed by high-performance GPFS storage, 6.6 PB raw, on DDN SFA14KX hardware connected via an EDR/FDR InfiniBand fabric. Non-scratch files on the GPFS cluster are replicated to 6.0 PB of raw storage on a DDN SFA12KX located in the RUST data center to provide site redundancy.

Two new storage fabrics came online in 2021, both Ceph object stores with different hardware configurations to address different usage scenarios: a 6.9 PB archive storage fabric built using 12 Dell DSS7500 nodes and a 248 TB SSD cache storage fabric built with 8 Dell R840 nodes, alongside an expanded 1.3 PB nearline storage fabric built with 14 Dell R740xd nodes.

Networking: The UAB Research Network is currently a dedicated 40 GbE optical connection between the UAB Shared HPC Facility in 936 and the RUST Campus Data Center, creating a multi-site facility housing the Research Computing System (RCS). This network is being upgraded in 2021 to replace aging equipment and extend service to the DC BLOX data center. The new network provides a 200 Gbps Ethernet backbone for east-west traffic connecting storage and compute hosting resources. The network supports direct connection to campus and high-bandwidth regional networks via 40 Gbps Globus Data Transfer Nodes (DTNs), providing the capability to connect data-intensive research facilities directly with the high-performance computing and storage services of the Research Computing System. It supports very high-speed, secure connectivity between connected nodes for transferring very large data sets without interfering with other traffic on the campus backbone, ensuring predictable latencies. The Science DMZ interface, with its DTNs, includes perfSONAR measurement nodes and a Bro security node connected directly to the border router, providing a "friction-free" pathway to external data repositories as well as computational resources. A hedged sketch of scripting a transfer through the DTNs follows below.
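The sketch below shows one way a bulk transfer through the Globus DTNs might be scripted with the globus-sdk Python package. It is illustrative only: the endpoint UUIDs and paths are placeholders, and the access token is assumed to come from a separate Globus Auth login flow.

```python
import globus_sdk

# Sketch of submitting a bulk transfer through Globus Data Transfer Nodes.
# Endpoint UUIDs and paths are placeholders; the token would come from a
# Globus Auth login flow performed elsewhere.
transfer_token = "..."  # placeholder access token
SOURCE_ENDPOINT = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"  # hypothetical DTN
DEST_ENDPOINT = "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"    # hypothetical destination

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)

task_data = globus_sdk.TransferData(
    tc,
    SOURCE_ENDPOINT,
    DEST_ENDPOINT,
    label="large dataset transfer",
    sync_level="checksum",  # only copy files that differ
)
task_data.add_item("/data/project/", "/archive/project/", recursive=True)

task = tc.submit_transfer(task_data)
print("Submitted Globus transfer:", task["task_id"])
```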

The campus network backbone is based on a 40-gigabit redundant Ethernet network with 480 gigabit/second backplanes on the core L2/L3 switch/routers. For efficient management, a collapsed backbone design is used. Each campus building is connected using 10 Gigabit Ethernet links over single-mode optical fiber. Desktops are connected at 1 gigabit/second. The campus wireless network blankets classrooms, common areas, and most academic office buildings.

UAB connects to the Internet2 high-speed research network via the University of Alabama System Regional Optical Network (UASRON), a University of Alabama System owned and operated DWDM network offering 100 Gbps Ethernet to the Southern Light Rail (SLR)/Southern Crossroads (SoX) in Atlanta, GA. The UASRON also connects UAB to UA and UAH, the other two University of Alabama System institutions, and to the Alabama Supercomputer Center. UAB is also connected to other universities and schools through the Alabama Research and Education Network (AREN).

Personnel: UAB IT Research Computing currently maintains a support staff of 14, plus six half-time student interns, led by the Assistant Vice President for Research Computing. The team includes an HPC Architect/Manager, a Data Science Manager, four Software Developers, three Data Scientists, three System Engineers, and a Project Coordinator. RC personnel also complete HIPAA and Responsible Conduct of Research training.