HPC Dashboard
Powerful monitoring for your Slurm-based HPC cluster
The HPC Dashboard is a Next.js application for monitoring Slurm nodes and jobs. Built with performance and usability in mind, it gives you real-time insight into your HPC resources.
Key Features
Core Functionality
- Real-time monitoring of CPU and GPU node utilization
- Detailed individual node status
- Comprehensive Slurm job details and history
- Dynamic data updates with refresh countdown
Advanced Integrations
Enable these features by configuring your environment file:
- LMOD module display and details
- Prometheus metrics integration
- OpenAI-powered insights
Quick Start
git clone https://github.com/thediymaker/slurm-node-dashboard.git
cd slurm-node-dashboard
npm install
# Set up your .env file (see Configuration section)
npm run dev
Visit http://localhost:3000 to see your dashboard in action.
Detailed Setup
Prerequisites
- Node.js (v18 or later)
- npm or Yarn
- PM2 (for production deployment)
- Slurm API (enabled and configured)
- Slurm API token
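The Node.js requirement above can be checked before installing. The following is a minimal sketch, not part of the project: the function names `version_major` and `node_is_supported` are made up for illustration, and it assumes `node --version` prints a string of the form `v18.19.0`.

```bash
#!/bin/bash
# Extract the major number from a version string such as "v18.19.0".
version_major() {
  local v="${1#v}"           # drop a leading "v" if present
  printf '%s\n' "${v%%.*}"   # keep everything before the first dot
}

# Succeeds when Node.js is installed and meets the v18 minimum.
node_is_supported() {
  command -v node > /dev/null 2>&1 || return 1
  [ "$(version_major "$(node --version)")" -ge 18 ]
}
```

For example, `node_is_supported || echo "Node.js v18 or later is required"` prints a warning on hosts with an older or missing Node.js.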
Enabling the Slurm API
To use this dashboard, you need to have the Slurm API enabled on your HPC cluster. Follow these steps to set it up:
1. Start by reviewing the [SchedMD REST API quick start guide](https://slurm.schedmd.com/rest_quickstart.html).
2. Ensure that `slurmrestd` is running on your cluster.
3. Once the Slurm API is running, you need to generate an API key for authentication.
### Generating an API Key
The API key needs permission to read all cluster data. This example generates a key for the `slurm` user with a lifespan of one year (31,536,000 seconds):
```bash
scontrol token username=slurm lifespan=31536000
```
Note: This generates a JSON Web Token (JWT). The expiration date is stored in the token itself, so you can read it and set a reminder to renew the token, or automate renewal (possibly with a shorter lifespan). A future admin section of the dashboard is planned to show the token's expiration.
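Reading the expiration out of the token can be scripted. The sketch below assumes GNU `base64` and `sed`, and assumes the token carries the standard JWT `exp` claim (seconds since the epoch) in its payload. The function names `jwt_exp` and `jwt_days_left` are illustrative and not part of Slurm or the dashboard.

```bash
#!/bin/bash
# Print the "exp" claim (seconds since the epoch) of a JWT.
jwt_exp() {
  local payload pad
  # The payload is the second dot-separated segment, base64url-encoded.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the "=" padding; restore it so base64 -d accepts it.
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  printf '%s' "$payload" | base64 -d |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print the number of whole days left before the token expires.
jwt_days_left() {
  local exp now
  exp=$(jwt_exp "$1")
  now=$(date +%s)
  echo $(( (exp - now) / 86400 ))
}
```

A daily cron job could call `jwt_days_left "$SLURM_API_TOKEN"` and send a reminder once the result drops below a threshold of your choosing.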
Configuration
Create a `.env` file in the root directory:
```env
COMPANY_NAME="Your Company"
CLUSTER_NAME="Your Cluster"
CLUSTER_LOGO="/path/to/logo.png"
NEXT_PUBLIC_BASE_URL="http://your-domain.com"
# Optional integrations
PROMETHEUS_URL=""
OPENAI_API_KEY=""
# Slurm configuration
SLURM_API_VERSION="v0.0.40"
SLURM_SERVER="http://your-slurm-server:port"
SLURM_API_TOKEN="your-slurm-api-token"
# Development settings
NODE_ENV="production"
REACT_EDITOR="code"
```
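Before starting the dashboard, it can help to confirm that the Slurm values in `.env` actually reach `slurmrestd`. This is a rough sketch under a few assumptions: the token is sent in the `X-SLURM-USER-TOKEN` header alongside `X-SLURM-USER-NAME`, as slurmrestd's JWT authentication expects, and the API version in use exposes a `ping` endpoint. The variable `SLURM_API_USER` and both function names are hypothetical; the dashboard itself does not read them.

```bash
#!/bin/bash
# Build the URL of a slurmrestd endpoint from the values in .env.
slurm_api_url() {
  # ${SLURM_SERVER%/} strips one trailing slash to avoid "//" in the URL.
  printf '%s/slurm/%s/%s\n' "${SLURM_SERVER%/}" "$SLURM_API_VERSION" "$1"
}

# Call the ping endpoint; -f makes curl exit non-zero on HTTP errors.
slurm_ping() {
  curl -sf \
    -H "X-SLURM-USER-NAME: ${SLURM_API_USER:-slurm}" \
    -H "X-SLURM-USER-TOKEN: ${SLURM_API_TOKEN}" \
    "$(slurm_api_url ping)"
}
```

To try it, load the file with `set -a; . ./.env; set +a` and run `slurm_ping`. A JSON response suggests the server address, API version, and token are all usable.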
Production Deployment
For production environments, we recommend using PM2:
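```bash
npm install -g pm2
npm run build   # assumes the standard Next.js "build" script; "start" serves the built app
pm2 start npm --name "hpc-dashboard" -- start
pm2 save
pm2 startup     # prints the command that registers PM2 to start at boot
```
PM2 keeps the dashboard running and restarts it if it crashes. Run the command printed by `pm2 startup` so that the saved process list is also restored after the server reboots.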
Advanced Usage
Custom Data Collection
### Historical Node Data
Collect historical node data with this script (run hourly via cron):
```bash
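#!/bin/bash
# Directory that receives the compressed snapshots.
SAVE_DIR="/path/to/data/directory"
mkdir -p "$SAVE_DIR"
# The file name ends in "Z" (UTC), so take the timestamp in UTC with -u.
FILENAME=$(date -u +"%Y-%m-%dT%H-%M-%S.000Z.json.gz")
curl -s "http://localhost:3000/api/slurm/nodes" | gzip > "$SAVE_DIR/$FILENAME"
# Delete snapshots older than 30 days.
find "$SAVE_DIR" -name "*.json.gz" -type f -mtime +30 -delete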
```
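To spot-check what the collection script has saved, the newest snapshot can be located by name, since the timestamped file names sort chronologically. The helper below is an illustrative sketch; `latest_snapshot` is not part of the project, and the cron line in the comment uses a hypothetical script path.

```bash
#!/bin/bash
# Example crontab entry for the hourly run (hypothetical script path):
#   0 * * * * /path/to/collect_nodes.sh

# Print the path of the newest snapshot in the given directory.
latest_snapshot() {
  # Names start with a sortable timestamp, so the last one is the newest.
  find "$1" -maxdepth 1 -type f -name '*.json.gz' | sort | tail -n 1
}
```

For example, `gzip -dc "$(latest_snapshot /path/to/data/directory)" | head -c 500` shows the start of the most recent capture.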
### Module Data
Collect module data with this script (run daily via cron):
```bash
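#!/bin/bash
json_dir="/path/to/public/directory"
json_output="${json_dir}/modules.json"
mkdir -p "$json_dir"
export MODULESHOME="/usr/share/lmod/lmod"
export MODULEPATH="/your/module/path"
# LMOD_DIR is normally set by Lmod's shell init, which cron does not run;
# assume the default libexec location when it is unset.
LMOD_DIR="${LMOD_DIR:-$MODULESHOME/libexec}"
"$LMOD_DIR/spider" -o jsonSoftwarePage "$MODULEPATH" | python -m json.tool > "$json_output"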
```
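If `spider` fails, the redirect in the script above still overwrites `modules.json`, possibly with an empty file. One way to guard against that is to write to a temporary file first and replace the live file only when the output parses as JSON. This is a sketch of that idea, not the project's method: `publish_json` is a hypothetical helper, and it assumes `python3` is on the `PATH`.

```bash
#!/bin/bash
# Move a freshly generated JSON file into place only if it is valid JSON.
publish_json() {
  local src="$1" dest="$2"
  # Reject empty or malformed output, e.g. when spider failed.
  python3 -m json.tool "$src" > /dev/null 2>&1 || return 1
  # mv within one filesystem is atomic, so readers never see a partial file.
  mv -f "$src" "$dest"
}
```

In the module script, this would mean redirecting the pipeline to something like `"${json_output}.tmp"` and then calling `publish_json "${json_output}.tmp" "$json_output"`.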
Open OnDemand Integration
To integrate this dashboard with Open OnDemand:
Clone the generic Ruby app template:
```
git clone https://github.com/thediymaker/ood-status-iframe.git
```
Navigate to the cloned repository:
```
cd ood-status-iframe
```
Open the `views/layout.erb` file in your preferred text editor and update the URL in it to point to your deployed HPC Dashboard.
Contributing
We welcome contributions! Here's how you can help:
- Fork the repository
- Create a feature branch: `git checkout -b new-feature`
- Make your changes and commit: `git commit -am 'Add new feature'`
- Push to the branch: `git push origin new-feature`
- Submit a pull request
License
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
Support and Contact
For support, please open an issue on our GitHub repository.
For direct inquiries, contact Johnathan Lee at john.lee@thediymaker.com.
Gallery
Additional Screenshots
| Feature Overview | Job Details |
| :----------------------------------------------: | :--------------------------------------------------: |
| ![Features](/images/new_features_screenshot.png) | ![Job Detail](/images/new_job_detail_screenshot.png) |
| Running Job | Completed Job |
| :----------------------------------------------------: | :--------------------------------------------------------: |
| ![Running Job](/images/new_running_job_screenshot.png) | ![Completed Job](/images/new_completed_job_screenshot.png) |
| Node Hover Details |
| :-----------------------------------------------------: |
| ![Hover Status](/images/new_dashboard_screenshot_2.png) |
Made with ❤️ for HPC