snakemake / snakemake-executor-plugin-slurm

A Snakemake executor plugin for submitting jobs to a SLURM cluster
MIT License

Respect GPU resource specifications #172

Open vadim0x60 opened 1 day ago

vadim0x60 commented 1 day ago

Snakemake supports specifying required GPU resources in the Snakefile, for example:

resources:
    nvidia_gpu=1

Before this patch, the SLURM executor ignored these specifications, so unless the user manually prevented it, jobs would run on CPU nodes. This is relatively easy to fix because SLURM, like Snakemake, supports per-job GPU resource specification. This patch ensures that GPU requirements from the Snakefile are relayed to SLURM via

sbatch --gres=gpu:<count>
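
As a rough illustration of the mapping described above, the following sketch turns an `nvidia_gpu` entry into the corresponding `--gres` option. The helper name `gres_flag` and the plain-dict input are assumptions made for this example; this is not the code from the patch or from the plugin.

```python
from typing import Any, Mapping, Optional


def gres_flag(resources: Mapping[str, Any]) -> Optional[str]:
    """Return an sbatch --gres option for a job's GPU request, or None.

    `resources` stands in for the resources declared in the Snakefile.
    Both the function name and the dict input are hypothetical.
    """
    count = resources.get("nvidia_gpu")
    if not count:
        # No GPU requested (key missing or zero): add nothing to the call.
        return None
    return f"--gres=gpu:{int(count)}"


if __name__ == "__main__":
    print(gres_flag({"nvidia_gpu": 1}))
    print(gres_flag({"mem_mb": 4000}))
```

Returning `None` when no GPU is requested keeps CPU-only jobs unchanged, so the option is only appended when the Snakefile actually asks for a GPU.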


coderabbitai[bot] commented 1 day ago

Walkthrough

The changes involve modifications to the Executor class in the snakemake_executor_plugin_slurm/__init__.py file. The updates enhance job submission logic to support GPU resources by checking for gpu and nvidia_gpu keys and adjusting the submission command accordingly. Additionally, the logic for setting the number of tasks for SLURM jobs has been updated to ensure compliance with SLURM version 22.05, which requires the --ntasks option for all submissions. Error handling during job submission has also been improved for better clarity on failures.
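
A minimal sketch of the argument handling summarized in this walkthrough might look like the following. It assumes the task count is read from a `tasks` resource with a default of 1, and that either a `gpu` or an `nvidia_gpu` key may carry the GPU count; the function name `sbatch_resource_args` is hypothetical and the real `Executor` code may differ.

```python
from typing import Any, List, Mapping


def sbatch_resource_args(resources: Mapping[str, Any]) -> List[str]:
    """Build the resource-related part of an sbatch call (illustrative only)."""
    # Always pass --ntasks; the `tasks` key and its default of 1 are assumptions.
    args = [f"--ntasks={int(resources.get('tasks', 1))}"]

    # Accept either key named in the walkthrough; the first non-zero one wins.
    for key in ("gpu", "nvidia_gpu"):
        count = resources.get(key)
        if count:
            args.append(f"--gres=gpu:{int(count)}")
            break
    return args


if __name__ == "__main__":
    print(" ".join(["sbatch"] + sbatch_resource_args({"nvidia_gpu": 1})))
```

Building the options as a list keeps each flag separate until the final command is assembled, which makes it easier to report exactly what was submitted if the call fails.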

Changes

File | Change Summary
snakemake_executor_plugin_slurm/__init__.py | Enhanced job submission logic for GPU resources, updated task settings for SLURM compliance, and refined error handling.

Sequence Diagram(s)

sequenceDiagram
    participant Job as JobExecutorInterface
    participant Executor as Executor
    participant SLURM as SLURM System

    Job->>Executor: Submit Job
    Executor->>Executor: Check for GPU resources
    alt GPU resources found
        Executor->>SLURM: Submit with --gres=gpu:<count>
    else No GPU resources
        Executor->>SLURM: Submit without GPU
    end
    SLURM-->>Executor: Job ID
    Executor-->>Job: Return Job ID or error



cmeesters commented 17 hours ago

Thank you for this PR. There is some misunderstanding: GPU resources can already be requested via slurm_extra, for example

resources:
        slurm_extra="'--gres=gpu:1'"

I will only approve this particular PR if it a) becomes generic (drops the NVIDIA-specific handling), b) supports a --slurm-... flag, and c) reflects the changes in the docs. As this is easy enough: shall I open a new PR, or will you refactor yours?