fix: sbatch stderr parsing

cmeesters commented 3 weeks ago

will hopefully fix #157

The problem was that the submission code merged stderr and stdout of the sbatch call. Out of the box, sbatch writes to stdout and uses stderr only when an error occurs. However, admins can add informative messages to stderr; when that happened, parsing the output for the JobID failed. Now stderr and stdout are handled separately.
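To illustrate the failure mode described above, here is a minimal sketch that does not call sbatch at all. It uses a small Python one-liner as a hypothetical stand-in for a submission command that prints a bare job ID on stdout while a site-specific notice goes to stderr. The notice text and the job ID are made up, and the plugin's real code may look different.

```python
import subprocess
import sys

# Hypothetical stand-in for an sbatch call on a cluster where admins print
# a notice to stderr. The notice text and the job ID are invented.
FAKE_SBATCH = [
    sys.executable,
    "-c",
    "import sys; "
    "sys.stderr.write('NOTICE: maintenance on Friday\\n'); "
    "sys.stdout.write('123456\\n')",
]

# Joined streams: the notice ends up in the text that is parsed for the job ID.
joined = subprocess.run(
    FAKE_SBATCH,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
).stdout

# Separate streams: stdout carries only the job ID, stderr keeps the notice.
separate = subprocess.run(FAKE_SBATCH, capture_output=True, text=True)
job_id = separate.stdout.strip()

print("joined output:", repr(joined))
print("job id from stdout only:", job_id)
print("stderr, kept apart:", separate.stderr.strip())
```

With the streams joined, the captured text contains more than the job ID, so any parser expecting only the ID would fail. With the streams separated, the notice can still be logged without interfering.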

Summary by CodeRabbit

New Features
- Enhanced error handling during SLURM job submission, providing clearer feedback on failures.
- Improved job ID retrieval by stripping whitespace from the output.
Bug Fixes
- Addressed issues with job submission failures by capturing both standard output and error messages.
Chores
- Minor adjustments to logging for better clarity during job submission and error reporting.

coderabbitai[bot] commented 3 weeks ago

Walkthrough

The changes in this pull request focus on enhancing the run_job method within the Executor class of the snakemake_executor_plugin_slurm module. Key modifications include replacing subprocess.check_output with subprocess.Popen to improve error handling during SLURM job submissions. The new implementation captures both standard output and error, raising a WorkflowError for failures. Additionally, whitespace is stripped from the job ID output, and logging statements have been adjusted for clarity.
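The approach described in the walkthrough can be sketched roughly as follows. This is not the plugin's actual `run_job` code: `submit` and `SubmissionError` are hypothetical names introduced for illustration (the latter stands in for the `WorkflowError` mentioned above), and the commands are Python one-liners that imitate a successful and a failing submission so the example runs without a SLURM installation.

```python
import subprocess
import sys


class SubmissionError(Exception):
    """Hypothetical local stand-in for the WorkflowError named in the walkthrough."""


def submit(call):
    """Run a submission command and return the job ID printed on stdout.

    stdout and stderr are captured separately, so extra messages on
    stderr cannot leak into the job ID.
    """
    process = subprocess.Popen(
        call,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    out, err = process.communicate()
    if process.returncode != 0:
        raise SubmissionError(
            f"submission failed (exit code {process.returncode}): {err.strip()}"
        )
    # Strip surrounding whitespace, including the trailing newline.
    return out.strip()


# Invented commands that imitate a successful and a failing submission.
ok_call = [sys.executable, "-c", "print(' 987654 ')"]
bad_call = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('invalid partition\\n'); sys.exit(1)",
]

job_id = submit(ok_call)
print("job id:", job_id)

try:
    submit(bad_call)
except SubmissionError as exc:
    failure = str(exc)
    print("error:", failure)
```

Using `communicate()` reads both pipes until the process exits, which avoids the deadlock that can occur when one pipe fills up while the other is being read.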

Changes

| File Path | Change Summary |
| --- | --- |
| `snakemake_executor_plugin_slurm/__init__.py` | Modified `run_job` method: replaced `subprocess.check_output` with `subprocess.Popen`, improved error handling, captured standard output and error, stripped whitespace from job ID, adjusted logging statements. |

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Improve job submission handling to prevent hangs (#157) | ✅ | |
| Enhance error reporting during job submission (#157) | ✅ | |

Possibly related PRs

#136: Modifies the run_job method to add a requeue option, relevant to job submission error handling.
#140: Enhances error handling in the run_job method, aligning with robustness improvements in this PR.
#153: Discusses the --slurm_requeue option, related to job submission and error handling changes in this PR.

Suggested reviewers

cmeesters
johanneskoester

cmeesters commented 3 weeks ago

@johanneskoester care to have a look?

@fgvieira can you test this on your cluster, too?

If either one of you can test this, it would be great. I do not trust the CI enough to consider this a merciless test. Also, I would like feedback from @freekvh on how this code behaves on Snellius.

freekvh commented 2 weeks ago

Hi @cmeesters, I'm very much willing to test (on Snellius), but I can't get this PR to work with poetry (see my comment here: https://github.com/snakemake/snakemake-executor-plugin-slurm/issues/157#issuecomment-2450822608). If you can advise how I can do this some other way, or fix the datrie-package related issues, I'd be happy to take another look.

freekvh commented 2 weeks ago

I tested this on Snellius and ran into issues. I added the details to our original issue, #157; see https://github.com/snakemake/snakemake-executor-plugin-slurm/issues/157#issuecomment-2454899674
