neilmunday / slurm-mail

Slurm-Mail is a drop-in replacement for Slurm's e-mails that gives users much more information about their jobs than the standard Slurm e-mails.
GNU General Public License v3.0

Add `retryOnFailure` option #112

Closed thgeorgiou closed 9 months ago

thgeorgiou commented 9 months ago

Hello!

I made a small change that lets the administrator control whether Slurm-Mail retries sending a mail after a failure.

In my case, users sometimes mistyped their email addresses, or copied a job template that contained an example/invalid address. The affected files then stayed in /var/spool/slurm-mail forever, and the program retried sending them every minute, which eventually got my cluster banned from the mail server. Setting retryOnFailure to 0 always deletes the mail files after an attempted send.
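The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `process_spool` function, the injected `send` callable, and the `*.mail` file pattern are assumptions made for the example.

```python
import pathlib
import smtplib
from typing import Callable, List


def process_spool(
    spool_dir: pathlib.Path,
    send: Callable[[pathlib.Path], None],
    retry_on_failure: bool,
) -> List[str]:
    """Try to send every spool file once; return the names of the files kept for a retry.

    Hypothetical helper: `send` stands in for whatever actually delivers the mail.
    """
    kept = []
    for mail_file in sorted(spool_dir.glob("*.mail")):  # file pattern is an assumption
        try:
            send(mail_file)
            delivered = True
        except smtplib.SMTPException:
            delivered = False

        if delivered or not retry_on_failure:
            # With retries disabled the file is removed even though sending failed,
            # so an invalid address cannot trigger a retry every minute.
            mail_file.unlink()
        else:
            kept.append(mail_file.name)
    return kept
```

With `retry_on_failure=False` the spool directory is always empty after one pass, at the cost of dropping mails that failed for a temporary reason.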

I am catching all smtplib exceptions (SMTPHeloError, SMTPRecipientsRefused, SMTPSenderRefused, SMTPNotSupportedError) but perhaps SMTPHeloError shouldn't be caught, since it most probably represents a connection error rather than an invalid recipient.
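One possible way to separate the two cases is sketched below. The `classify_send_error` helper and its return labels are hypothetical and not part of this PR; only the `smtplib` exception classes are real.

```python
import smtplib

# Errors that point at the message or its addresses rather than at the connection.
MESSAGE_ERRORS = (
    smtplib.SMTPRecipientsRefused,
    smtplib.SMTPSenderRefused,
    smtplib.SMTPNotSupportedError,
)


def classify_send_error(exc: Exception) -> str:
    """Return "connection", "message" or "unknown" for an exception raised while sending."""
    if isinstance(exc, smtplib.SMTPHeloError):
        # The server rejected the HELO/EHLO greeting, so the recipient was never examined.
        return "connection"
    if isinstance(exc, MESSAGE_ERRORS):
        return "message"
    return "unknown"
```

A caller could then delete the mail file only for "message" errors and keep it for anything else, so that a temporary server problem does not discard valid mails.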

Thank you for your work on this very useful tool!

github-actions[bot] commented 9 months ago

Coverage report

File: src/slurmmail/cli.py
Lines missing coverage: 740-751

This report was generated by python-coverage-comment-action

neilmunday commented 9 months ago

Thanks for this feature @thgeorgiou - very useful.

I will add this to the 4.11 release, together with instructions in the README on how to use the option.

Glad you like Slurm-Mail :-)

neilmunday commented 9 months ago

Version 4.11 released.