ucphhpc / migrid-sync

MiGrid workspace where master branch is kept strictly in sync with SF upstream svn repo. Any development or experiments should use a branch. You probably want to fork your own clone or work e.g. on the edge branch if you wish to contribute.
GNU General Public License v2.0

Wishlist/feature request - reduce number of emails sent to users #36

Open aputtu opened 3 months ago

aputtu commented 3 months ago

We occasionally receive reports (id 32634) from users who are overwhelmed by the number of emails they receive. There seems to be a use for:

  1. Alerts regarding security issues.
  2. Warnings regarding various areas of concern.
  3. Information on file changes and similar functional usage.

In addition, there are emails that the system sends out when e.g. a connection fails. We want to warn users when that happens, but in most cases without sending them an excessive number of identical warnings.

Questions to raise:

I am not sure which parts of this issue belong to MiGrid development and which parts belong to the server administrator.

jonasbardino commented 2 months ago

I'm pretty sure 99% of these reports are about the built-in notification system warning about valid issues, like repeated failed SFTP/WebDAVS login attempts and, often, the resulting case of hitting the login rate limit. Such emails are typically triggered when a client has set up an SFTP or WebDAVS network drive with automatic retry, and it keeps trying to log in despite repeated errors. Common causes of service login errors include an expired main user account (ERDA/SIF FAQ) and an expired twofactor session. The latter is mandatory for GDP sites like SIF and optional on general sites like ERDA.

The notification system already does internal batching to avoid sending an email on every error, but if the client keeps failing login, it will result in additional emails every once in a while until the client stops hammering. This batching window may be adjusted, but a number of these messages are about e.g. a main account that expired due to lack of web activity, so it does not make sense to inform about it on the web. In such cases we can really only email the user or completely suspend the account to get their attention.

Apart from perhaps improving the actual warning emails and related documentation, I think @Rehr is best acquainted with the notification code and any frequency adjustments.

Martin-Rehr commented 2 months ago

The notification interval is hard-coded to 60 seconds and was chosen to give users a quick response upon failed logins. Similar errors are batched within the interval.
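To illustrate what batching within a fixed interval could look like, here is a minimal sketch. It is not the actual MiG notification code: the class `NotificationBatcher`, its methods and the event strings are hypothetical, and only the 60-second interval comes from the comment above.

```python
from collections import Counter, defaultdict

NOTIFY_INTERVAL_SECS = 60  # interval mentioned in the thread


class NotificationBatcher:
    """Hypothetical helper: collect events per user, emit one digest per interval."""

    def __init__(self, interval=NOTIFY_INTERVAL_SECS):
        self.interval = interval
        self.pending = defaultdict(Counter)  # user -> Counter of event messages
        self.window_start = None  # timestamp of first event in current window

    def add(self, user, event, now):
        """Record an event; identical events for a user are only counted."""
        if self.window_start is None:
            self.window_start = now
        self.pending[user][event] += 1

    def flush(self, now):
        """Return {user: {event: count}} once the interval has elapsed, else {}."""
        if self.window_start is None or now - self.window_start < self.interval:
            return {}
        digests = {user: dict(counts) for user, counts in self.pending.items()}
        self.pending.clear()
        self.window_start = None
        return digests
```

With this sketch, three identical failures recorded at 0, 10 and 20 seconds would produce a single digest entry with a count of 3 when `flush` is called at 60 seconds, rather than three separate emails.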

We could extend the system to suppress similar errors that occur across notification intervals, but I'd say this goes at the end of the nice-to-have list, since users are only bothered if one of their own clients keeps hammering on the system.
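One possible shape for such cross-interval suppression is a per-user, per-error quiet period. The sketch below is only an illustration under assumptions: `RepeatSuppressor`, `should_send` and the one-hour default are hypothetical and not part of the existing notification code.

```python
class RepeatSuppressor:
    """Hypothetical helper: allow one email per (user, event) per quiet period."""

    def __init__(self, quiet_secs=3600):  # one hour is an assumed example value
        self.quiet_secs = quiet_secs
        self.last_sent = {}  # (user, event) -> timestamp of last email sent

    def should_send(self, user, event, now):
        """Return True if no email for this user and event went out recently."""
        key = (user, event)
        last = self.last_sent.get(key)
        if last is not None and now - last < self.quiet_secs:
            return False  # still inside the quiet period, suppress the repeat
        self.last_sent[key] = now
        return True
```

A batcher running every 60 seconds could call `should_send` before emailing each digest entry, so a client that keeps hammering would trigger one warning per quiet period instead of one per interval. The trade-off is that a user who fixes and then re-breaks a client within the quiet period would not be warned again until it expires.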

Last but not least we could use fail2ban to block users with clients "on the loose".