sympa-community / sympa

Sympa, Mailing List Management Software
https://www.sympa.community/sympa
GNU General Public License v2.0

hundred_percent_error triggered by single failed message #1597

Open dpc22 opened 1 year ago

dpc22 commented 1 year ago

Expected Behavior

Sympa shouldn't start sending messages of the form:

Subject: Listmaster: list LISTNAME@lists.cam.ac.uk at 100 percents error
Date: Tue, 21 Feb 2023 11:23:46 +0000
From: SYMPA  <sympa@lists.cam.ac.uk>
To: Listmaster  <listmaster@lists.cam.ac.uk>

The list LISTNAME@lists.cam.ac.uk has 100 percents of its users in
error. Something unusual must have happened.

The user (SENDER), who tried to send a mail to this list,
has been warned, as well as the list owners.

See the logs for more details.

Check the bounces in this list:
https://lists.cam.ac.uk/sympa/reviewbouncing/LISTNAME

because of a single failed message rather than a repeating pattern.

(To all of the sender, list owners and listmaster for every subsequent message!)

Current Behavior

At the moment, a single message that fails for all subscribers of a list pushes that list into the "hundred_percent_error" state, which the list owner or listmaster then needs to clear by hand.

Possible Solution

Either watch for repeated failures before switching a list to hundred_percent_error state, or clear the error state automatically if a subsequent delivery works without generating any errors.
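
A minimal sketch of that first idea, assuming a per-message bounce rate is available for each distribution (the `$rate` in ToList.pm is the list-wide rate, so this is not a drop-in change). `ErrorStateTracker`, `threshold` and `record` are hypothetical names, not part of Sympa:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Sympa: a list only counts as being in
# the hundred_percent_error state after several total failures in a row.
package ErrorStateTracker;

sub new {
    my ($class, %opts) = @_;
    return bless {
        threshold => $opts{threshold} // 3,    # failures in a row before flagging
        streak    => 0,                        # current run of total failures
    }, $class;
}

# Record the outcome of one distribution; $rate is a bounce rate in percent.
sub record {
    my ($self, $rate) = @_;
    if ($rate >= 100) {
        $self->{streak}++;
    } else {
        $self->{streak} = 0;    # any delivery that worked clears the state
    }
    return $self->in_error;
}

sub in_error {
    my ($self) = @_;
    return $self->{streak} >= $self->{threshold} ? 1 : 0;
}

package main;

my $tracker = ErrorStateTracker->new(threshold => 3);
for my $rate (100, 100, 0, 100, 100, 100) {
    printf "rate=%3d in_error=%d\n", $rate, $tracker->record($rate);
}
```

With a threshold of 3, only the last delivery in that sequence is flagged: the single success in the middle resets the streak, which covers both the "repeated failures" and the "clear automatically" variants.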

Or maybe only generate a hundred_percent_error warning for a given message when Sympa receives bounces for every subscriber on the list for that message. That is the condition that it is trying to report.
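
As a rough illustration of that per-message condition, assuming bounces could be matched back to the message that caused them (I don't know how much of that Sympa tracks today). `all_recipients_bounced` is a hypothetical function and the addresses are made up:

```perl
use strict;
use warnings;

# Hypothetical sketch, not Sympa code: report only when every recipient
# of one particular message has bounced it.
sub all_recipients_bounced {
    my ($recipients, $bounced) = @_;    # two array refs of addresses
    return 0 unless @$recipients;       # empty list: nothing to report
    # Compare addresses case-insensitively.
    my %seen = map { ( lc($_) => 1 ) } @$bounced;
    for my $rcpt (@$recipients) {
        return 0 unless $seen{ lc $rcpt };
    }
    return 1;
}

my @recipients = ('a@example.org', 'b@example.org');

# Only one of two recipients bounced: no warning.
print all_recipients_bounced(\@recipients, ['a@example.org'])
    ? "warn\n" : "ok\n";

# Both recipients bounced this message: warn.
print all_recipients_bounced(\@recipients, ['A@example.org', 'b@example.org'])
    ? "warn\n" : "ok\n";
```

This keeps the warning tied to the message that actually failed, so later messages that are delivered normally would not trigger it.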

I appreciate that none of these things are simple to implement, which probably explains the current behaviour.

Context

Our Exchange Online tenancy helpfully rejected all of the recipients on a single message because of an (undocumented?) internal limit on that system:

554 5.6.211 Invalid MIME Content: Single text value size (32826) exceeded allowed maximum (32768) for the 'Authentication-Results-Original' header.

This pushed the list in question into hundred_percent_error state.

Further messages to the list generated a lot of spurious messages to senders, list owners and listmaster until I cleared the bounce state by hand. There weren't actually any problems with subsequent messages so:

The user (SENDER), who tried to send a mail to this list,
has been warned, as well as the list owners.

wasn't actually useful and just confused people.

racke commented 1 year ago

Yes, I also think that multiple notifications are annoying and not useful.

ikedas commented 1 year ago

@dpc22, the first of the possible solutions you suggested has been discussed in #1412. How about continuing there?

dpc22 commented 1 year ago

Exchange Online thought that it would be fun to bounce most incoming email for 5 hours yesterday afternoon. At least for our tenancy, but I gather that it was a larger problem.

We now have lots of lists with a high bounce rate, and quite a few in hundred_percent_error, despite the fact they are all now working again. I decided that the simplest solution was to just comment out the code that spams everyone.

/usr/share/sympa/lib/Sympa/Spindle/ToList.pm:

        # Bounce rate.
        my $rate = $list->get_total_bouncing() * 100 / $total;
        if (0 and $rate > $list->{'admin'}{'bounce'}{'warn_rate'}) { # XXX DPC
            $list->send_notify_to_owner('bounce_rate', {'rate' => $rate});
            if (100 <= $rate) {
                Sympa::send_notify_to_user($list, 'hundred_percent_error',
                    $message->{sender});
                Sympa::send_notify_to_listmaster($list,
                    'hundred_percent_error', {sender => $message->{sender}});
            }
        }

I don't suppose there is an easy way to reset $list->get_total_bouncing() for all lists?

dpc22 commented 1 year ago

 bounce_address_subscriber     | character varying(100) | 
 bounce_score_subscriber       | integer                | 
 bounce_subscriber             | character varying(35)  | 

The following bit of SQL looks plausible:

UPDATE subscriber_table SET bounce_subscriber=NULL where bounce_subscriber IS NOT NULL;

However, I don't know if that is likely to cause bad side effects with bounce_address_subscriber and bounce_score_subscriber.

dpc22 commented 1 year ago

Okay, it looks like the expire_bounce task should clean everything up automatically in 10 days' time.

I can wait that long and then decide if I want to re-enable the bounce rate notifications.