sympa-community / sympa

Sympa, Mailing List Management Software
https://www.sympa.community/sympa
GNU General Public License v2.0
243 stars 97 forks source link

[CVE-2023-24038] Archived crash on Complex regular subexpression recursion limit (65534) ... StripScripts.pm line 1602. #1573

Closed DLalot closed 1 year ago

DLalot commented 1 year ago

Version

6.2.70

Installation method

tar.gz

Expected behavior

Bad mail should be discarded silently when trying to archive

Actual behavior

loops on archiving a bad mail and crash archived, delivering tons of mail to listmaster and blocking the archived process

Steps to reproduce

Just put the attach file in spool-device/outgoing/1672985091.1672985097.211337.communication@xx.fr,1389,8655

Additional information

badmail.txt

ikedas commented 1 year ago

Hi @DLalot , Could you please show these?

DLalot commented 1 year ago

Hi @ikedas The problem has also been pointed in 2016 and David Verdin said that they should catch the error but nothing was done. I'm rather sceptical that's this a problem with the library. and even if it is, we should catch the error. I've sent a PR dpkg -S /usr/share/perl5/HTML/StripScripts.pm libhtml-stripscripts-perl: /usr/share/perl5/HTML/StripScripts.pm dpkg -l libhtml-stripscripts-perl ii libhtml-stripscripts-perl 1.06-1 all module for removing scripts from HTML

Le processus archived.pl précédent (avec le pid 1733432) est mort brutalement. Date du crash : 06 Jan 2023 09:00 Erreurs :

DIED: Complex regular subexpression recursion limit (65534) exceeded at /usr/share/perl5/HTML/StripScripts.pm line 1602.  at /usr/share/perl5/HTML/StripScripts/Parser.pm line 121. HTML::StripScripts::Parser::filter_html(Sympa::HTMLSanitizer=HASH(0x55d9cd038758), '\x{a}

\x{a}

<div ali...') called at /home/sympa/bin/Sympa/HTMLSanitizer.pm line 117 Sympa::HTMLSanitizer::sanitize_html(Sympa::HTMLSanitizer=HASH(0x55d9cd038758), '\x{a}

\x{a}

<div ali...') called at /home/sympa/bin/Sympa/Message.pm line 980 Sympa::Message::_fix_html_part(MIME::Entity=HASH(0x55d9ccc419a0), 'xx.fr') called at /home/sympa/bin/Sympa/Message.pm line 952 Sympa::Message::_fix_html_part(MIME::Entity=HASH(0x55d9ccc446b8), 'xx.fr') called at /home/sympa/bin/Sympa/Message.pm line 936 Sympa::Message::clean_html(Sympa::Message 1672985091.1672985097.211337.communication@xx.fr,1389,8655) called at /home/sympa/bin/Sympa/Archive.pm line 479 Sympa::Archive::html_store(Sympa::Archive communication@xx.fr/2023-01, Sympa::Message 1672985091.1672985097.211337.communication@xx.fr,1389,8655) called at /home/sympa/bin/Sympa/Spindle/ProcessArchive.pm line 464 Sympa::Spindle::ProcessArchive::_mail2arc(Sympa::Message 1672985091.1672985097.211337.communication@xxfr,1389,8655) called at /home/sympa/bin/Sympa/Spindle/ProcessArchive.pm line 116 Sympa::Spindle::ProcessArchive::_twist(Sympa::Spindle::ProcessArchive=HASH(0x55d9ccccb4a0), Sympa::Message 1672985091.1672985097.211337.communication@xx.fr,1389,8655) called at /home/sympa/bin/Sympa/Spindle.pm line 83 Sympa::Spindle::spin(Sympa::Spindle::ProcessArchive=HASH(0x55d9ccccb4a0)) called at /home/sympa/bin/archived.pl line 162

Consultez les logs pour plus de détails.

ikedas commented 1 year ago

Hi @DLalot ,

Hi @ikedas The problem has also been pointed in 2016 and David Verdin said that they should catch the error but nothing was done. I'm rather sceptical that's this a problem with the library. and even if it is, we should catch the error.

I found the dialogue at the time.

The cause appears to be a ReDoS that occurred at the same location on HTML::StripScripts.

I've sent a PR dpkg -S /usr/share/perl5/HTML/StripScripts.pm libhtml-stripscripts-perl: /usr/share/perl5/HTML/StripScripts.pm dpkg -l libhtml-stripscripts-perl ii libhtml-stripscripts-perl 1.06-1 all module for removing scripts from HTML

Thanks for the information. I think the PR you submitted is a useful workaround to the problem, but I will give it some thought to see if there is a way to avoid the problem and still have a successful archiving.

<\<snip>>

DLalot commented 1 year ago

Hi @ikedas I believe it's a good thing anyway to catch the error to avoid blocking the archive process as it was always trying to archive that bad email, and we don't know what could happen in the future. That was the first time I saw sympa archive crashing. I'm using sympa from the very first time, maybe late nineties . It was called tulp at first :-)

ikedas commented 1 year ago

@DLalot ,

Until the mid-2010s, Sympa used to crash so often that it was common practice to catch possible errors and then ignore them to keep the process running (One particularly striking example was wrapping the entire message delivery in eval() and ignoring all failures!).

However, such an approach is actually a way of ignoring the problem instead of solving it, and is not recommended now that Sympa has become stable enough to operate.

In my current PR (see above), I try to understand the causes of reported problem and make fixes to prevent them from occurring. With this right approach, we don't miss opportunities to fix problems by catching errors in the dark.

DLalot commented 1 year ago

In fact, with the patch, sympa is sending an alert to the listmaster, So it's always possible to analyse the problem

Sympa n'a pas pu archiver le message '1672985091.1672985097.211337.communication@xx.fr,1389,8655'. Celui-ci est déplacé dans le répertoire '/home/sympa/spool/outgoing/bad'. Consultez les logs pour plus de détails.

I believe it's safer to catch the error. 400 mails received for listmaster in less than one hour. Imagine a full week-end :-( And from my point of view, it's just the archive and not so important. As you can see, that's junk emails

ikedas commented 1 year ago

If you think that the fixes I have submitted are not sufficient, please provide a message that will allow us to crash archived.pl even after applying my fixes. Until someone can do that, we will have to assume that the reported problem has been fixed, won't we?

And if archived.pl crashes due to another cause in the future, we should investigate the cause again and fix it again. --- Conversely, if not letting it crash is more important than eliminating the cause of the crash, why not just catch the errors in the main loop of archived.pl, along with the other Sympa services? (This is ironic, of course.)

ikedas commented 1 year ago

Hi @DLalot ,

When I fixed it so that it would not crash, I found a few bugs that I fixed as well. (The essential parts of the fixes have also been submitted as a pull request to HTML-StripScripts, but I am not sure when they will be merged).

Anyway, could you please apply this patch and check if it will solve the problem?

DLalot commented 1 year ago

I applied your patch in prod. Archived is still working :-) and also run your test code Seems to be ok Thanks

export PERL5LIB=/home/sympa/bin perl HTMLSanitizer.t ok 1 - Avoid ReDoS with style attribute ok 2 - Scrub style attribute ok 3 - filter long URI ok 4 - not filter cid URI ok 5 - filter data URI ok 6 - not filter relative URI reference ok 7 - filter URI with empty host ok 8 - not filter https URI with the same origin ok 9 - filter https URI with the other origin 1..9

ikedas commented 1 year ago

Thanks for confirmation @DLalot !