rspamd / rspamd

Rapid spam filtering system.

[BUG] HTML parts with no tags/spaces are not parsed properly #3611

Open nafetsreuab opened 3 years ago

nafetsreuab commented 3 years ago

Text in a base64-encoded MIME part is not decoded and hence not checked by multimap at all. Supply the following DATA content and rspamd does not detect the base64 content.

2.6-156~buster

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=test

This is a multi-part message in MIME format.

--test
Content-Type: text/plain

This is the body of the message.
--test
Content-Type: text/html
Content-Transfer-Encoding: base64

bWFsaWNpb3VzX2NvbnRlbnQK
 --test--

bWFsaWNpb3VzX2NvbnRlbnQK is the base64 encoding of the string malicious_content (followed by a newline)

blocked_strings.map contains: /malicious_content/

malicious_content {
  type = "content";
  filter = "text";
  map = "${LOCAL_CONFDIR}/local.d/blocked_strings.map";
  symbol = "malicious_content";
  regexp = true;
}

No symbol is set. Symbol malicious_content should be set.

2021-01-20 13:44:53 #26722(normal) <35bb93>; task; rspamd_task_write_log: id: , qid: , ip: 1.1.1.1, from: me@here.tld, (default: F (no action): [5.77/500.00] [MISSING_MID(2.50){},HFILTER_URL_ONLY(1.46){0.66666666666667;},CTYPE_MIXED_BOGUS(1.00){},MISSING_DATE(1.00){},NEURAL_HAM_SHORT(-0.59){-0.994;},FORGED_SENDER(0.30){me@here2.tld;me@here2.tld;},DMARC_POLICY_SOFTFAIL(0.10){here.tld: No valid SPF, No valid DKIM;none;},MIME_BASE64_TEXT(0.10){},MIME_GOOD(-0.10){multipart/mixed;text/plain;},ASN(0.00){asn:42730, ipnet:1.1.0.0/19, country:DE;},FROM_NEQ_ENVFROM(0.00){me@here2.tld;me@here2.tld;},MIME_TRACE(0.00){0:+;1:+;2:~;},RCVD_COUNT_ZERO(0.00){0;},R_DKIM_NA(0.00){},TO_DN_ALL(0.00){},BAYES_HAM(-0.00){29.65%;},ARC_NA(0.00){},FROM_HAS_DN(0.00){},RCPT_COUNT_ONE(0.00){1;},R_SPF_NA(0.00){no SPF record;},TO_MATCH_ENVRCPT_ALL(0.00){},WL_ST_HOSTS(0.00){1.1.1.1;}]), len: 514, time: 412.048ms, dns req: 18, digest: <4b12537d8493a1f1179148b4495723ec>, rcpts: me@here2.tld, mime_rcpts: me@here2.tld

vstakhov commented 3 years ago

You clearly do something wrong and make wrong assumptions based on that. Rspamd clearly decodes base64 and I see no issues with it. Furthermore, I cannot reproduce your issue at all: I just get Symbol: malicious_content (0.00) when testing your message.

nafetsreuab commented 3 years ago

I should not have changed the regex when reporting. The real regex and the real base64 string are:

/this\\is\\a\\very\\long\\string/

The base64-string is:

stefan@t540:~$ echo -n "this\is\a\very\long\string" | base64
dGhpc1xpc1xhXHZlcnlcbG9uZ1xzdHJpbmc=

mail.eml:

Subject: base64test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=test

This is a multi-part message in MIME format.

--test
Content-Type: text/plain

This is the body of the message.
--test
Content-Type: text/html
Content-Transfer-Encoding: base64

dGhpc1xpc1xhXHZlcnlcbG9uZ1xzdHJpbmc=
--test--

This mail does not trigger the SYMBOL with 'rspamc mail.eml'
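The message above can be checked independently of Rspamd. The following is a minimal sketch using Python's standard `email` package (not Rspamd's MIME parser, which may behave differently) to confirm that the second part is declared as text/html and that its base64 payload decodes to the expected string. The variable names are illustrative only.

```python
import email

# The test message from this report, reproduced verbatim.
RAW = (
    "Subject: base64test\n"
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed; boundary=test\n"
    "\n"
    "This is a multi-part message in MIME format.\n"
    "\n"
    "--test\n"
    "Content-Type: text/plain\n"
    "\n"
    "This is the body of the message.\n"
    "--test\n"
    "Content-Type: text/html\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "dGhpc1xpc1xhXHZlcnlcbG9uZ1xzdHJpbmc=\n"
    "--test--\n"
)

msg = email.message_from_string(RAW)

# walk() also yields the multipart container, so keep only the leaf parts.
parts = [p for p in msg.walk() if not p.is_multipart()]
html_part = parts[1]

# decode=True applies the Content-Transfer-Encoding (base64 here).
decoded = html_part.get_payload(decode=True)

print(html_part.get_content_type())
print(decoded)
```

If the decoded bytes match the map entry, the message itself is well-formed and the missing symbol has to come from how the decoded part is processed afterwards.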

Also adding the string itself to the e-mail (not base64-encoded) does trigger the symbol:

Subject: base64test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=test

This is a multi-part message in MIME format.

this\is\a\very\long\string

--test
Content-Type: text/plain

This is the body of the message.
--test
Content-Type: text/html
Content-Transfer-Encoding: base64

dGhpc1xpc1xhXHZlcnlcbG9uZ1xzdHJpbmc=
--test--

vstakhov commented 3 years ago

Please read about shell \ escaping first. Especially in double quotes.

nafetsreuab commented 3 years ago

I'm aware that a backslash in a regex needs to be escaped with another backslash. That's what I did. Decoding the base64 string again shows the correct string. The base64 value is always the same, whether the input is in single or double quotes.

stefan@t50:~$ echo "dGhpc1xpc1xhXHZlcnlcbG9uZ1xzdHJpbmc=" | base64 -d
this\is\a\very\long\string

I do not see what I'm doing wrong :/
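As a cross-check of the escaping question, here is a small sketch that decodes the payload and applies the map's pattern to it. It assumes Python's `re` module treats `\\` the same way as the regex engine behind Rspamd's maps, which holds for a literal backslash but is not a statement about Rspamd internals.

```python
import base64
import re

PAYLOAD = "dGhpc1xpc1xhXHZlcnlcbG9uZ1xzdHJpbmc="

# Map entry from this thread, without the surrounding slashes.
# Each "\\" in the pattern matches one literal backslash.
PATTERN = r"this\\is\\a\\very\\long\\string"

decoded = base64.b64decode(PAYLOAD).decode("ascii")
match = re.search(PATTERN, decoded)

print(decoded)
print(match is not None)
```

The pattern matches the decoded text in full, so the regex and the payload are consistent with each other and shell quoting does not explain the missing symbol.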

vstakhov commented 3 years ago

Ok, it is not properly parsed by the HTML parser in Rspamd. Tbh, I'm a bit tired of all those sorts of shit users push into their HTML.

vstakhov commented 3 years ago

And the only reason why it is not parsed is the lack of a newline or space in this content.
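This also fits the difference between the two reports in this thread. A minimal sketch (plain Python, not Rspamd code; `has_whitespace` is a hypothetical helper) decodes both payloads and checks whether they contain any whitespace:

```python
import base64

# Payloads copied from the two reports in this thread.
FIRST = "bWFsaWNpb3VzX2NvbnRlbnQK"
SECOND = "dGhpc1xpc1xhXHZlcnlcbG9uZ1xzdHJpbmc="


def has_whitespace(data: bytes) -> bool:
    """Hypothetical helper: True if any byte is ASCII whitespace."""
    return any(chr(b).isspace() for b in data)


first = base64.b64decode(FIRST)
second = base64.b64decode(SECOND)

print(repr(first), has_whitespace(first))
print(repr(second), has_whitespace(second))
```

The first payload decodes with a trailing newline, while the second (produced with `echo -n`) contains no whitespace at all. That lines up with the first message producing the symbol and the second one not, although the sketch only compares the payloads and does not reproduce the parser's behavior.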

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

bauerstefan commented 3 years ago

Still important.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

nafetsreuab commented 3 years ago

Still interesting. I don't like the stale bot.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

bauerstefan commented 2 years ago

Still important. Bad stale bot!