mailcow / mailcow-dockerized-docs

mailcow: dockerized - documentation 📰
https://docs.mailcow.email

Document all remote resources / "phoning home" situations #252

Closed ValdikSS closed 2 years ago

ValdikSS commented 3 years ago

I'm not very familiar with modern email suites. I was aware of DNSBL, but was surprised to find that, for example, rspamd is constantly downloading fuzzy data from its own and Mailcow's servers. It seems I'm not the only one: @immanuelfodor wrote in his post:

Rspamd accessing an external Internet resource in every 5-7s seems fairly bogus to me, not to mention the possible load on the destination site originating from all other Mailcow instances if it's not only mine (DoS).

@andryyy considered that statement ridiculous and toxic; however, I find it totally valid. I'd ask the same question, because I don't know how often DNSBL queries should be performed or how much data to expect from them and from rspamd, and I would assume that something was misbehaving.

I'd like to document all external resources that Mailcow uses, with their type, query frequency and estimated amount of data transferred. So far I have found the following:

  1. RBL/SURBL/URIBL are listed in the default rspamd configuration and added by the mailcow configuration. Rspamd refreshes the list of whitelisted domains from https://maps.rspamd.com/rspamd/surbl-whitelist.inc.zst
  2. Phishing lists are loaded in the default rspamd configuration: (https://www.openphish.com/feed.txt, https://maps.rspamd.com/rspamd/redirectors.inc.zst). The Phishtank module is disabled in the mailcow configuration.
  3. Message ID lists for some domains are updated from https://maps.rspamd.com/rspamd/mid.inc.zst [reference]
  4. ASN lookups are performed via asn.rspamd.com, asn6.rspamd.com (rspamd config, mailcow config)
  5. Abuse URL maps are downloaded from https://urlhaus.abuse.ch/downloads/text_online/, https://bazaar.abuse.ch/export/txt/md5/recent/ [mailcow config]
  6. Rspamd fuzzy servers: the default rspamd servers (fuzzy1.rspamd.com:11335, fuzzy2.rspamd.com:11335) and the Mailcow server (fuzzy.mailcow.email:11445)

Is this list correct and complete, or did I miss anything? If everything looks right, I'll prepare a documentation update.
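To make the list easier to review, here is a minimal sketch that records the endpoints quoted above as structured data and groups them by transport. The class name, field names and transport labels are illustrative assumptions, not anything from mailcow or rspamd; in particular, labelling the ASN lookups as DNS is my assumption, and the individual RBL/SURBL/URIBL zones are left out because the list does not enumerate them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteResource:
    kind: str       # free-form category label (hypothetical naming)
    endpoint: str   # URL or host[:port] as quoted in the list above
    transport: str  # "https", "dns" or "fuzzy" (labels are assumptions)


RESOURCES = [
    RemoteResource("SURBL whitelist", "https://maps.rspamd.com/rspamd/surbl-whitelist.inc.zst", "https"),
    RemoteResource("phishing feed", "https://www.openphish.com/feed.txt", "https"),
    RemoteResource("redirectors", "https://maps.rspamd.com/rspamd/redirectors.inc.zst", "https"),
    RemoteResource("message ID list", "https://maps.rspamd.com/rspamd/mid.inc.zst", "https"),
    RemoteResource("abuse URL map", "https://urlhaus.abuse.ch/downloads/text_online/", "https"),
    RemoteResource("abuse hash map", "https://bazaar.abuse.ch/export/txt/md5/recent/", "https"),
    RemoteResource("ASN lookup", "asn.rspamd.com", "dns"),    # assumed DNS-based
    RemoteResource("ASN lookup", "asn6.rspamd.com", "dns"),   # assumed DNS-based
    RemoteResource("fuzzy storage", "fuzzy1.rspamd.com:11335", "fuzzy"),
    RemoteResource("fuzzy storage", "fuzzy2.rspamd.com:11335", "fuzzy"),
    RemoteResource("fuzzy storage", "fuzzy.mailcow.email:11445", "fuzzy"),
]


def by_transport(resources):
    """Group endpoints by transport label, preserving list order."""
    groups = {}
    for resource in resources:
        groups.setdefault(resource.transport, []).append(resource.endpoint)
    return groups


if __name__ == "__main__":
    for transport, endpoints in by_transport(RESOURCES).items():
        print(f"{transport}: {len(endpoints)} endpoint(s)")
        for endpoint in endpoints:
            print(f"  {endpoint}")
```

A table like this could later carry the query frequency and data volume columns asked for above, once those numbers are known.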

andryyy commented 3 years ago

We are not downloading data from fuzzy servers. Thanks.


ValdikSS commented 3 years ago

We are not downloading data from fuzzy servers. Thanks.

Could you clarify what you mean by that? The Mailcow configuration file contains the fuzzy.mailcow.email:11445 server, and there is also a news post about it on the Mailcow website:

https://mailcow.email/2020/02/27/1st-fuzzy-storage-is-online-2nd-spam-wanted/

The fuzzy storage is now enabled in mailcow, so please update your cows.

andryyy commented 3 years ago

It does not download data. You send a hash generated with a specific algorithm to the fuzzy servers and we check it for a match. :)
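The flow described here (the client sends only a hash, the server answers whether it is known) can be sketched roughly as follows. This is a toy model under stated assumptions: SHA-256 over normalised text stands in for rspamd's real fuzzy hashing, which works differently, and the `FuzzyStorage` class and its methods are hypothetical names, not rspamd or mailcow APIs.

```python
import hashlib


def digest(text: str) -> str:
    """Stand-in hash: normalise case and whitespace, then SHA-256.

    This is NOT the algorithm rspamd uses; it only illustrates that a
    fixed-size digest, not the message itself, is what gets sent.
    """
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class FuzzyStorage:
    """Hypothetical server side: a table of known digests with weights."""

    def __init__(self):
        self._known = {}

    def learn(self, text: str, weight: int = 1) -> None:
        key = digest(text)
        self._known[key] = self._known.get(key, 0) + weight

    def check(self, query_digest: str) -> int:
        # The server only ever sees the digest; it returns the stored
        # weight, or 0 when there is no match.
        return self._known.get(query_digest, 0)


if __name__ == "__main__":
    storage = FuzzyStorage()
    storage.learn("Buy cheap pills now")

    # Client side: hash locally, send only the digest.
    print(storage.check(digest("buy   CHEAP pills now")))  # match
    print(storage.check(digest("Meeting at 10am")))        # no match
```

The point of the sketch is the direction of the traffic: a small query goes out per message and a small verdict comes back, so nothing resembling a list download takes place.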

ValdikSS commented 3 years ago

Noted, thanks.

immanuelfodor commented 3 years ago

It seemed to me that andryyy was in a hurry, or is used to questions from people fiddling with their first ever Raspberry Pi. I'm not angry or anything. I was just sad that the thread was locked, so I couldn't point out that the last screenshot showed TCP traffic on port 443 on the firewall, which is not DNS over UDP 53 but usually HTTPS. Since mailcow uses its own DNS server with not much visibility other than maybe logs, I think my discovery was unwelcome because it came from an off-the-beaten-track DNS setup. Although Unbound caches DNS (just as my PiHole does), it was HTTPS traffic that I observed on my firewall. The invaluement list contained text data that needed to be downloaded for parsing into an Rspamd map.

Since then, I lowered the configured check frequency when andryyy suggested it, but that did not make it go away. Then an update changed this behavior, so I no longer see the invaluement queries at the top of the query list. I'm happy that it normalized thanks to the Mailcow or Rspamd developers; I don't know who gets the trophy for the fix. I'm sure that Mailcow is not siphoning data, and andryyy is a gem for Mailcow for doing so much work for the project. Thank you for that with all my heart! It's just that these quick and hasty comments leave a bad taste with people who have valid concerns but maybe don't know the internals of all the integrated tools :)

immanuelfodor commented 3 years ago

(I wrote you an email to the address on your profile. Thanks for https://github.com/mailcow/mailcow-dockerized/issues/3929)

What I did was raise the map watch interval (that is, lower the check frequency) here:

cat ~/mailcow/data/conf/rspamd/local.d/options.inc
dns {
  enable_dnssec = true;
}
#map_watch_interval = 30s;
map_watch_interval = 15min;
dns {
  timeout = 4s;
  retransmits = 2;
}
disable_monitoring = true;

It's still in the logs but not that frequently, only 66 queries in the last 24h:

[Screenshot: Screenshot_20210111-060923]

You can enable Rspamd verbose logging as follows:

cat ~/mailcow/data/conf/rspamd/override.d/logging.custom.inc
# @see: https://rspamd.com/doc/configuration/logging.html
# @see: https://forums.zimbra.org/viewtopic.php?t=62443
# @see: https://github.com/mailcow/mailcow-dockerized/issues/3877
debug_modules = ["dns", "rbl", "map"]

It seemed to me that raising the watch interval does not always reduce the connection frequency, or at least not in a way that is observable with the human eye. My guess is that there is a multiplier or something similar that takes the interval as a base value and multiplies it by a fractional value, like 30s * 0.2 = 6s. So when you increase the configured interval, the end result is only a fractional improvement, which is why I needed to increase it so much, from 30s to 15min. That resulted in about 2700 queries per day instead of the original 15000+, as far as I remember. Then an update solved this for me, resulting in the above screenshot: 66 queries in the last day. There is definitely HTTPS activity to invaluement to grab the list for the map; it just no longer floods the DNS and firewall logs.
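As a worked version of that guess, the sketch below computes how many checks per day a single map would trigger if the effective interval were the configured interval times 0.2. The 0.2 multiplier is only the hypothesis from the paragraph above, not a documented rspamd value, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def checks_per_day(interval_s: float, multiplier: float = 0.2) -> int:
    """Checks per day for ONE map under the hypothesised multiplier.

    effective interval = configured interval * multiplier
    """
    effective_interval_s = interval_s * multiplier
    return round(SECONDS_PER_DAY / effective_interval_s)


if __name__ == "__main__":
    for label, interval_s in (("30s", 30), ("15min", 15 * 60)):
        print(f"{label}: {checks_per_day(interval_s)} checks/day per map")
```

Under this assumption the 30s setting gives 14400 checks per day, which is in the same range as the 15000+ reported, while 15min gives 480, well below the roughly 2700 reported. So a single fixed multiplier on one map does not fully account for the observed numbers; several maps or a different scheduling rule may be involved.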

One more thing: I'm also sorry for that issue title if it hurt you, andryyy. I saw this phrase in another ticket in another project, describing a huge difference from the baseline, and it instantly came to mind when I saw invaluement blowing up the logs. I didn't want to hurt your feelings or anything, sorry for that "ridiculous amount"!

andryyy commented 3 years ago

It is TCP data indeed. But it just does a header lookup and checks the last modification time of that map. Keep in mind that a longer interval also means the settings map is refreshed less often. :)
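The "header lookup" pattern can be illustrated with a generic sketch: ask the server only for headers, compare `Last-Modified` with the cached value, and download the body only when it is newer. This shows the general HTTP technique under my own assumptions and is not rspamd's actual implementation; the function names are hypothetical.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def map_changed(remote_last_modified, cached_last_modified):
    """Return True if the remote copy looks newer than the cached one.

    Both arguments are HTTP date strings (or None when unknown).
    """
    if remote_last_modified is None or cached_last_modified is None:
        return True  # nothing to compare against, so fetch to be safe
    remote = parsedate_to_datetime(remote_last_modified)
    cached = parsedate_to_datetime(cached_last_modified)
    return remote > cached


def fetch_last_modified(url, timeout=10.0):
    """Send a HEAD request and return the Last-Modified header, if any.

    Only headers travel over the wire; the map body is not downloaded.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Last-Modified")


if __name__ == "__main__":
    cached = "Sun, 10 Jan 2021 20:00:00 GMT"
    print(map_changed("Mon, 11 Jan 2021 05:00:00 GMT", cached))  # newer
    print(map_changed("Sun, 10 Jan 2021 20:00:00 GMT", cached))  # unchanged
```

Each check therefore still opens a TCP connection to port 443, which matches what shows up on a firewall, but transfers only a few hundred bytes of headers unless the map has actually changed.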

And thank you, Immanuel, for your kind words.


immanuelfodor commented 3 years ago

In the meantime, I read the other conversation in the ticket linked above, and it was also explained there that these are just HTTP header lookups. That definitely explains the HTTPS traffic we all see. It's also great to see the Invaluement CEO involved; even though these are only header lookups, requests from instances all over the world must be hammering them.

I do not want to derail this ticket any further, as it is originally about documentation. From now on I'll also follow the other one, which is about the connections themselves :) Thank you!

seniorm0ment commented 3 years ago

I think a page that clarifies and documents this, along with other privacy and security concerns, suggestions, options and mitigations, would be beneficial.

This page could also include external reading resources.