mail-in-a-box / mailinabox

Mail-in-a-Box helps individuals take back control of their email by defining a one-click, easy-to-deploy SMTP+everything else server: a mail server in a box.
https://mailinabox.email/
Creative Commons Zero v1.0 Universal

FR: train spam filter #1933

Closed steadfasterX closed 3 years ago

steadfasterX commented 3 years ago

MIAB does a great job of keeping a lot of spam out in the first place, but there is always a chance that something gets moved into the spam folder which shouldn't be there.

I do not want to identify the reasons WHY that happens (we all know that the reason is on the sender's side). It happens, and I want to train the spam filter so that it does not happen again for that type of mail.

I stumbled over these on my search:

So it seems there is no one-click way of achieving this. While searching further I found:

which has several options for achieving this kind of training. Unfortunately I do not have the time to go deeper into each option. ISBG is maybe an option, but I am not sure I understand how it works. Is that something which could maybe be integrated into MIAB itself? It sounds like an integration of the procmail approach might work as well.

Another option could be adding a UI option in MIAB which manages (add/edit/delete) this whitelist change: https://github.com/mail-in-a-box/mailinabox/issues/1192

I am sorry that I cannot help more on that topic, but I would greatly appreciate it if someone else could. I think training the spam folder is something everyone expects to just work in MIAB (at least I do).

One last thing: MIAB is really saving my days/weeks/months. It is a great piece of OSS and I really appreciate all the effort put in so far!

myfirstnameispaul commented 3 years ago

This is discussed in many places on a somewhat regular basis, and I tend to agree that MiaB doesn't seem to learn as well as one might expect.

However, the consensus tends to be the same as you express here that nobody has the time to dig far enough to discover why this is the case and how it might be improved.

Until someone gets their arms completely around the issue, I don't think it will be resolved.

FWIW, if you add a uniquely named file to /etc/spamassassin/, spamassassin will process it automatically, but no update will touch it, at least in my experience.
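
As a rough sketch of what such a manually added file could look like (this is not something MiaB itself manages): SpamAssassin reads `*.cf` files from `/etc/spamassassin/`, and `whitelist_from` is its directive for trusting a sender address. The helper name `write_whitelist`, the file name `zz-local-whitelist.cf`, and the example addresses are all hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: (re)write a local SpamAssassin whitelist file.
# Usage: write_whitelist <conf_dir> <address-or-glob>...
# The file name below is an assumption; any unique name ending in .cf
# inside SpamAssassin's site config directory should be picked up.
write_whitelist() {
    local conf_dir="$1"
    shift
    local target="$conf_dir/zz-local-whitelist.cf"
    local addr

    : > "$target"                      # start from an empty file
    for addr in "$@"; do
        # One whitelist_from line per trusted sender address or glob.
        printf 'whitelist_from %s\n' "$addr" >> "$target"
    done
}

# Example (not executed here):
#   write_whitelist /etc/spamassassin 'backup@example.com' '*@example.org'
```

The filtering daemon most likely has to be restarted before it sees the new file. Note that `whitelist_from` trusts the From address as-is, so it can be spoofed; `whitelist_auth` is the stricter variant that also requires SPF or DKIM to pass.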

jvolkenant commented 3 years ago

Spam should be learned when it gets moved in and out of the spam folder. Obviously, a lot of spam emails are needed to help train spamassassin.

https://github.com/mail-in-a-box/mailinabox/blob/c7280055a83085b3d3efd5a9296a1bea4923315c/setup/spamassassin.sh#L153-L165

I ran this script when I moved from Gmail to MIAB, and I only run it from time to time when I want to force a spam learn. Really, though, I've found it unnecessary to run it to get any changed result. I do it just to have that feel-good feeling inside. A cron job could be set up if you really wanted to.

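root@m:~# cat force-spam-learn.sh
#!/bin/bash
# Feed every message in each user's Spam folder to the Bayes filter as spam.
echo "Learning Spam from Spam folders"
sa-learn --spam --progress /home/user-data/mail/mailboxes/*/*/.Spam/cur/

# Optional: learn the inboxes as ham. Left disabled; only enable it if the inboxes are free of spam.
#echo "Learning Ham from Inbox folder"
#sa-learn --ham --progress /home/user-data/mail/mailboxes/*/*/cur/

# Print the Bayes database counters (nspam, nham, ntokens, ...).
sa-learn --dump magic --dbpath /home/user-data/mail/spamassassin/
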
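
If a cron job is wanted, a minimal sketch could look like the following. It only prints an `/etc/cron.d`-style line; the weekly schedule, the function name `cron_entry`, and the script location `/root/force-spam-learn.sh` are assumptions, not anything MIAB ships.

```shell
#!/bin/bash
# Hypothetical helper: print an /etc/cron.d entry that runs the given
# script as root every Sunday at 03:30 and discards its output.
# Fields: minute hour day-of-month month day-of-week user command
cron_entry() {
    local script="$1"
    printf '30 3 * * 0 root %s >/dev/null 2>&1\n' "$script"
}

# Example (not executed here); the file name must not contain a dot,
# because Debian/Ubuntu cron skips such files in /etc/cron.d:
#   cron_entry /root/force-spam-learn.sh > /etc/cron.d/force-spam-learn
```

The script itself would need to be executable (`chmod +x`) for the entry to work.
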
steadfasterX commented 3 years ago

So you are saying it should already just work without changing anything?

For example, I get 2 messages per day (from a backup job) which are identical in their headers and most of their content, but even though I have been moving them out of the junk folder for 2 weeks, they are still detected as spam. I added a whitelist because my feeling was that this would never get trained. A wrong feeling, maybe?

jvolkenant commented 3 years ago

Yes, the /etc/dovecot/conf.d/99-local-spampd.conf config should be learning spam/ham as mail goes in and out of the spam folder. SpamAssassin is not a silver bullet; legit mail can still be detected as spam for a lot of reasons. It is best to check the headers of those 2 mails to see why they are being classified as spam. Any message with a total score over 5 will get put into the spam folder.

In Roundcube, select an email, click "..." at the top and then show source. There should be a few X-Spam-* entries with some values. You can post those here and we can take a look.
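
The same headers can also be pulled out on the server from the raw message file. This is only a sketch: the function names `spam_headers` and `spam_score` are made up, and it assumes the usual `X-Spam-Status: Yes, score=6.3 required=5.0 tests=...` layout that SpamAssassin writes.

```shell
#!/bin/bash
# Print all X-Spam-* headers of a raw message file, including folded
# continuation lines (lines that start with a space or tab).
spam_headers() {
    awk '
        /^$/ { exit }                        # blank line ends the header section
        /^X-Spam-/ { keep = 1; print; next }
        /^[ \t]/ && keep { print; next }     # continuation of an X-Spam-* header
        { keep = 0 }                         # any other header
    ' "$1"
}

# Print only the numeric score from the X-Spam-Status header.
spam_score() {
    spam_headers "$1" |
        sed -n 's/^X-Spam-Status:.* score=\(-\{0,1\}[0-9.]*\).*/\1/p' |
        head -n 1
}

# Example (not executed here; domain, user and file name are placeholders):
#   spam_headers /home/user-data/mail/mailboxes/example.com/me/.Spam/cur/<file>
```

The `tests=` list in `X-Spam-Status` names the rules that fired, which is usually the quickest way to see why a message went over the threshold.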

I also have some internal mail that I send from home servers that gets classified as spam. I handle it in a similar way: I created sieve rules to file those messages where I want them, so it's not unheard of.
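
To check whether moving mail in and out of the spam folder is actually feeding the Bayes database, the counters printed by `sa-learn --dump magic` can be read. A small sketch follows; the function names are hypothetical. By default SpamAssassin only starts using Bayes once it has learned at least 200 spam and 200 ham messages.

```shell
#!/bin/bash
# Read `sa-learn --dump magic` output on stdin and print "<nspam> <nham>".
# The counter value is the third column of the matching lines.
bayes_counts() {
    awk '
        /non-token data: nspam$/ { nspam = $3 }
        /non-token data: nham$/  { nham  = $3 }
        END { print nspam + 0, nham + 0 }
    '
}

# Read "<nspam> <nham>" on stdin; succeed when both reach the minimum
# (first argument, default 200 to match SpamAssassin's defaults).
bayes_ready() {
    local nspam nham min="${1:-200}"
    read -r nspam nham
    [ "$nspam" -ge "$min" ] && [ "$nham" -ge "$min" ]
}

# Example (not executed here):
#   sa-learn --dump magic --dbpath /home/user-data/mail/spamassassin/ \
#       | bayes_counts | bayes_ready && echo "Bayes has enough training data"
```

If the numbers grow after moving a message in or out of the Spam folder, the automatic learning is working.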

myfirstnameispaul commented 3 years ago

I don't know if this is a good idea or not, and I've no idea how much development work would be required in comparison to solving the problem properly, but what about just adding an option to a dashboard page to add either an email address or a domain name to an /etc/spamassassin/ whitelist file?

I do think that the "manually add a conf file to your server" solution is outside the bounds of the intent of MiaB as a project, and it is a problem I think most of us noticed after using MiaB for less than a week, at least in comparison to popular freemail services, so it's likely something most users are experiencing.

jvolkenant commented 3 years ago

> I don't know if this is a good idea or not, and I've no idea how much development work would be required in comparison to solving the problem properly, but what about just adding an option to a dashboard page to add either an email address or a domain name to an /etc/spamassassin/ whitelist file?

There is a reason why mail goes into the spam bin. Fixing those reasons is the correct approach, not one-off whitelists on everyone's installs. For example, if someone doesn't publish SPF records and their mail lands in my spam folder, yes, I could whitelist it, but the sender should really fix their SPF.

> I do think that the "manually add a conf file to your server" solution is outside the bounds of the intent of MiaB as a project, and it is a problem I think most of us noticed after using MiaB for less than a week, at least in comparison to popular freemail services, so it's likely something most users are experiencing.

You are correct that users shouldn't manually add conf files to their servers. It is not supported and could result in more problems than it fixes. I suggested that the headers be posted so we could see what spamassassin is reporting for those 2 mails. But I don't think I would agree that it's likely that most users are experiencing this sort of problem.

steadfasterX commented 3 years ago

> I don't know if this is a good idea or not, and I've no idea how much development work would be required in comparison to solving the problem properly, but what about just adding an option to a dashboard page to add either an email address or a domain name to an /etc/spamassassin/ whitelist file?
>
> I do think that the "manually add a conf file to your server" solution is outside the bounds of the intent of MiaB as a project, and it is a problem I think most of us noticed after using MiaB for less than a week, at least in comparison to popular freemail services, so it's likely something most users are experiencing.

Yes, I had that idea as well, see:

[image]

steadfasterX commented 3 years ago

> There is a reason why mail goes into the spam bin. Fixing those reasons is the correct approach, not one-off whitelists on everyone's installs. For example, if someone doesn't publish SPF records and their mail lands in my spam folder, yes, I could whitelist it, but the sender should really fix their SPF.

Well, yes, in an ideal world it would work like that: I tell the sender to fix their issues. But in reality it does not work like that. For internal mail and anything I manage myself I can fix it, but not for others who send out newsletters, blog updates, etc. In my experience they usually do not care. There is a reason why every spam system out there allows whitelisting and training. I don't think the people using MIAB represent the average user, and that's why they always find another solution for these kinds of issues (usually whitelisting, from what I read everywhere). IMHO that is the reason why spam training and false positives are not a big issue here on the tracker.

> You are correct that users shouldn't manually add conf files to their servers. It is not supported and could result in more problems than it fixes. I suggested that the headers be posted so we could see what spamassassin is reporting for those 2 mails. But I don't think I would agree that it's likely that most users are experiencing this sort of problem.

IMHO whitelisting a specific mail address wouldn't cause more problems. What kind of problems? If the whitelist were managed in the UI, it would solve the cases where training is not sufficient.

> Yes, the /etc/dovecot/conf.d/99-local-spampd.conf config should be learning spam/ham as mail goes in and out of the spam folder. SpamAssassin is not a silver bullet; legit mail can still be detected as spam for a lot of reasons. It is best to check the headers of those 2 mails to see why they are being classified as spam. Any message with a total score over 5 will get put into the spam folder.

OK then! I really thought there was nothing within MIAB that does spam training. Many thanks for your help on this topic. I will close this issue now, as my initial assumption was wrong.