Closed dhpiggott closed 10 years ago
I change the owner of the SA database directory to spampd:
david@elm:~⟫ sudo chown -R spampd:spampd /home/user-data/mail/spamassassin
david@elm:~⟫ ls -la !$
ls -la /home/user-data/mail/spamassassin
total 6116
drwxrwxr-x 2 spampd spampd 4096 Oct 11 16:44 .
drwxrwxr-x 8 root www-data 4096 Oct 4 16:52 ..
-rwxrwxr-x 1 spampd spampd 2670592 Oct 11 16:44 bayes_seen
-rwxrwxr-x 1 spampd spampd 5025792 Oct 11 16:44 bayes_toks
I observe the SpamAssassin global statistics again and find that nham has incremented:
david@elm:~⟫ sa-learn --dump magic --dbpath /home/user-data/mail/spamassassin/
0.000 0 3 0 non-token data: bayes db version
0.000 0 805 0 non-token data: nspam
0.000 0 28977 0 non-token data: nham
0.000 0 125604 0 non-token data: ntokens
0.000 0 1390477427 0 non-token data: oldest atime
0.000 0 1413046040 0 non-token data: newest atime
0.000 0 1413043327 0 non-token data: last journal sync atime
0.000 0 1412674765 0 non-token data: last expiry atime
0.000 0 22118400 0 non-token data: last expire atime delta
0.000 0 1140419 0 non-token data: last expire reduction count
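For comparing counters before and after a test message, a small filter over the dump output may be handy. This is only a sketch: the function name `bayes_counts` is made up for illustration, and it assumes the dump keeps the column layout shown above (counter in the third field, label in the last).

```shell
#!/bin/sh
# Sketch: print the nspam and nham counters from `sa-learn --dump magic` output.
# Reads the dump on stdin. Assumes the layout shown in this thread: the counter
# is the third whitespace-separated field and the label is the last field.
bayes_counts() {
    awk '$NF == "nspam" || $NF == "nham" { print $NF "=" $3 }'
}

# Usage (path taken from the thread):
#   sa-learn --dump magic --dbpath /home/user-data/mail/spamassassin/ | bayes_counts
```

Running it once before and once after delivering a message should make it easy to see whether nham or nspam moved.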
I observe the SA database directory and note that a journal file now exists:
david@elm:~⟫ !-2
ls -la /home/user-data/mail/spamassassin
total 6120
drwxrwxr-x 2 spampd spampd 4096 Oct 11 16:47 .
drwxrwxr-x 8 root www-data 4096 Oct 4 16:52 ..
-rw------- 1 spampd spampd 4608 Oct 11 16:47 bayes_journal
-rwxrwxr-x 1 spampd spampd 2670592 Oct 11 16:47 bayes_seen
-rwxrwxr-x 1 spampd spampd 5025792 Oct 11 16:47 bayes_toks
I look at the logs and see success there too:
Oct 11 16:47:20 elm postfix/smtpd[11008]: disconnect from mail-lb0-x236.google.com[2a00:1450:4010:c04::236]
Oct 11 16:47:20 elm spampd[9368]: processing message <CAKHNFFfX-daQBX212YXO7aijyOTS9ziNQg9m+Ve-QLE07sec1w@mail.gmail.com> for <david@piggott.me.uk>
Oct 11 16:47:20 elm spampd[9368]: clean message <CAKHNFFfX-daQBX212YXO7aijyOTS9ziNQg9m+Ve-QLE07sec1w@mail.gmail.com> (-1.49/5.00) from <dhpiggott@gmail.com> for <david@piggott.me.uk> in 0.04s, 2043 bytes.
So at least on the surface it would seem that setup/spamassassin.sh should be changed to make spampd the owner, not mail.
Nack, I was a little too hasty there, sorry! I just found that though the change fixes live training for incoming mail, it breaks it for sieve training when manually moving a received email between IMAP folders. There's no error, but the SA stats don't change; both mail and spampd need write permissions.
Can you reopen this ticket or should I open another?
Thanks.
As I've defined an alias for root@primary-hostname that forwards to my real account, I get emailed the output of cron jobs. I also note the following warning for the daily SpamAssassin update:
...
/etc/cron.daily/spamassassin:
Oct 11 06:55:30.890 [19066] warn: bayes: cannot write to /home/user-data/mail/spamassassin/bayes_journal, bayes db update ignored: Permission denied
bayes: cannot write to /home/user-data/mail/spamassassin/bayes_journal, bayes db update ignored: Permission denied
Inspection of /etc/cron.daily/spamassassin shows that the failing command is (I think) su - debian-spamd -c "sa-update --gpghomedir /var/lib/spamassassin/sa-update-keys", i.e. it's running as user debian-spamd (even if it's not that command failing, many/most are run as debian-spamd).
So I think the fix for this will be to make the spamassassin directory owned and writeable by a group which has debian-spamd, mail and spampd as members. I'm just not sure which group it should be - I know any will work, I'm just uncertain about any security implications.
Do you have any thoughts on whether it should be owned by mail/spampd/something else (and therefore which group should have all those users as members)? I'm leaning toward spampd.
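As a rough illustration of the shared-group idea, here is a minimal sketch. The group name `spamdb` and the function name `share_dir_with_group` are hypothetical, and the root-only steps are left as comments because they depend on which group is eventually chosen.

```shell
#!/bin/sh
# Sketch: let several users write to one directory through a shared group.
# One-time steps, as root (the group name "spamdb" is hypothetical):
#   groupadd spamdb
#   usermod -a -G spamdb mail          # -a appends; plain -G replaces the list
#   usermod -a -G spamdb spampd
#   usermod -a -G spamdb debian-spamd
#   chgrp -R spamdb /home/user-data/mail/spamassassin
share_dir_with_group() {
    dir=$1
    # 2775: rwx for owner and group, r-x for others. The leading 2 is the
    # setgid bit, so files created inside inherit the directory's group.
    chmod 2775 "$dir"
}

# Usage:
#   share_dir_with_group /home/user-data/mail/spamassassin
```

Note that the setgid bit only passes on the group, not the mode: whether a newly created file is group-writable still depends on the process that creates it.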
Yikes so many groups!
I wonder also how the permissions will get set when the files are first created.
I can't think of a reason to set up the group one way or another.
I just tried to check how the files first get created by running a Vagrant deploy and using test_mail.py against it. It seems they don't. It may be that the only reason they exist on my actual deployment is because I manually ran sa-learn to train against my imported Maildir.
vagrant@mailinabox:/home/user-data/mail/spamassassin$ ls -la
total 8
drwxrwxr-x 2 spampd spampd 4096 Oct 11 18:14 .
drwxrwxr-x 8 root www-data 4096 Oct 11 18:19 ..
Despite no logged errors:
Oct 11 18:28:03 mailinabox postfix/smtps/smtpd[4331]: disconnect from unknown[192.168.50.1]
Oct 11 18:28:03 mailinabox dovecot: lmtp(4335): Connect from 127.0.0.1
Oct 11 18:28:03 mailinabox spampd[21464]: processing message (unknown) for <me@95aad.justtesting.email>
Oct 11 18:28:03 mailinabox spampd[21464]: clean message (unknown) (2.30/5.00) from <me@95aad.justtesting.email> for <me@95aad.justtesting.email> in 0.04s, 905 bytes.
I went ahead and manually trained against the empty Spam maildirs to confirm that running sa-learn does create the files:
vagrant@mailinabox:/home/user-data/mail/spamassassin$ sudo sa-learn --spam /home/user-data/mail/mailboxes/*/*/.Spam/{cur,new}/
Learned tokens from 0 message(s) (0 message(s) examined)
vagrant@mailinabox:/home/user-data/mail/spamassassin$ ls -la
total 24
drwxrwxr-x 2 spampd spampd 4096 Oct 11 18:33 .
drwxrwxr-x 8 root www-data 4096 Oct 11 18:19 ..
-rw------- 1 root root 12288 Oct 11 18:32 bayes_seen
-rw------- 1 root root 12288 Oct 11 18:33 bayes_toks
I think the fix will now be to do something like:
setup/spamassassin.sh
.setup/spamassassin.sh
, restart spampd.
But before I make these changes I think I should read up about SpamAssassin a bit more and look at examples of other configurations to check this is the best way (before I switched to mailinabox I was using DSPAM but I never really understood it anyway).
Until then, I don't see any need to revert the change you've already merged though as I don't think sieve-training not working is any worse than receiving-training not working.
Let's try to keep this simple. 1-4 should be enough. In place of 4, sa-learn.sh could be modified to explicitly set better permissions on the generated files. [edit: not recommending this specifically, just mentioning it]
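For what "explicitly set better permissions on the generated files" could look like, a minimal sketch follows. The function name `fix_bayes_perms` is made up, and this is not taken from the actual sa-learn.sh; it only shows the general shape of such a step.

```shell
#!/bin/sh
# Sketch: after a training run, open up the Bayes files to the owning group.
# In the listing earlier in the thread, files created by sa-learn came out
# as -rw-------, which locks out every other user.
fix_bayes_perms() {
    dir=$1
    for f in "$dir"/bayes_*; do
        [ -f "$f" ] || continue      # skip if the glob matched nothing
        chmod 660 "$f" || return 1   # rw for owner and group, nothing for others
    done
}

# Usage, e.g. at the end of a training script:
#   fix_bayes_perms /home/user-data/mail/spamassassin
```

This only changes the mode. Files created by root would still be owned by root:root, so a chown or chgrp would be needed as well for the group bits to help.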
Thanks.
I'll certainly try to keep it simple.
I resumed looking at this, nerd-sniped myself into looking at a bunch of Postfix/SpamAssassin docs, distilled it down to a few hopefully-relevant tabs, and then ran out of time. I'm just posting this comment as my notes for when I next work on this and/or for anyone else interested.
Useful references:
Notes to self:
conf/sieve-spam.txt has setflag "\\Seen". In my old DSPAM setup I deliberately didn't do this so that I would notice the presence of new spam from the unread count in my IMAP client without having to actually open the folder to check. I seem to recall that Gmail does the same thing (leaves spam as unread), and from a usability point of view I find Gmail to be a good model.
Hey,
I want to try to get this wrapped up so I can push another release, so I dug into it a bit. I couldn't get it to work by either adding spampd to the mail group or vice versa. Adding the group with usermod -G had no effect. Don't know why modifying the spampd user didn't work. Dovecot sort of explicitly doesn't let you do it but has an option mail_access_groups that lets you specify other groups to run as. So I added spampd to that list, and that took care of the spampd process and the sa-learn script.
I haven't been getting errors with debian-spamd (not sure why not) so I didn't try to fix that, since I don't have a way to test if it worked.
832860d79647573f6beeb8871e9d2f21b421dd69 (there's another commit on top of that that reorganizes spamassassin.sh)
Let me know if it works for you?
sorry that's 7ca54a2bfb12179ffbd8d0c00f44efee7d0e5a4e
The changes look good to me - they should definitely be an improvement. I have two minor concerns:
chmod -R runs it may end up being created later by the spampd process (so as the spampd user) during processing of incoming mail, in which case, would the group permissions allow sieve-triggered retraining (as the mail user) to write to it?
Does storage/mail/spamassassin really need to be unreadable by other users? In the absence of any stats in the web UI (and I'm not suggesting there should be any) I'm using sa-learn --dump magic to check things are working as they should, and previously I could run it as my non-privileged user - it's just nice to not have to sudo unnecessarily.
Stupid question re. adding spampd to the mail group: did you restart spampd after doing so?
If we can leave this open I'll hopefully resolve the debian-spamd issue soon enough.
I confirm both incoming learning via the LMTP proxy and sieve relearning are now working for me - sa-learn --dump shows nspam and nham counts change as expected.
The one question I find myself asking now is why any learning needs to be done by the LMTP proxy - why doesn't/can't the sieve script also take care of training on new mail as it's delivered to Dovecot? That'd surely be simpler.
No error output from /etc/cron.daily/spamassassin this morning, though I don't know why!
I wonder if there is going to be a problem with the journal file.
Hmm. I have never actually seen the journal file. But I get your point. It might also be created by the sa-learn-pipe script and owned by mail, locking out spampd.
Does storage/mail/spamassassin really need to be unreadable by other users?
No (I assume all local processes are trusted) but it seemed like a nice thing to do.
did you restart spampd after doing so?
Pretty sure. I know that's necessary for groups. But it's surprising it didn't work so maybe I messed something up.
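One way to double-check this kind of thing is to look at the groups the running process actually holds rather than what the user database says. The sketch below assumes a Linux /proc filesystem; the function name `proc_groups` is made up, and the pgrep lookup in the usage comment is an assumption about how the process is named locally.

```shell
#!/bin/sh
# Sketch: print the supplementary group IDs a running process actually holds,
# read from a Linux /proc/<pid>/status file. A process keeps the groups it
# started with, so this can differ from what `id -G <user>` reports after a
# group change.
proc_groups() {
    status_file=$1
    # Drop the "Groups:" label and print the remaining IDs on one line.
    awk '$1 == "Groups:" { $1 = ""; sub(/^ +/, ""); print }' "$status_file"
}

# Usage (the pgrep lookup is an assumption about the local setup):
#   proc_groups "/proc/$(pgrep -o spampd)/status"
#   id -G spampd
```

If the two lists disagree, the process has probably not been restarted since the group change. It is also worth remembering that usermod -G without -a replaces the user's supplementary groups instead of adding to them.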
why any learning needs to be done by the LMTP proxy
I didn't even realize learning was happening then. If we can turn that off then maybe we can re-do this again with the files owned by mail? (I don't really want to re-do it though.)
On the cron job: I've not seen any further errors so I'm going to let that one go without fully understanding it.
On learning: by adding ADDOPTS="--config=/etc/spampd.conf" to /etc/default/spampd and bayes_auto_learn 0 to /etc/spampd.conf, I have just successfully switched off training within spampd itself, and confirmed that the sieve rule takes care of training when incoming mail is placed in my inbox.
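If this were scripted in a setup step, the two settings could be added idempotently along these lines. This is a sketch only: the helper name `ensure_line` is hypothetical and is not something from the existing setup scripts.

```shell
#!/bin/sh
# Sketch: append a line to a config file only if that exact line is absent,
# so that re-running a setup script does not duplicate it.
ensure_line() {
    file=$1
    line=$2
    touch "$file" || return 1
    # -x matches whole lines, -F treats the text literally, -- guards
    # against lines that begin with a dash.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage with the two settings from the comment above (run as root):
#   ensure_line /etc/default/spampd 'ADDOPTS="--config=/etc/spampd.conf"'
#   ensure_line /etc/spampd.conf 'bayes_auto_learn 0'
```

The helper only checks for an exact match. It does not notice a conflicting value already present in the file, such as a bayes_auto_learn line with a different number, so the resulting file is still worth reading by eye.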
This would simplify the configuration greatly and invalidate those concerns about the journal file. I'm going to have a go at making a change that would redo this - I think it will be worth it, but you can be judge of that if/when I have something to show! It should just amount to reverting four commits and adding one with the two parameters above.
Sounds good.
I take my above comment about learning back. I was mistaken in thinking Dovecot antispam was taking care of learning when I had disabled learning in spampd via spampd.conf (it involves the pretty stupid mistake of me adding my bayes_auto_learn 0 line as bayes_auto_learn 1).
When I then actually disabled learning in spampd I found that incoming mail was not fed to sa-learn by Dovecot antispam, and doing some reading I realised Dovecot antispam is already being used for as much as it can be - it is only meant for retraining, so we do have to have spampd handle learning on incoming mail.
As for the journal file, according to http://commons.oreilly.com/wiki/index.php/SpamAssassin/SpamAssassin_as_a_Learning_System, the bayes_learn_to_journal parameter is disabled by default on SpamAssassin 3.0 (the version provided by Ubuntu 14.04 is 3.4) and I can't see that it has been enabled anywhere, so I don't even know why I was seeing a journal file (looking right now, I don't see one).
In conclusion, I'm happy to close this issue now - the changes you made really do seem to be the best fix for the learning permissions problem - thanks!
Ok thanks again for looking into all this. I guess we got lucky that the fix we ended up with actually was the right approach. :)
I observe the SpamAssassin global statistics:
I observe the SpamAssassin global statistics again and find that nham has not incremented:
I look at the logs and see this:
These are the permissions for the SA database - just as setup/spamassassin.sh sets them: