zokradonh / kopano-docker

Unofficial Kopano Docker images for all Kopano services.
MIT License

Spam learn #481

Closed Schmide-github closed 3 years ago

Schmide-github commented 3 years ago

Hi, is this a bug?

2021-06-11T10:09:27.741: [ERROR ] spamd - Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/kopano/log.py", line 138, in log_exc
    try: yield
  File "/usr/lib/python3/dist-packages/kopano_spamd/__init__.py", line 81, in update
    self.learn(item, searchkey, True)
  File "/usr/lib/python3/dist-packages/kopano_spamd/__init__.py", line 103, in learn
    gid = grp.getgrnam(self.sagroup).gr_gid
KeyError: "getgrnam(): name not found: 'amavis'"

The database is not growing.

Thanks

fbartels commented 3 years ago

@Schmide-github please fill out the issue template

Schmide-github commented 3 years ago

Hello fbartels, do you mean the following?

Describe the bug

If mails are pushed into the Junk folder, they are not learned. The following error occurs in the Docker log:

Notice: Container is run read-only, skipping package installation. If you want to have additional packages installed in the container either:

To Reproduce

When running kopano-spamd manually in the kopano_spamd Docker container, the error occurs:

root@2c2b533dfbab:/kopano/path# kopano-spamd -l DEBUG
2021-06-11T11:02:12.808: [INFO ] spamd - starting spamd
2021-06-11T11:08:15.784: [INFO ] spamd - Learning message as SPAM, entryid: 0000000027237074396243CB89AC4950448269BE0100000005000000770AB7524E6D48538BBEB99D83BC36EF00000000
2021-06-11T11:08:15.795: [ERROR ] spamd - Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/kopano/log.py", line 138, in log_exc
    try: yield
  File "/usr/lib/python3/dist-packages/kopano_spamd/__init__.py", line 81, in update
    self.learn(item, searchkey, True)
  File "/usr/lib/python3/dist-packages/kopano_spamd/__init__.py", line 103, in learn
    gid = grp.getgrnam(self.sagroup).gr_gid
KeyError: "getgrnam(): name not found: 'amavis'"

2021-06-11T11:08:15.826: [INFO ] spamd - Learning message as SPAM, entryid: 0000000027237074396243CB89AC4950448269BE01000000050000009057E2BB77B140098D047500F36927AB00000000
2021-06-11T11:08:15.838: [ERROR ] spamd - Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/kopano/log.py", line 138, in log_exc
    try: yield
  File "/usr/lib/python3/dist-packages/kopano_spamd/__init__.py", line 81, in update
    self.learn(item, searchkey, True)
  File "/usr/lib/python3/dist-packages/kopano_spamd/__init__.py", line 103, in learn
    gid = grp.getgrnam(self.sagroup).gr_gid
KeyError: "getgrnam(): name not found: 'amavis'"

Expected behavior

Learn the spam and delete it if necessary.

reneploetz commented 3 years ago

Please try editing kopano_spamd.env to set the group used by the spamd container, like so:

KCCONF_SPAMD_SA_GROUP=kopano

This should set the group to kopano in /tmp/kopano/spamd.cfg
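The traceback above comes from Python's standard `grp.getgrnam`, which raises `KeyError` when the named group is missing from the group database. As a minimal sketch (the helper name `resolve_gid` is hypothetical and not part of kopano-spamd), something like this can be run inside the container to check which group names actually resolve:

```python
import grp
from typing import Optional


def resolve_gid(group_name: str) -> Optional[int]:
    """Return the numeric gid for group_name, or None if the group does not exist."""
    try:
        return grp.getgrnam(group_name).gr_gid
    except KeyError:
        # Same exception type that appears in the spamd traceback.
        return None


if __name__ == "__main__":
    # "amavis" is the group from the error message, "kopano" the suggested replacement.
    for name in ("amavis", "kopano"):
        gid = resolve_gid(name)
        status = f"gid {gid}" if gid is not None else "not found"
        print(f"{name}: {status}")
```

If `kopano` resolves and `amavis` does not, that would be consistent with the error and with the suggested setting.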

Note that the spamd container only dumps mails to /var/lib/kopano/spamd/spam and /var/lib/kopano/spamd/ham (the latter only when you move mails from junk back to the inbox). You will need to set up an automated task that runs the learning process on a regular basis.

You can do this in any way you want, but my current setup looks like this: I mount the kopanospamd volume into the mail container as there is a spamassassin installation which already has all the preconfigured paths for us:

services:
  mail:
    ...
    volumes:
      - kopanospamd/:/var/lib/kopano/spamd

Additionally, I mounted a cron script named sa-learn to /etc/cron.d/sa-learn to run this on a daily basis (in the morning):

0  4 * * * root  sa-learn --spam /var/lib/kopano/spamd/spam --dbpath /var/mail-state/lib-amavis/.spamassassin
15 4 * * * root  sa-learn --ham /var/lib/kopano/spamd/ham --dbpath /var/mail-state/lib-amavis/.spamassassin

It might be possible to use amavis instead of root for security reasons, though I did not test this.
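For anyone who prefers a single script over two cron lines, here is a rough sketch that builds the same two `sa-learn` invocations in Python. The paths are copied from the cron entries above; the function names are made up for illustration, and the script has not been tested against the mail container.

```python
import subprocess
from typing import List

# Paths copied from the cron lines above; adjust them to your own setup.
DB_PATH = "/var/mail-state/lib-amavis/.spamassassin"
SPAM_DIR = "/var/lib/kopano/spamd/spam"
HAM_DIR = "/var/lib/kopano/spamd/ham"


def sa_learn_command(kind: str, mail_dir: str, db_path: str = DB_PATH) -> List[str]:
    """Build the sa-learn argument list for kind 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError(f"kind must be 'spam' or 'ham', got {kind!r}")
    return ["sa-learn", f"--{kind}", mail_dir, "--dbpath", db_path]


def learn_all() -> None:
    """Run sa-learn for the spam directory, then for the ham directory."""
    for kind, mail_dir in (("spam", SPAM_DIR), ("ham", HAM_DIR)):
        # check=True raises CalledProcessError if sa-learn exits non-zero.
        subprocess.run(sa_learn_command(kind, mail_dir), check=True)
```

Calling `learn_all()` from one daily cron entry would replace both lines. Keeping the command construction in its own function makes it easy to check that `--spam` and `--ham` point at the right directories.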

Note that you also need to learn some of your non-spam mails, otherwise spamassassin will ignore your new database. I extracted some of them into /var/lib/kopano/spamd/ham, and the script picks up any new mails I put in there every once in a while.

@fbartels I can provide a pull request for some or all of the changes. Please advise me how integrated such a solution should be.

fbartels commented 3 years ago

I agree with @reneploetz's assessment: the issue at hand seems to be that the default group (amavis) does not exist in the container, and therefore the code fails. It would probably make sense to default this to kopano through docker-compose.yml.

> I can provide a pull request for some or all of the changes. Please advise me how integrated such a solution should be.

I must say I have not really played with spamd since it was contributed. Thinking about it in general, it is probably best to move the spamd container to the mail-specific yaml file. Or maybe it, along with the changes to auto-learn with the mail container, should get a dedicated yaml file?

What are your thoughts?

reneploetz commented 3 years ago

I thought about it a bit and I think it's best to do this as an extra, as it requires manual work to collect at least 200 ham mails before the bayes classifier will actually be used (you also need 200 spam messages). In its current state and without further configuration, it will essentially just fill up disk space in the kopanospamd volume with mails that are moved to the junk folder.
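To get a rough idea of how close a setup is to that threshold, a sketch like the following counts the files in the two dump directories. It assumes spamd writes one file per message, which I have not verified, and the function names are hypothetical. The counts that SpamAssassin actually uses live in the bayes database (`sa-learn --dump magic` shows them), so treat this only as an estimate.

```python
from pathlib import Path

# 200 is the minimum mentioned above for both ham and spam.
MIN_MESSAGES = 200


def count_messages(directory: str) -> int:
    """Count regular files in directory; a missing directory counts as 0."""
    path = Path(directory)
    if not path.is_dir():
        return 0
    return sum(1 for entry in path.iterdir() if entry.is_file())


def bayes_ready(spam_dir: str, ham_dir: str, minimum: int = MIN_MESSAGES) -> bool:
    """Return True if both directories hold at least `minimum` files."""
    return (
        count_messages(spam_dir) >= minimum
        and count_messages(ham_dir) >= minimum
    )


if __name__ == "__main__":
    spam_dir = "/var/lib/kopano/spamd/spam"
    ham_dir = "/var/lib/kopano/spamd/ham"
    print(f"spam: {count_messages(spam_dir)}, ham: {count_messages(ham_dir)}")
    print("enough mails for bayes:", bayes_ready(spam_dir, ham_dir))
```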

Therefore I would suggest the following:

This way, we do not need to modify the existing mail container image, and users can add this to their COMPOSE_FILE variable in .env after reading the documentation, the same way we do this with ldap-extras right now. Users who want this functionality can set it up properly, and those who do not care or know about it can ignore it altogether.

Example file with limited testing:

version: "3.5"

services:
  kopano_spamd:
    image: ${docker_repo:-zokradonh}/kopano_core:${CORE_VERSION:-latest}
    read_only: true
    restart: unless-stopped
    container_name: ${COMPOSE_PROJECT_NAME}_spamd
    depends_on:
      - kopano_server
    volumes:
      - /etc/machine-id:/etc/machine-id
      - /etc/machine-id:/var/lib/dbus/machine-id
      - kopanosocket/:/run/kopano
      - kopanossl/:/kopano/ssl
      - kopanospamd/:/var/lib/kopano/spamd
    environment:
      - SERVICE_TO_START=spamd
      - TZ=${TZ}
      - KCCONF_SPAMD_SA_GROUP=kopano
    networks:
      - kopano-net
    tmpfs:
      - /tmp

  mail:
    volumes:
      - kopanospamd/:/var/lib/kopano/spamd

  kopano_scheduler:
    environment:
      - CRONDELAYED_LEARN_SPAM=0 4 * * * docker exec kopano_mail sa-learn --spam /var/lib/kopano/spamd/spam --dbpath /var/mail-state/lib-amavis/.spamassassin
      - CRONDELAYED_LEARN_HAM=15 4 * * * docker exec kopano_mail sa-learn --ham /var/lib/kopano/spamd/ham --dbpath /var/mail-state/lib-amavis/.spamassassin

volumes:
  kopanospamd:

fbartels commented 3 years ago

That sounds perfect @reneploetz