roehling / postsrsd

Postfix Sender Rewriting Scheme daemon
324 stars 39 forks source link

Socketmap Permission denied / Postfix chroot problem? #141

Closed onlime closed 1 year ago

onlime commented 1 year ago

I am trying to get PostSRSd 2.0.2 running on a Debian Bullseye host with Postfix 3.5.17, but can't get it working (PostSRSd 1.12 working fine) with the following Postfix setup (see #76 - Only rewrite sender when forwarding and dynamically exclude local domains):

### main.cf
default_transport = smtp:127.0.0.1:10027
recipient_canonical_maps = socketmap:unix:srs:reverse
recipient_canonical_classes= envelope_recipient,header_recipient

### master.cf
cleanup-srs  unix  n    -       -       -       0       cleanup
  -o syslog_name=postfix-srs
  -o sender_canonical_maps=mysql:/etc/postfix/mysql-virtual_local_email.cf,socketmap:unix:srs:forward
  -o sender_canonical_classes=envelope_sender

127.0.0.1:10027  inet  n  -     -       -       -       smtpd
  -o syslog_name=postfix-srs
  -o smtpd_milters=
  -o cleanup_service_name=cleanup-srs
  -o smtpd_tls_security_level=none
  -o content_filter=smtp:
  -o smtpd_recipient_restrictions=permit_mynetworks,reject

### mysql-virtual_local_email.cf
hosts = 127.0.0.1
user = mailmap
password = ***********
dbname = mailsync
query = SELECT '%s' FROM `maildomains` WHERE `domain` = '%d' AND `active` = '1'

Installed PostSRSd 2.0.2 (over previous 1.12) from sources like this:

$ git clone https://github.com/roehling/postsrsd.git
$ cd postsrsd
$ git checkout 2.0.2
$ mkdir _build && cd _build
$ cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local
$ make -j
$ make install

$ cp postsrsd.conf /usr/local/etc/postsrsd.conf
# + some minor tuning (srs-domain)

$ systemctl stop postsrsd
$ make install
$ systemctl daemon-reload
$ systemctl start postsrsd

Previous PostSRSd 1.12 config vars (merged from /usr/local/share/postsrsd/postsrsd-systemd-launcher and /etc/default/postsrsd):

SRS_FORWARD_PORT=10001
SRS_REVERSE_PORT=10002
SRS_DOMAIN="`postconf -h mydomain || true`"
SRS_SECRET="/etc/postsrsd.secret"
SRS_SEPARATOR="="
SRS_HASHLENGTH=4
SRS_HASHMIN=4
SRS_EXTRA_OPTIONS=
RUN_AS=nobody
SRS_LISTEN_ADDR=127.0.0.1
CHROOT="/usr/local/lib/postsrsd"
SRS_EXCLUDE_DOMAINS=

New PostSRSd 2.0.2 /usr/local/etc/postsrsd.conf (copied from _build/postsrsd.conf):

domains = {}
#domains-file =
srs-domain = example.com
socketmap = unix:/var/spool/postfix/srs
keep-alive = 30
#milter =
original-envelope = embedded
#envelope-database = "sqlite:./senders.db"
secrets-file = "/usr/local/etc/postsrsd.secret"
separator = "="
hash-length = 4
hash-minimum = 4
always-rewrite = off
unprivileged-user = "nobody"
chroot-dir = "/usr/local/var/lib/postsrsd"
syslog = off

I am getting a cannot chdir to home directory of user nobody warning:

$ systemctl status postsrsd
● postsrsd.service - Sender Rewriting Scheme daemon for Postfix
     Loaded: loaded (/etc/systemd/system/postsrsd.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2023-02-26 16:34:16 CET; 3s ago
   Main PID: 2098669 (postsrsd)
      Tasks: 1 (limit: 618968)
     Memory: 204.0K
        CPU: 2ms
     CGroup: /system.slice/postsrsd.service
             └─2098669 /usr/local/sbin/postsrsd -C /usr/local/etc/postsrsd.conf

Feb 26 16:34:16 mail systemd[1]: Started Sender Rewriting Scheme daemon for Postfix.
Feb 26 16:34:16 mail postsrsd[2098669]: postsrsd: warn: cannot chdir to home directory of user nobody: No such file or directory

and in mail.log, I am getting this:

$ tail -f /var/log/mail.log | grep srs
Feb 26 16:35:24 mail postfix/smtpd[2099358]: warning: connect to srs: Permission denied
Feb 26 16:35:24 mail postfix/smtpd[2099358]: warning: table socketmap:unix:srs:reverse lookup error: Permission denied
Feb 26 16:35:24 mail postfix/smtpd[2099358]: warning: socketmap:unix:srs:reverse lookup error for "foo@example.org"

So this looks like some chroot issue. Both /usr/local/lib/postsrsd (from previous PostSRSd 1.12 install) and new /usr/local/var/lib/postsrsd directories exist. I would like to stick with the new recommended pathes, so I kept the ones from the compiled postsrsd.conf.

Can you please give me some advice what is configured wrong here? I know, this doesn't seem to be an issue with PostSRSd, rather a misunderstanding from my side. But it might help others to get it running on a Debian with chrooted Postfix.

roehling commented 1 year ago

The warning about the home directory of nobody is harmless and can be ignored. Can you check the permissions assigned to the socket file /var/spool/postfix/srs? It is supposed to be 0666, i.e. read-write permissions for everyone.

onlime commented 1 year ago

Can you check the permissions assigned to the socket file /var/spool/postfix/srs? It is supposed to be 0666, i.e. read-write permissions for everyone.

Thanks for looking into this. Looks like it has wrong permissions 0755:

srwxr-xr-x+ 1 root root 0 Sun 2023-02-26 16:34:16 /var/spool/postfix/srs

Should I just manually change the permissions or will this file get recreated on every Systemd service restart? Any idea why this happened in the build/install process? maybe because of my umask under root, which should be the default on a Debian system:

$ umask
0022
onlime commented 1 year ago

I confirm this working with 0666 permissions. But after every Systemd postsrsd.service restart, I need to manually fix permissions again:

$ chmod a+w /var/spool/postfix/srs

As a workaround I tried to put this in a Systemd override, /etc/systemd/system/postsrsd.service.d/override.conf:

[Service]
ExecStartPost=/bin/chmod a+w /var/spool/postfix/srs

but that won't work - that post command seems to get executed but probably to early, so perms are still wrong (0755)

@roehling Can you please fix this upstream by explicitly assigning correct permissions to that socket?

onlime commented 1 year ago

I had to give up on PostSRSd 2.0.2 again, as again it caused major problems on my production host.

Even with chmod 0666 /var/spool/postfix/srs, after some hours of running I got those errors again:

Feb 28 05:09:32 mail postfix/smtpd[3310311]: warning: connect to srs: Resource temporarily unavailable
Feb 28 05:09:32 mail postfix/smtpd[3310311]: warning: table socketmap:unix:srs:reverse lookup error: Resource temporarily unavailable
Feb 28 05:09:32 mail postfix/smtpd[3310311]: warning: socketmap:unix:srs:reverse lookup error for "foo@example.com"
Feb 28 05:09:32 mail postfix/smtpd[3310311]: NOQUEUE: reject: RCPT from mx1.example.com[46.231.xxx.xxx]: 451 4.3.0 <foo@example.com>: Temporary lookup failure; from=<bar@example.com> to=<foo@example.com> proto=ESMTP helo=<mx1.example.com>

only a systemctl restart postsrsd and resetting the permissions /var/spool/postfix/srs again helped, but unsure for how long, so I had to revert back to PostSRSd 1.12

roehling commented 1 year ago

I'm not sure what is going on with your system. The socket file is newly created on startup, but PostSRSd sets the umask first, so the permissions should include write permissions for everyone.

One thing I did notice is

srwxr-xr-x+ 1 root root 0 Sun 2023-02-26 16:34:16 /var/spool/postfix/srs

The + sign in the output indicates that you have an ACL applied, which begs the question whether there is also an ACL on the folder /var/spool/postfix itself that affects the permissions of the socket.

If you cannot figure what messes with your permissions, a possible workaround would be using TCP sockets instead of Unix sockets, i.e. set socketmap = inet:localhost:10023 in postsrsd.conf and configure Postfix with socketmap:localhost:10023:forward and socketmap:localhost:10023:reverse.

onlime commented 1 year ago

The + sign in the output indicates that you have an ACL applied, which begs the question whether there is also an ACL on the folder /var/spool/postfix itself that affects the permissions of the socket.

Thanks for pointing me into the right direction! For years now I had those ACLs set for Telegraf monitoring Postfix queue directories, a legacy solution which was replaced by Zabbix/Grafana monitoring long time ago:

$ setfacl -m g:telegraf:rX /var/spool/postfix
$ setfacl -Rm g:telegraf:rX /var/spool/postfix/{active,hold,incoming,deferred,maildrop}
$ setfacl -Rdm g:telegraf:rX /var/spool/postfix/{,active,hold,incoming,deferred,maildrop}

I thought this was just about giving extra permissions for group telegraf and having applied this as default ACL to the whole /var/spool/postfix. But you might be right and this was the cause for my problems. I have now removed the ACLs recursively like this:

$ setfacl -Rbk /var/spool/postfix

So far, PostSRSd 2.0.2 runs fine and on a restart generates the socket with 0777 perms:

srwxrwxrwx 1 root root 0 Tue 2023-02-28 14:29:31 /var/spool/postfix/srs

will keep you posted if this keeps running stable. Because last night, it suddenly stopped working after 00:58AM, even though permissions were still correct.

onlime commented 1 year ago

Still having the same problem, after 1-2hrs, PostSRSd 2.0.2 stops working and I get warning: connect to srs: Resource temporarily unavailable errors, even though /var/spool/postfix/srs permissions stayed untouched.

I have now migrated to socketmap = inet:localhost:10003 (postsrsd.conf) and socketmap:inet:localhost:10003:forward/ socketmap:inet:localhost:10003:reverse (Postfix). Hopefully, this is going to work.

onlime commented 1 year ago

Bad luck. Even with TCP socketmaps I am experiencing the same issue, PostSRSd service only works for ~80mins and then reports time limit exceeded:

Feb 28 17:52:07 mail postfix-srs/cleanup[3789293]: warning: table socketmap:inet:localhost:10003:forward lookup error: time limit exceeded
Feb 28 17:52:07 mail postfix-srs/cleanup[3789293]: warning: socketmap:inet:localhost:10003:forward lookup error for """"
Feb 28 17:52:07 mail postfix-srs/cleanup[3789293]: warning: B313866FFB2: sender_canonical_maps map lookup problem for "" -- message not accepted, try again later

Feb 28 17:52:12 mail postfix-srs/smtpd[3793281]: warning: table socketmap:inet:localhost:10003:reverse lookup error: time limit exceeded
Feb 28 17:52:12 mail postfix-srs/smtpd[3793281]: warning: socketmap:inet:localhost:10003:reverse lookup error for "foo.bar@hotmail.com"
Feb 28 17:52:12 mail postfix-srs/smtpd[3793281]: NOQUEUE: reject: RCPT from localhost.localdomain[127.0.0.1]: 451 4.3.0 <foo.bar@hotmail.com>: Temporary lookup failure; from=<> to=<foo.bar@hotmail.com> proto=ESMTP helo=<mail.example.com>
Feb 28 17:52:12 mail postfix-srs/smtpd[3793281]: disconnect from localhost.localdomain[127.0.0.1] ehlo=1 mail=1 rcpt=0/1 data=0/1 rset=1 quit=1 commands=4/6

Restarting the service works, but probably just for another 80mins. So I have again reverted to PostSRSd 1.12

roehling commented 1 year ago

Do you have any logging output from PostSRSd that might help tracking down the issue?

onlime commented 1 year ago

Do you have any logging output from PostSRSd that might help tracking down the issue?

Currently not, as I only noticed the last successful not reverse-rewritten: Not an SRS address. message in syslog and after lookup error: time limit exceeded was popping up, nothing else was written to syslog from postsrsd. Can you give me a hint how to raise log verbosity? Simply enabling syslog = on in postsrsd.conf?

It looks like this is rather a Postfix issue with some kind of timeout / Postfix not closing socketmap connections and running out of some limit?? But the ~80mins seem quite weird - it was a similar timespan after it stopped working also when using /var/spool/postfix/srs socket instead of TCP connections.

Or was it maybe that weird lookup of "" (instead of a real email) that caused PostSRSd to crash somehow? That was the first lookup that caused problems and after that it stopped working:

Feb 28 17:52:07 mail postfix-srs/cleanup[3789293]: warning: table socketmap:inet:localhost:10003:forward lookup error: time limit exceeded
Feb 28 17:52:07 mail postfix-srs/cleanup[3789293]: warning: socketmap:inet:localhost:10003:forward lookup error for """"
Feb 28 17:52:07 mail postfix-srs/cleanup[3789293]: warning: B313866FFB2: sender_canonical_maps map lookup problem for "" -- message not accepted, try again later
roehling commented 1 year ago

I'll look into it and see if I can reproduce the problem. Thank you for now.

onlime commented 1 year ago

@roehling wonderful! Thanks for the 2.0.3 release. Have it already running in production and will give you a feedback soon.

onlime commented 1 year ago

I confirm this working. At least for 3h10min now, without any issues. Now running it with default socketmap = unix:/var/spool/postfix/srs @roehling Thanks a lot for this great product. Pls tell me where I can drop a small donation / Github sponsors?