Closed mundschenk-at closed 3 years ago
@roehling Have you got any idea why postsrsd would fail to decrypt rewritten addresses created by itself just moments before?
I haven't seen this problem before. Which version are you using?
mail/postsrsd from FreeBSD ports, which is still at 1.3 apparently.
@roehling I've experimented a bit more and been able to reproduce the issue with arbitrary test strings. When the domain part of the "address" is 16 characters long, the first call in a row to get the SRS version returns an invalid hash.
telnet 127.0.0.1 10001
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get x@123456789abcdefg
200 SRS0=2i4M=2G=123456789abcdefg=x@my-domain
get x@123456789abcdefg
200 SRS0=SU44=2G=123456789abcdefg=x@my-domain
get x@123456789abcdefg
200 SRS0=SU44=2G=123456789abcdefg=x@my-domain
telnet 127.0.0.1 10002
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get SRS0=2i4M=2G=123456789abcdefg=x@my-domain
500 Hash invalid in SRS address.
get SRS0=SU44=2G=123456789abcdefg=x@my-domain
200 x@123456789abcdefg
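The manual telnet session above can also be scripted. The following is only a sketch: it assumes the lookup service answers one line per `get <key>` request exactly as in the transcript, and the helper names `lookup` and `odd_replies` are made up for illustration, not part of postsrsd.

```python
import socket
from collections import Counter


def lookup(host, port, keys):
    """Send one 'get <key>' request per key over a single TCP connection.

    Assumption: the server replies with exactly one line per request,
    as seen in the telnet transcript.
    """
    replies = []
    with socket.create_connection((host, port), timeout=5) as conn:
        reader = conn.makefile("r", encoding="ascii", newline="\n")
        for key in keys:
            conn.sendall(f"get {key}\n".encode("ascii"))
            replies.append(reader.readline().strip())
    return replies


def odd_replies(replies):
    """Return (index, reply) pairs that differ from the most common reply."""
    if not replies:
        return []
    majority, _ = Counter(replies).most_common(1)[0]
    return [(i, r) for i, r in enumerate(replies) if r != majority]
```

Calling `odd_replies(lookup("127.0.0.1", 10001, ["x@123456789abcdefg"] * 3))` against the forward port would list any reply that deviates from the others, which in the session above is the first one.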
Interesting, I will investigate this. Thanks for your help!
FYI: The FreeBSD port has been updated to 1.4, but the issue is still reproducible (not surprising since a patched-for-FreeBSD 1.3 was not that different from vanilla 1.4 from what I saw in the commit history).
Have you been able to find anything? Is there something I can do to help?
@roehling Any news on this?
I have tried quite a few things, but I've been unable to reproduce this:
$ telnet localhost 10001
Trying 127.0.0.1...
Connected to localhost.localdomain.
Escape character is '^]'.
get x@123456789abcdefg
200 SRS0=JOZt=4A=123456789abcdefg=x@example.com
get x@123456789abcdefg
200 SRS0=JOZt=4A=123456789abcdefg=x@example.com
$ telnet localhost 10002
Trying 127.0.0.1...
Connected to localhost.localdomain.
Escape character is '^]'.
get SRS0=JOZt=4A=123456789abcdefg=x@example.com
200 x@123456789abcdefg
It might be some weird interaction with compiler optimizations.
When we switched to a new server recently (same OS, but newer Intel platform), I enabled CPU-specific optimizations. I noticed that the error still occurred, but not with the same trigger that was reproducible on the old server. I then disabled CPU-specific optimizations and the error can be reproduced again with x@123456789abcdefg.
It's been a while since I've coded C, so I doubt I'll find anything. Are there any constructs that could be susceptible to unwanted "optimization" by clang? (This is FreeBSD 11 with clang 3.8, BTW.)
I have added a test framework to simplify the bug hunt. On my computer, the test succeeds both with GCC and clang 3.8. It would be interesting to see if using -O2 instead of -O3 (or even -O0 for that matter) fixes the problem.
I tried different optimization levels (set in make.conf):
CFLAGS=-O2 (system default): bug occurs
CFLAGS=-O0: no bug
CFLAGS=-O3: no bug
Even more interesting: The generated SRS hashes are identical (including getting different hashes on the first and subsequent requests with the same connection). However, with the explicit CFLAGS, both can be decoded!
System default:
$ telnet localhost 10001
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get x@123456789abcdefg
200 SRS0=6jzY=4A=123456789abcdefg=x@example.org
get x@123456789abcdefg
200 SRS0=fn8W=4A=123456789abcdefg=x@example.org
get x@123456789abcdefg
200 SRS0=fn8W=4A=123456789abcdefg=x@example.org
$ telnet localhost 10002
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get SRS0=6jzY=4A=123456789abcdefg=x@polis.or.at
500 Hash invalid in SRS address.
get SRS0=fn8W=4A=123456789abcdefg=x@polis.or.at
200 x@123456789abcdefg
With CFLAGS=-O3:
$ telnet localhost 10001
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get x@123456789abcdefg
200 SRS0=6jzY=4A=123456789abcdefg=x@example.org
get x@123456789abcdefg
200 SRS0=fn8W=4A=123456789abcdefg=x@example.org
get x@123456789abcdefg
200 SRS0=fn8W=4A=123456789abcdefg=x@example.org
$ telnet localhost 10002
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get SRS0=6jzY=4A=123456789abcdefg=x@example.org
200 x@123456789abcdefg
get SRS0=fn8W=4A=123456789abcdefg=x@example.org
200 x@123456789abcdefg
Update: It's the -O2; the other flags are added by the system anyway and don't change the result.
@roehling So we'd have to look at the decoding function, it would seem? I'll try the new test framework tonight.
@roehling I just ran the test harness with 3 different settings for CFLAGS. The output is identical for all of them. What should the output look like if correct?
The test generates random email addresses with different lengths and tests whether the SRS transformation works, i.e. valid rewritten addresses are transformed back and modified ones are rejected.
make test should output whether the tests passed or failed.
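The kind of round-trip check described here can be sketched as follows. This is not the actual postsrsd test, and `toy_forward` / `toy_reverse` are hypothetical stand-ins that only mimic the shape of an SRS0 address with a truncated HMAC-SHA1; the secret, the forward domain and the fixed timestamp are placeholders.

```python
import base64
import hashlib
import hmac
import random
import string

SECRET = b"not-a-real-secret"   # placeholder secret, for illustration only
FORWARD_DOMAIN = "example.com"  # placeholder forward domain


def toy_hash(secret, *parts):
    """First four base64 characters of an HMAC-SHA1 over the joined parts."""
    mac = hmac.new(secret, "".join(parts).encode(), hashlib.sha1).digest()
    return base64.b64encode(mac).decode()[:4]


def toy_forward(addr, secret=SECRET):
    user, host = addr.rsplit("@", 1)
    stamp = "4A"  # fixed placeholder instead of a real timestamp
    digest = toy_hash(secret, stamp, host, user)
    return f"SRS0={digest}={stamp}={host}={user}@{FORWARD_DOMAIN}"


def toy_reverse(srs, secret=SECRET):
    local, _ = srs.rsplit("@", 1)
    tag, digest, stamp, host, user = local.split("=", 4)
    expected = toy_hash(secret, stamp, host, user)
    if tag != "SRS0" or not hmac.compare_digest(digest, expected):
        raise ValueError("Hash invalid in SRS address.")
    return f"{user}@{host}"


def random_address(rng, max_len=40):
    alphabet = string.ascii_lowercase + string.digits
    user = "".join(rng.choices(alphabet, k=rng.randint(1, max_len)))
    host = "".join(rng.choices(alphabet, k=rng.randint(1, max_len)))
    return f"{user}@{host}"


def check_round_trips(count=200, seed=0):
    """Return every address that fails to round-trip or survives tampering."""
    rng = random.Random(seed)
    failures = []
    for _ in range(count):
        addr = random_address(rng)
        srs = toy_forward(addr)
        if toy_reverse(srs) != addr:
            failures.append(addr)
        # Change the first hash character; the result must be rejected.
        tampered = srs[:5] + ("A" if srs[5] != "A" else "B") + srs[6:]
        try:
            toy_reverse(tampered)
        except ValueError:
            continue
        failures.append(tampered)
    return failures
```

An empty list from `check_round_trips()` means every generated address was restored and every tampered address was rejected. Note that a test like this uses its own secret, so it would not necessarily exercise a problem that depends on the production configuration.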
Ah! I just ran the executable. Well, the tests all pass, regardless of the CFLAGS.
What still looks a bit weird to me: When you connect to the forward server, you get the same hash for multiple GET calls. When I do that, the first call always results in a different hash than subsequent calls. Why is that?
I have no idea, and I suspect it is the root cause of the bug on your system. I'm a little bit stretched for time right now, but I'm going to try and reproduce the weird behavior with -O2 as soon as I can.
At first I had assumed it was a timestamp issue, but it obviously is not. Please take note that this happens regardless of the CFLAGS.
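To make the comparison explicit, the rewritten addresses can be split into their fields. The sketch below assumes the `SRS0=hash=timestamp=host=user@forwarder` layout visible in the transcripts and only handles `=` as the separator; `Srs0`, `parse_srs0` and `differing_fields` are illustrative names.

```python
from typing import NamedTuple


class Srs0(NamedTuple):
    digest: str
    timestamp: str
    host: str
    user: str
    forwarder: str


def parse_srs0(address):
    """Split an SRS0 address into its fields (only '=' separators handled)."""
    local, forwarder = address.rsplit("@", 1)
    tag, digest, stamp, host, user = local.split("=", 4)
    if tag.upper() != "SRS0":
        raise ValueError(f"not an SRS0 address: {address}")
    return Srs0(digest, stamp, host, user, forwarder)


def differing_fields(a, b):
    """Names of the fields in which two parsed addresses differ."""
    return [name for name, x, y in zip(Srs0._fields, a, b) if x != y]
```

Applied to the first two replies from the original report, `differing_fields` returns only `["digest"]`: the timestamp field is `2G` in both, so the timestamp cannot explain the different results.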
I just temporarily installed the compiled master branch and got even weirder behavior:
telnet localhost 10001
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get x@123456789abcdefg
200 SRS0=Rs9s=4B=123456789abcdefg=x@example.org
get x@123456789abcdefg
200 SRS0=2YsH=4B=123456789abcdefg=x@example.org
get x@123456789abcdefg
200 SRS0=Rs9s=4B=123456789abcdefg=x@example.org
Another thing just came to my mind, and maybe this is at least a proximate cause: I'm using LibreSSL on my machine. Just to check, I tried compiling with plain OpenSSL in a VM now and it appears that the strange behavior does not happen in that case. As far as I can see, there is only a single include from OpenSSL (line 26 of srs2.c)?
Mhm... but you are #undefing that, so it should not matter? Strange.
OOOkay. I think I finally found out what the real issue is/was. I had a look at my secrets file. The production one is pretty ancient and looked ... weird. Repeated, alternating lines, etc. When I generated a new secret string that was less than 80 characters long, I would not get the strange differing hashes. When I switched back to the old secrets file, boom, the behavior was back.
The old file had 9400 bytes. I'll try if I can identify a single secret line that triggers the odd behavior or if it's just the large file that creates problems.
@roehling I've found that a secret line of 114 bytes triggers the odd behavior (presumably a buffer overflow somewhere). Can you try if it is reproducible for you with this string?
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqr stuvwxyzABCDEFGHIJKLMNOPQRSTUVWXY
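A quick way to audit an existing secrets file for long lines is sketched below. The helper name `long_secret_lines` is made up, and the threshold is left to the caller on purpose: this thread only establishes that one secret shorter than 80 characters behaved and that a 114-byte line did not, so the exact limit is an open assumption.

```python
def long_secret_lines(text, limit):
    """Return (line_number, byte_length) for each line longer than limit bytes.

    'limit' is chosen by the caller; no particular threshold is confirmed
    by the discussion.
    """
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        size = len(line.encode("utf-8"))
        if size > limit:
            flagged.append((number, size))
    return flagged
```

For example, reading the secrets file into a string and calling `long_secret_lines(contents, limit=80)` lists the line numbers worth replacing with a freshly generated, shorter secret.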
I seem to be hitting something similar with a few users at the moment.
OS: FreeBSD
SW: Postfix 3.2.2
SW: Postsrsd 1.4
I am using one secret in postsrsd.secret which was 74 characters long. I replaced it since I read above that that might do the job. It's now 65 characters long.
Same problem here:
Nov 14 16:38:56 relay1 postsrsd[31474]: srs_reverse: SRS0=GpBe=NZ=gmail.com=sav.garidech1@my-XXXX.XYZ not rewritten: Hash invalid in SRS address.
This mail was already stuck in the queue before I installed opensrsd.
How can I send the mail again without this error message?
@roehling I am getting lots of "not rewritten: Hash invalid in SRS address." messages. Do you have any idea what causes this problem?
Latest CentOS version, latest Postfix version, latest OpenDKIM version, and the latest master branch of postsrsd.
Important note: The SRS address might have been generated by a third-party server (a customer that uses our mail server as a relay for outgoing mail). Why can't it be decoded?
[root@relay1 ~]# telnet 127.0.0.1 10002
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
get SRS0=QS1C=N2=equipjardin.com=e.breton@XXXX.XYZ
500 Hash invalid in SRS address.
Does this problem still occur with the latest release?
I'm still on 1.6 (FreeBSD ports version) and haven't had the issue on my live system after cleaning up the secrets file (i.e. making sure that the secret is less than 114 bytes). Have you been able to reproduce the issue with the secret in https://github.com/roehling/postsrsd/issues/68#issuecomment-298186084?
Yes, I have reproduced it, and I think I found the underlying cause as well.
I've noticed that sometimes (but not always), postsrsd fails to do srs_reverse for bounces with the message "Hash invalid in SRS address". At first I assumed it had something to do with case folding, but upon further investigation, that's probably not the reason. Why does the srs_reverse fail in the first instance and not in the second one?