msimerson / Mail-Toaster-6

Mail Toaster 6
https://github.com/msimerson/Mail-Toaster-6/wiki
BSD 3-Clause "New" or "Revised" License
46 stars 16 forks source link

Haraka SMTP Client Pool Issues #527

Closed greenshrike closed 1 year ago

greenshrike commented 1 year ago

I've seen the previous tickets with Haraka, am having issues with it myself, and thought I'd add my own take, as it appears different from the previous tickets.

First, from its SMTP banner, I'm running ESMTP Haraka/2.8.28 which appears to be the latest version listed at https://haraka.github.io/releases/. FreeBSD is 13.1-RELEASE-p5 and 'freebsd-update fetch' reports no updates needed.

The Haraka jail was redeployed yesterday (Dec 13th), and it (presumably) fetched the latest haraka.sh provisioning script.

I'm having problems getting mail delivered from external sources, though Haraka, to qmail-smtpd.

Dumping traffic on lo0, I see Haraka spin up a TCP session upon the first mail delivery. That SMTP session ends with a RSET command and qmail responding with a "250 flushed" response. Upon the second mail delivery, a second TCP session is spun up, the mail delivered, and then the session RSET. Ditto third through tenth.

A sockstat at this point shows all ten TCP sessions from haraka to the vpopmail jail's 172.16.15.8:25 port up and active.

As qmail-smtpd's run script has tcpserver limited to 10 connections (via '-c10'), when the eleventh message comes in, Haraka sends a SYN to 172.16.15.8:25, but never gets qmail's SMTP banner. Eventually the TCP session times out and Haraka forwards a 4xx temp fail to the external sending server.

Restarting Haraka appears to cause the TCP sessions to end gracefully with FINs, but I wasn't paying much attention and don't have a trace that shows that handy, though if need be I should be able to regenerate one.

So it appears that Haraka is leaving the TCP connections open for reuse, but never actually reuses them, instead always opening up new connections until qmail runs out of connections.

In addition, I see no evidence of any sort of connections being reaped after a set idle timeout period, despite setting a low value to what I think is the appropriate config variable:

cgreen:vmail:/data/haraka/config# more smtp_forward.ini host=172.16.15.8 port=25

connect_timeout=60 timeout=1 max_connections=10

(Haraka code note: in exports.get_pool() the timeout var is slurped in ( if (!opts.cfg.pool_timeout) opts.cfg.pool_timeout = opts.cfg.timeout || 300 ), then has one subtracted from it before being multiplied by 1000 to get milliseconds ( idleTimeoutMillis: (opts.cfg.pool_timeout - 1) 1000, ). I didn't get too deep, but I saw no sanity checking on the value so if, for example, a value of 0 was set to attempt to disable the connection pool, the milliseconds would I think be set to -1000 -- which is why I'm currently using a timeout of 1 rather than 0. The code also implies a pool_timeout config var, but this var isn't mentioned on https://haraka.github.io/plugins/queue/smtp_forward/. The var pool_timeout is* mentioned for the outbound module, but from what I understand, mail goes through either outbound or smtp_forward, so I've no reason to suspect settings in outbound.ini should affect the smtp_forward module.)

Further, despite previous ticket #511 in which Matt said that he "disabled smtp_client pooling", I see no evidence that the pools have gone away, as the SMTP sessions to qmail still end with a RSET and the connection hanging around instead of a QUIT and the connection terminating.

After trawling through the plugin code (greping for RSET to narrow the potential source code file down), I started reviewing plugins/queue/smtp_forward.js, which pointed to the release() function in smtp_client.js. I was unable to take the time to understand the pool management code (I don't know whatever language .js is (javascript?) so I'm relying upon generic programming knowledge and just mimicking existing code), but it was easy enough to sabotage the pool code by modifying the release() function to force an SMTP QUIT command into the SMTP session, with the specific sabot employed being:

*** smtp_client.js.old  Wed Dec 14 10:34:44 2022
--- smtp_client.js      Tue Dec 13 17:19:41 2022
***************
*** 227,242 ****
--- 227,247 ----

      release () {
          if (!this.connected || this.command === 'data' || this.command === 'mailbody') {
              // Destroy here, we can't reuse a connection that was mid-data.
              this.destroy();
              return;
          }

+         logger.loginfo(`[smtp_client] ${this.uuid} releasing, state=${this.state}`);
+         this.send_command('QUIT');
+         this.destroy();
+         return;
+
          logger.logdebug(`[smtp_client] ${this.uuid} resetting, state=${this.state}`);
          if (this.state === STATE.DESTROYED) return;
          this.state = STATE.RELEASED;

This is not a kludge I want to be running on a production SMTP server, but the TCP traffic looks correct, with the QUIT being sent to qmail, qmail responding with a "221 vmail.xxxxxxx.yyy" message and then an exchange of FINs. Sockstat shows the TCP connections disappearing, and a flood of a couple hundred emails was successfully delivered to qmail. As such, this modification might be of use to someone who deployed a recent Haraka and is now stuck.

This is the first toaster I've built on FreeBSD13 so that might be the issue -- though previous tickets mentioned 12, so maybe not, and maybe it's just Haraka that's borked.

As for what I'd like, that's having the pooling code to work. ;-)

More importantly, though, I think being able to disable pooling via config vars rather than code kludges would be good, along with documentation as to how to do it -- maybe even something as simple as bracketing the above kludge with an if (timeout == 0) statement.

If you want more information or for me to try something, just let me know. :-)

rricker1 commented 1 year ago

I can confirm that this was the same issue I was experiencing and that the applied patch has worked for me. Thank you greenshrike, I had indeed deployed a recent Haraka and was stuck.

msimerson commented 1 year ago

I'm working on this. It turns out its actually two different issues, one that @greenshrike has identified here (I have upstream fixes in Haraka for this) and changes in npm that had me scratching my head because the version of Haraka being explicitly downloaded to /root/Haraka is now different than the version installed to /usr/local/lib/node_modules/Haraka when one does npm install -g /root/Haraka. That itself was a workaround for previous versions of npm. Apparently that's no longer needed (or even working as it used to) so I have a fix coming for that too, which simplifies the build process.

msimerson commented 1 year ago

fixed.