haraka / Haraka

A fast, highly extensible, and event driven SMTP server
https://haraka.github.io
MIT License

Splitting Transactions #573

Closed. baudehlo closed this issue 8 years ago.

baudehlo commented 10 years ago

Umbrella issue for splitting transactions.

msimerson commented 10 years ago

I worked around the problem on that Haraka server where I ran into the looping issue by just delivering everything to qmail, which then handles the "local goes that way," and "remote goes that way" sorting.

Right now, it's pretty difficult to use two queueing plugins (outbound + smtp_forward, or outbound + qmail, or smtp_forward + qmail) without encountering deliverability problems.

With smtp_forward, I don't ever want a message queued in Haraka, because I'm likely operating in front of an Exchange server that is doing user validation. Otherwise, I'll have tons of crap accumulating in my queue that I can't bounce.

With qmail_queue, or smtp_forward to qmail, where address validation has already been performed (by Qmail::Deliverable), that's not an issue. But I'd speculate that most folks are going to be in the former camp.

baudehlo commented 10 years ago

So right now we have the following levels of breakdown:

If you split a transaction, you'd have to think of it in the following way:

The problem comes at "final-dot" - when the remote end finishes sending, it waits for the result of "final-dot" to know the mail is successfully (and reliably) queued. If the result is "OK", any failures beyond that point are the fault of the receiving mail server, and MUST generate a bounce back to the sender.

If you split the transaction, you cannot ensure reliability (without modifying how Haraka handles mail). The typical example is a 2-recipient transaction where one recipient is local and one is remote. So we split, and recipient 1 goes to outbound.js, and recipient 2 goes to smtp_forward (or some other queueing plugin). If one succeeds and one fails, we don't know what to respond to "final-dot".

One solution: If a transaction is split, always respond 250 OK to final-dot, and send a bounce if hook_queue fails.
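A minimal sketch of that rule, in Python for illustration: the reply to final-dot is fixed at 250 regardless of how queueing goes, and every recipient group whose backend fails is collected for a bounce. `finish_split_transaction`, the backend names, and the one-callable-per-backend model are hypothetical, not Haraka's plugin API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model: a queue backend takes its recipients, returns True on success.
QueueFn = Callable[[List[str]], bool]

@dataclass
class SplitResult:
    smtp_reply: str
    bounces: List[str] = field(default_factory=list)

def finish_split_transaction(groups: Dict[str, List[str]],
                             backends: Dict[str, QueueFn]) -> SplitResult:
    """Always accept a split transaction; failed groups become bounces."""
    # The final-dot reply does not depend on queueing results, so any
    # later failure can only be reported to the sender via a bounce.
    result = SplitResult(smtp_reply="250 OK")
    for backend_name, rcpts in groups.items():
        if not backends[backend_name](rcpts):
            # We already promised delivery, so we owe the sender a bounce.
            result.bounces.extend(rcpts)
    return result
```

The cost of this design is that Haraka takes responsibility for every failure after final-dot, which is exactly the bounce volume the earlier comments want to avoid.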

msimerson commented 10 years ago

Another idea for the multi-recipient split transaction is to attempt delivery first to the queue mechanism(s) least likely to succeed. In your example, attempt smtp_forward first. If it fails, return the failure immediately in the SMTP conversation. If smtp_forward succeeds, then the 1st recipient has their message queued. Repeat. If all recipients queue successfully, return 250 OK. If any recipient fails, return the failure in the SMTP conversation ("delivery for failed@example.com failed because blah blah.").
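A rough sketch of the least-likely-first ordering, assuming each mechanism is modeled as a callable that returns `None` on success or an SMTP reply string on failure (a hypothetical shape, not Haraka's hook API). The caller is assumed to have sorted the steps riskiest first.

```python
from typing import Callable, List, Optional, Tuple

# (mechanism name, its recipients, delivery function)
# A delivery function returns None on success, or the SMTP failure reply.
Step = Tuple[str, List[str], Callable[[List[str]], Optional[str]]]

def deliver_least_likely_first(steps: List[Step]) -> str:
    """Run steps in order (riskiest first) and stop at the first failure."""
    for _name, rcpts, deliver in steps:
        error = deliver(rcpts)
        if error is not None:
            # Earlier steps may already have queued their copies; if the
            # sender retries, those recipients can receive duplicates.
            return error
    return "250 OK"
```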

If the sender fixes the problem (removes the invalid recipient) and tries again, the MUAs of the clients who received the message twice will see the duplicate Message-ID and suppress the presentation of the duplicate.

This problem definitely requires more thought.

celesteking commented 10 years ago

Oh, what an interesting conversation. Here's how we do it here.

First of all, I introduced the notion of a "primary" transaction and "secondary" transactions. The primary is the one served by an smtp_forward-like delivery mechanism. Secondary transactions go to the outbound queue, which takes care of the rest. If the primary fails, the client is either deferred or denied, depending on the status returned by the backend. If the primary succeeds, the secondaries are queued and then the client gets a 2xx code.
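The primary/secondary flow might look roughly like this. `forward` (a synchronous smtp_forward-style delivery returning the backend's reply) and `enqueue` (an outbound queue insert) are hypothetical stand-ins, and the sketch assumes the local enqueue does not fail.

```python
from typing import Callable, List

def handle_primary_secondary(primary_rcpts: List[str],
                             secondary_rcpts: List[str],
                             forward: Callable[[List[str]], str],
                             enqueue: Callable[[str], None]) -> str:
    """Deliver the primary synchronously; queue secondaries only if it succeeds."""
    reply = forward(primary_rcpts)  # backend's SMTP reply, e.g. "250 ..." or "450 ..."
    if not reply.startswith("2"):
        # Relay the backend's verdict: 4xx defers the client, 5xx denies it.
        return reply
    # The primary is delivered; hand the remaining recipients to the outbound queue.
    for rcpt in secondary_rcpts:
        enqueue(rcpt)  # assumed not to fail (local queue write)
    return "250 OK"
```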

Now about our infrastructure. Haraka plays the "frontend" role. We've got multiple "backends" that store mail (IMAP/POP3 access) and a lookup mechanism in Haraka (via a plugin) that "points" smtp_forward to the correct backend.

The transaction split is done by grouping RCPTs by backend host. The backend host with the greatest number of recipients becomes the "primary"; all others are "secondaries". The algorithm is based on practical observations.

Usually, the user puts the "most important" recipient in the To: field, and that becomes the first RCPT. Then he possibly adds CC and BCC recipients. In the case of multiple recipients per backend, it would be wise to pick those as primary candidates, since that lets us "cover" a broader range of possible failures. In the case of one recipient per backend but multiple backends, we just mark the first RCPT for the primary transaction, as he's probably the most important one :) Another factor that influenced the algorithm design was the way our infrastructure was built: it's highly likely that a person has all his domains residing on a single backend. That's why the idea of having multiple simultaneous smtp_forward streams to different backends was abandoned (along with the duplicates-on-failure problem, as you noted earlier).
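The grouping and primary-selection rule might be sketched like this; `lookup` is a hypothetical stand-in for the plugin that maps a recipient to its backend host. Ties on recipient count go to the backend whose first recipient appeared earliest, which reduces to "first RCPT wins" when every backend has exactly one recipient.

```python
from typing import Callable, Dict, List, Tuple

def split_by_backend(rcpts: List[str],
                     lookup: Callable[[str], str]) -> Tuple[str, Dict[str, List[str]]]:
    """Group recipients by backend and pick the primary backend."""
    # Assumes at least one recipient (a transaction needs a RCPT TO).
    groups: Dict[str, List[str]] = {}
    for rcpt in rcpts:
        groups.setdefault(lookup(rcpt), []).append(rcpt)
    # dicts keep insertion order, and max() returns the first maximal key,
    # so ties go to the backend that was seen earliest in RCPT order.
    primary = max(groups, key=lambda host: len(groups[host]))
    return primary, groups
```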

Sadly, I don't have enough stats yet to prove the efficiency.

smfreegard commented 10 years ago

> the MUAs of the clients who received the message twice will see the duplicate Message-ID and suppress the presentation of the duplicate.

Support for this in MUAs is pretty patchy and shouldn't be relied upon.

In the case of mixed recipients (local + remote), we should probably default to sending everything to outbound and introduce new hooks that allow outbound to deliver mail for specific domains via alternative non-SMTP mechanisms (for smtp_forward/smtp_proxy this is easy: it's simply forced routing applied to get_mx). To deal with excessive bounces, we'd also need optional mechanisms to verify each recipient at hook_rcpt, so that we don't accept mail that will later bounce on delivery.
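Forced routing in front of the MX lookup could be as simple as a per-domain override table consulted before DNS. This is a hypothetical sketch of the idea, not Haraka's `get_mx` hook signature; `routes` and `mx_lookup` are assumed names.

```python
from typing import Callable, Dict, List

def resolve_next_hop(domain: str,
                     routes: Dict[str, str],
                     mx_lookup: Callable[[str], List[str]]) -> List[str]:
    """Return a forced route for the domain if one exists, else its MX hosts."""
    override = routes.get(domain.lower())  # domains compare case-insensitively
    if override is not None:
        return [override]
    return mx_lookup(domain)  # normal DNS MX resolution
```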

msimerson commented 10 years ago

Having SMTP routes enabled within outbound could go a long way towards solving this problem. I've been planning to add per-domain routing to smtp_forward, principally because I've got a buddy for whom I set up Haraka in front of his Exchange server. He loved it, and soon wanted to put another Exchange server behind it for one of his buddies. That doesn't work because smtp_forward doesn't have domain routing, so I plan to add it. In this case, outbound doesn't work well either, because Haraka will end up accepting all sorts of undeliverable garbage that it is obligated to bounce.

I've thought about adding an rcpt_to.smtp_probe-type plugin that validates the RCPT TO against a remote SMTP server and then caches positive results for N days and negative results for N hours.

smfreegard commented 10 years ago

> That doesn't work because smtp_forward doesn't have domain routing, so I plan to add it.

The problem with per-domain routing in smtp_forward is: what happens when a message is addressed to recipients at more than one of the routed domains? Depending on the domains, this may be more or less likely, but it has to be handled. Issuing DENYSOFTs at RCPT TO time for any recipients that require different routing from those already seen is the only way this can work reliably, but it will delay delivery to those recipients. The alternative, trying to deliver to multiple backends in hook_queue, would likely be fragile.
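A sketch of the RCPT-time DENYSOFT approach: the first accepted recipient pins the transaction's route, and any later recipient that resolves elsewhere gets a temporary (4xx) rejection, so the sending MTA retries it later in a separate transaction. `RouteLock`, `route_for`, and the reply text are hypothetical.

```python
from typing import Callable, Optional

class RouteLock:
    """Per-transaction state: the first accepted recipient pins the route."""

    def __init__(self, route_for: Callable[[str], str]):
        self.route_for = route_for  # maps a recipient address to its forward host
        self.route: Optional[str] = None

    def rcpt(self, address: str) -> str:
        route = self.route_for(address)
        if self.route is None:
            self.route = route  # first recipient decides the route
            return "250 OK"
        if route == self.route:
            return "250 OK"
        # Different backend: any 4xx is a temporary failure (DENYSOFT),
        # so the client retries this recipient later, at the cost of a delay.
        return "451 4.3.0 recipient needs a different route, try again later"
```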

> I've thought about adding an rcpt_to.smtp_probe-type plugin that validates the RCPT TO against a remote SMTP server and then caches positive results for N days and negative results for N hours.

I do this already and it's definitely the way to go IMO. I would have shared this code already, but it's tightly integrated within a rather large plugin that handles all of the routing.

Based on bitter experience with this method in our previous smtpd, my cache lengths are much shorter: 5 minutes for invalid recipients and 1 hour for valid ones, and the cache (Redis) is only used when there isn't already a pooled SMTP connection available to the backend. The reason is that it's really common for people to email accounts that haven't been set up yet; then you have to clear out the cache records for them. That was our No. 1 support question for years, until I adopted the new method when we switched to Haraka.
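The caching policy described above might be sketched as follows. It uses an in-memory dict instead of Redis, and `probe` (an SMTP callout to the backend) and `have_pooled_connection` are hypothetical callables; the TTLs are the 5-minute and 1-hour values quoted above.

```python
import time
from typing import Callable, Dict, Tuple

VALID_TTL = 60 * 60   # 1 hour for recipients the backend accepted
INVALID_TTL = 5 * 60  # 5 minutes for rejects (the account may be created soon)

class RcptProbeCache:
    def __init__(self,
                 probe: Callable[[str], bool],
                 have_pooled_connection: Callable[[], bool],
                 clock: Callable[[], float] = time.monotonic):
        self.probe = probe
        self.have_pooled_connection = have_pooled_connection
        self.clock = clock
        self.entries: Dict[str, Tuple[bool, float]] = {}  # rcpt -> (valid, expires_at)

    def is_valid(self, rcpt: str) -> bool:
        # The cache is only consulted when no pooled connection exists;
        # with a pooled connection, the backend is always asked directly.
        if not self.have_pooled_connection():
            cached = self.entries.get(rcpt)
            if cached is not None and cached[1] > self.clock():
                return cached[0]
        valid = self.probe(rcpt)
        ttl = VALID_TTL if valid else INVALID_TTL
        self.entries[rcpt] = (valid, self.clock() + ttl)
        return valid
```

The short negative TTL is the point of the design: a freshly created account stops being rejected within minutes without anyone clearing cache records by hand.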

msimerson commented 9 years ago

My mail routing looks like this:

I know, friends don't let friends use Exchange. I'd also like to remove qmail from the latter two steps, but Haraka has no concept of local vs. remote domains.

For my local domains, recipient validation happens via rcpt_to.qmail_deliverable, which works great. The Exchange servers I front-end for have been generating an annoying number of bounces lately, because there is no recipient validation until qmail tries to deliver the message. I'd have the exact same problem if I used outbound instead of qmail.

After re-reading this thread, I sat down this afternoon intending to write a recipient validation plugin using SMTP callouts. Then I read about how brain-damaged Exchange 2013 is and assumed that such wackiness would eventually cause problems for my callouts. Besides, PR #636 is a better way to validate Exchange users.

Getting LDAP validation set up is not something I can do today. Adding domain routing to the smtp_forward plugin is, and it works 95% of the time. To do it safely, the per-domain routes are only used when:

  1. the SMTP transaction has a single recipient, or
  2. every recipient's forward-to host is identical
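The two conditions above collapse into one check, since a single recipient trivially has an identical forward-to host. A hypothetical sketch, with `domain_routes` and `default_host` as assumed names and mixed-route mail falling back to the default forward host (qmail, in this setup):

```python
from typing import Dict, List

def choose_forward_host(rcpts: List[str],
                        domain_routes: Dict[str, str],
                        default_host: str) -> str:
    """Use a per-domain route only if every recipient maps to the same host."""
    hosts = set()
    for rcpt in rcpts:
        domain = rcpt.rsplit("@", 1)[-1].lower()
        hosts.add(domain_routes.get(domain, default_host))
    if len(hosts) == 1:
        return hosts.pop()  # single recipient, or all recipients agree
    return default_host     # edge case: handle exactly as without a route
```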

In the edge cases, mail is handled exactly the same as without a specific forward route. Now my routing looks like this:

a. Local domains: Haraka -> smtp_forward -> qmail
b. Exchange servers: Haraka -> smtp_forward -> Exchange (95%)
c. Exchange servers: Haraka -> smtp_forward -> qmail -> Exchange (5%)
d. Local users: SMTP AUTH -> Haraka -> qmail -> internet

Dexus commented 9 years ago

Should this still be open? Ref: merged PR #671, smtp_client per-domain routes.

msimerson commented 9 years ago

@Dexus, yes. The issue is only partially resolved (#671 only works for messages with a single recipient).

Dexus commented 9 years ago

ah ok thx for the info :)

baudehlo commented 8 years ago

Seems like this should be closed. Making smtp_forward default to sending outbound mail helped a bit.

Re-open if this is still an issue we need to consider. Otherwise, it seems too hard to fix generically.