Open jpbede opened 4 years ago
Yeah, this is a difficult issue. Currently connections are grouped by destination domain, but as you said, a single MX can serve a bunch of domains. Usually the likes of Gmail, Hotmail, etc. should not care about connection counts too much, but I've had problems with smaller hosts, e.g. edu.ee, which hosts a lot of school domains in Estonia. If someone sends a message to a mailing list that contains a lot of teachers in the *.edu.ee realm, then the MX for it starts blocking really fast.
One solution would be to use the MX hostname for grouping, but this would require resolving DNS at queue time, which is currently not done. At first glance it seems that replacing `delivery.domain` in this line with the MX hostname (probably the one with the highest priority, sorted alphabetically) might fix it.
Resolving MX records can take a lot of time, which is why it is currently not done at queue time and is instead kept as one of the last steps. Normally DNS queries are very fast, but occasionally something hangs and a query can take a long time or fail temporarily.
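To make the grouping idea above concrete, here is a minimal sketch of picking a grouping key from already-resolved MX records. `pickGroupingKey` is a hypothetical helper, not something that exists in ZoneMTA, and the record shape is assumed to match what Node's `dns.resolveMx()` returns. "Highest priority" is taken to mean the lowest preference number, with ties broken alphabetically.

```javascript
// Sketch only: `pickGroupingKey` is a hypothetical helper, not part of ZoneMTA.
// `mxRecords` is assumed to have the shape returned by Node's dns.resolveMx():
// [{ exchange: 'mx1.example.com', priority: 10 }, ...]
function pickGroupingKey(domain, mxRecords) {
    if (!Array.isArray(mxRecords) || mxRecords.length === 0) {
        // No MX records known: fall back to grouping by destination domain
        return domain.toLowerCase();
    }

    const sorted = mxRecords
        .map(mx => ({ exchange: mx.exchange.toLowerCase(), priority: mx.priority }))
        .sort((a, b) => {
            // Lowest priority number is the most preferred MX
            if (a.priority !== b.priority) {
                return a.priority - b.priority;
            }
            // Ties are broken alphabetically so the key is deterministic
            if (a.exchange < b.exchange) {
                return -1;
            }
            return a.exchange > b.exchange ? 1 : 0;
        });

    return sorted[0].exchange;
}
```

With a key like this, two domains served by the same MX would end up in the same connection group, at the cost of needing the MX lookup before the grouping decision.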
Sure, this isn't easy. And DNS resolving at queue time doesn't sound reasonable to me. As you said, it can be slow.
I implemented a quick idea yesterday: https://github.com/jpbede/zone-mta/commit/21845964631a8fb2f8b38bbb191d98772482db81
What do you think about something like that?
It uses the `connectHook` of mx-connect. It writes an incrementing/decrementing lock to Redis whenever an MX record is used.
Redis-based locks can be difficult to handle. You want to avoid keeping counters in Redis forever (decreasing a value to 0 does not clear it). You also run the risk that if ZoneMTA is restarted, you end up in a state where the counters in Redis indicate that you cannot make any new connections while in reality you have 0 connections to the target host. Even if the counter expires automatically, you are going to have an artificial delay while messages are queued but not delivered. That's why the current locking counters are kept in the master process: restarting the service resets all counters for the current instance. The same happens when a single child process dies and is restarted by the master: all the counters allocated by that process should be cleared.
I think it would be easier to implement such MX counters in the master process as well, and not in Redis. This way you could more easily keep a separate counters object for each child process, and when `isFree` is called, sum the counter values for the same domain from each child process counter. When a child process dies, its counter object can be cleared as well. If the master is restarted, all counters are cleared.
You can send a remote request from child process to master process like this and then process and respond like this. Counter handling (in master process) would look something like this.
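The per-child counters described above might be sketched roughly as follows. This is an illustration under assumptions, not the code from ZoneMTA or from the linked commits: the class `MxCounters` and the methods `acquire`, `release`, `total` and `clearChild` are hypothetical names, and only `isFree` is taken from the discussion. The messaging between child and master processes is left out.

```javascript
// Sketch only: class and method names are hypothetical, not ZoneMTA's API.
// Lives in the master process; one counters map is kept per child process.
class MxCounters {
    constructor() {
        // childId -> Map(mxHost -> number of open connections)
        this.children = new Map();
    }

    // Called when a child reports that it opened a connection to mxHost
    acquire(childId, mxHost) {
        if (!this.children.has(childId)) {
            this.children.set(childId, new Map());
        }
        const counters = this.children.get(childId);
        counters.set(mxHost, (counters.get(mxHost) || 0) + 1);
    }

    // Called when a child reports that a connection to mxHost was closed
    release(childId, mxHost) {
        const counters = this.children.get(childId);
        if (!counters || !counters.has(mxHost)) {
            return;
        }
        const next = counters.get(mxHost) - 1;
        if (next > 0) {
            counters.set(mxHost, next);
        } else {
            // Do not keep zero-valued entries around
            counters.delete(mxHost);
        }
    }

    // Sum the counter for mxHost over all child processes
    total(mxHost) {
        let sum = 0;
        for (const counters of this.children.values()) {
            sum += counters.get(mxHost) || 0;
        }
        return sum;
    }

    isFree(mxHost, maxConnections) {
        return this.total(mxHost) < maxConnections;
    }

    // Called when a child process dies: drop everything it had allocated
    clearChild(childId) {
        this.children.delete(childId);
    }
}
```

Because the state lives only in the master's memory, restarting the master starts from empty counters, and a crashed child cannot leave stale locks behind once `clearChild` has run.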
This sounds reasonable. I'm going to implement it and test it.
I've implemented it now: https://github.com/jpbede/zone-mta/commit/bddd0a07057ec8c1a5dbc7b9240ca66bcfc62c21
But I'm currently not happy with the config option. Should we use the `maxConnections` of the domain config?
Looks good now 👍 And sure, the config may be problematic. For example, you probably do not want to have the same limit for both Gmail and some small host.
What do you think about a config option like this https://github.com/jpbede/zone-mta/commit/cf904c880aa62fde6b0757c5980605a81df69b0f ?
Seems great 👍
Great. I'm currently testing it in our cluster
Any updates here, guys? How is it looking? Can we merge this? It would be really useful.
Ehm... sorry for not keeping this issue up to date.
I'm currently facing issues with connection handling when this feature is enabled on our production cluster: connections are not closed correctly.
I'm working on a solution for this.
Updates here ? Would love to see this through.
Unfortunately I haven't had time to take a closer look at it. Hopefully I'll have some time in the next few days to dig into it more deeply.
@jpbede any progress here ? do you need help ?
This feature would also be very helpful for us. Any news about it?
@jpbede just sent you a sponsor boost to help you work this out :)
@dazoot Oh, thank you so much :)
I will definitely take another look at this.... I'm so sorry this went untouched for so long.
@andris9 do we need a way to limit the connections per MX ?
Currently I have the issue that Microsoft, for example, has several domains on one MX. When sending to them with a connection limit set per domain, we run into their connection limit. One example:
Max. 5 connections to outlook.com
Max. 5 connections to outlook.de
Those domains share the same MX.
If Microsoft has a connection limit of 5 per MX, we've already reached it with one domain, and connections for the second domain push us past it.
So I've implemented a feature to limit the connections per MX. When this limit is reached, ZoneMTA tries the next MX.
Is this something we need? Or do you have another solution for this issue?
Sure, we can use a dedicated Sending Zone, but for that I need to know all of their domains.
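As a rough illustration of the "try the next MX when the limit is reached" behaviour described in this comment, here is a minimal sketch. It is not the actual implementation: `selectMx`, the `openConnections` map and the hostnames are hypothetical, and the MX list is assumed to be already sorted by preference.

```javascript
// Sketch only: `selectMx` is a hypothetical helper, not ZoneMTA's API.
// mxHosts: MX hostnames already sorted by preference (most preferred first)
// openConnections: Map(mxHost -> number of currently open connections)
// maxPerMx: connection limit applied to every single MX host
function selectMx(mxHosts, openConnections, maxPerMx) {
    for (const host of mxHosts) {
        const open = openConnections.get(host) || 0;
        if (open < maxPerMx) {
            // First MX that still has room wins
            return host;
        }
    }
    // Every MX is at its limit: the caller has to defer the delivery
    return null;
}

// Two recipient domains that resolve to the same (made-up) MX hosts
const sharedMx = ['mx1.shared.example', 'mx2.shared.example'];
const open = new Map([['mx1.shared.example', 5]]);

const chosen = selectMx(sharedMx, open, 5);
```

Since the counter is keyed by MX hostname rather than by recipient domain, deliveries for both domains draw from the same budget, which is the case the per-domain limit misses.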