trusteddomainproject / OpenDKIM

OpenDKIM times out on startup #209

Open GeorgeCox opened 7 months ago

GeorgeCox commented 7 months ago

Versions: opendkim 2.11.0 OS Debian 11

I am running opendkim servers with a MySQL backend for both the signing table and the key table. This setup worked well with around 1,000 domains in the database. We have since added ~100k domains and ~200k keys. The opendkim servers that were already running seem to be coping with this, but I've now discovered that the opendkim service will not restart.

When trying to start the service I see a systemd timeout because the service takes so long to start up. If I run opendkim outside of systemd, it hangs and doesn't fork as you'd expect. Running it under strace, I can see that opendkim seems to be processing all the domains/keys from the database, and with this many entries the service takes hours to start. The process eventually fails, complaining that it cannot find a valid key record for a particular domain in the key table, so it never actually starts.

I have a script validating that we have a key for every domain in the database, but this database is being updated as we add new domains/keys, so I think the error which causes the process to exit is due to the database changing while opendkim is starting.
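For readers who want a comparable check, here is a minimal sketch of such a validation. It is not the script mentioned above: the table and column names (`signing_table`, `key_table`, `pattern`, `keyname`) are hypothetical, and it uses an in-memory SQLite database as a stand-in for MySQL so that it runs on its own.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database with hypothetical table names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE signing_table (
            pattern TEXT PRIMARY KEY,   -- domain or address pattern
            keyname TEXT NOT NULL       -- name of the key to sign with
        );
        CREATE TABLE key_table (
            keyname  TEXT PRIMARY KEY,
            domain   TEXT,
            selector TEXT,
            keydata  TEXT
        );
        INSERT INTO signing_table VALUES
            ('example.com', 'key-a'),
            ('example.org', 'key-b');
        -- 'key-b' is deliberately missing from key_table.
        INSERT INTO key_table VALUES
            ('key-a', 'example.com', 'mail', 'PRIVATE-KEY-PLACEHOLDER');
        """
    )
    return conn


def find_missing_keys(conn):
    """Return (pattern, keyname) rows whose key name has no key_table row."""
    return conn.execute(
        """
        SELECT s.pattern, s.keyname
        FROM signing_table AS s
        LEFT JOIN key_table AS k ON k.keyname = s.keyname
        WHERE k.keyname IS NULL
        ORDER BY s.pattern
        """
    ).fetchall()


if __name__ == "__main__":
    for pattern, keyname in find_missing_keys(build_demo_db()):
        print(f"{pattern}: no key_table row for {keyname}")
```

A check like this only reflects the database at the moment the query runs, so it cannot rule out a mismatch that appears while opendkim is still reading the tables.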

I've looked through your documentation, but cannot see a way to do either of the following:

The servers which have not been restarted seem to be coping fine and are picking up the new domains/keys which have been added to the DB.

I also can't find any information on how other people are attempting to use opendkim at a large scale.

Any suggestions would be appreciated

GeorgeCox commented 7 months ago

I've managed to solve the issue by adding an additional index to the KeyTable keycol column, and then increasing the timeout on the systemd service to 5 minutes.
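To illustrate why the index helps, here is a rough sketch that compares the query plan for a single key lookup before and after adding an index. It uses SQLite rather than MySQL so that it is self-contained, and the table layout, the index name `idx_key_table_keycol`, and the sample data are assumptions made for the example, not the real schema.

```python
import sqlite3


def lookup_plan(conn):
    """Return the query plan text for a single lookup by keycol."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT keydata FROM key_table WHERE keycol = ?",
        ("key-1",),
    ).fetchall()
    # The human-readable plan description is the fourth column of each row.
    return " ".join(row[3] for row in rows)


def plans_before_and_after_index():
    """Build a demo table and return (plan without index, plan with index)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE key_table (id INTEGER PRIMARY KEY, keycol TEXT, keydata TEXT)"
    )
    conn.executemany(
        "INSERT INTO key_table (keycol, keydata) VALUES (?, ?)",
        [(f"key-{i}", "placeholder") for i in range(1000)],
    )
    before = lookup_plan(conn)
    # Hypothetical index name; the indexed column is the one used for lookups.
    conn.execute("CREATE INDEX idx_key_table_keycol ON key_table (keycol)")
    after = lookup_plan(conn)
    return before, after


if __name__ == "__main__":
    before, after = plans_before_and_after_index()
    print("without index:", before)
    print("with index:   ", after)
```

Without the index, each lookup scans the whole table, so a start-up pass that performs one lookup per signing entry grows roughly with the product of the two table sizes. With the index, each lookup becomes a search on the indexed column.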

It does still take a few minutes to start up, and I expect we could still potentially see the missing keys errors again in the future if the service is started while the db is being updated. This fix will obviously not scale forever either.

So information on getting the service to start faster, and to start even if a key is missing, would still be very useful to me.

futatuki commented 7 months ago

As far as I can tell from the code, opendkim always verifies every entry in the SigningTable on startup, when loading the config in dkimf_config_load() (lines 8321-8383).

I think it would be easy to implement an option to skip this check. However, I have not yet checked what happens if the signing key is not valid when opendkim attempts to use it.

futatuki commented 7 months ago

Since I use a Lua script as a setup hook instead of a SigningTable, my environment already performs no verification of KeyTable entries, and there is no problem even if signing is requested for a nonexistent key ID.

Unfortunately, the Lua functions provided as odkim.* do not allow looking up the value(s) for a query key; they can only check whether an entry for the query exists. So a Lua script cannot serve as a workaround in your case.

r-a-z-v-a-n commented 2 months ago

Hi @futatuki, thank you so much for pointing out where to look in the source code. I have opened https://github.com/trusteddomainproject/OpenDKIM/pull/226 . I have also provided some results of tests with bad keys. Please kindly review my changes.

r-a-z-v-a-n commented 2 months ago

Hi @futatuki, apologies, I made a mistake and closed the pull request. Anyway, I created https://github.com/trusteddomainproject/OpenDKIM/pull/228 with CheckSigningTable as you requested.