lieser / dkim_verifier

DKIM Verifier Extension for Mozilla Thunderbird
MIT License
210 stars 35 forks source link

Error connecting to the DNS server #230

Closed chkaufmann closed 3 years ago

chkaufmann commented 3 years ago

Well it worked great, I do not have any idea what has changed!

DKIM_Verifier.Verifier DEBUG 1 DKIM-Signatures found. verifier.mjs.js:1147:8 DKIM_Verifier.Verifier DEBUG Verifying DKIM-Signature 1 ... verifier.mjs.js:1162:9 DKIM_Verifier.Verifier DEBUG Parsed DKIM-Signature 1: {…} ​ a_hash: "sha256" ​ a_sig: "rsa" ​ b: "X1QLH1ybi3WwLDCfFf1BNsiPpZZ2/YJ7AQ/4YBAeTNKGV4DWH8ZSl5jLUxamWChdTKRj+cFTgU6vGOKmbCUOwV70iF/H6ms/V6ydsbw6bHQRcjgmobeFy0J+E1joFvxBwtR7UJQdByYqDViVb2OaAaXcXVNfXUKOVgZPolrDhpw=" ​ b_folded: "X1QLH1ybi3WwLDCfFf1BNsiPpZZ2/YJ7AQ/4YBAeTNKGV4DWH8ZSl5jLUxamWChdT\r\n\t KRj+cFTgU6vGOKmbCUOwV70iF/H6ms/V6ydsbw6bHQRcjgmobeFy0J+E1joFvxBwtR\r\n\t 7UJQdByYqDViVb2OaAaXcXVNfXUKOVgZPolrDhpw=" ​ bh: "40Jc0tX5bKRiSOgX7fHRUnNwQ14Lg6cISIpgF5Oj/2M=" ​ c_body: "relaxed" ​ c_header: "relaxed" ​ d: "github.com" ​ h_array: Array(5) [ "date", "from", "to", … ] ​ i: "@github.com" ​ i_domain: "github.com" ​ keySecure: undefined ​ l: null ​ original_header: "DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=github.com;\r\n\ts=pf2014; t=1607509055;\r\n\tbh=40Jc0tX5bKRiSOgX7fHRUnNwQ14Lg6cISIpgF5Oj/2M=;\r\n\th=Date:From:To:Subject:From;\r\n\tb=X1QLH1ybi3WwLDCfFf1BNsiPpZZ2/YJ7AQ/4YBAeTNKGV4DWH8ZSl5jLUxamWChdT\r\n\t KRj+cFTgU6vGOKmbCUOwV70iF/H6ms/V6ydsbw6bHQRcjgmobeFy0J+E1joFvxBwtR\r\n\t 7UJQdByYqDViVb2OaAaXcXVNfXUKOVgZPolrDhpw=\r\n" ​ q: "dns/txt" ​ s: "pf2014" ​ t: 1607509055 ​ v: "1" ​ warnings: Array [] ​ x: null ​ z: null ​

: Object { … } verifier.mjs.js:1164:9 DKIM_Verifier.Verifier DEBUG computed body hash: 40Jc0tX5bKRiSOgX7fHRUnNwQ14Lg6cISIpgF5Oj/2M= verifier.mjs.js:1025:7 DKIM_Verifier.JSDNS: Resolving pf2014._domainkey.github.com TXT by querying 8.8.8.8 JSDNS.jsm.js:537:6 DKIM_Verifier.JSDNS: pf2014._domainkey.github.com/TXT: Lookup failed with rcode 5 JSDNS.jsm.js:806:7 DKIM_Verifier.background DEBUG dns result {…} ​ bogus: false ​ data: null ​ rcode: 5 ​ secure: false ​ : Object { … } background.mjs.js:42:6 DKIM_Verifier.background INFO DNS query failed with result: {…} ​ bogus: false ​ data: null ​ rcode: 5 ​ secure: false ​ : Object { … } background.mjs.js:48:7 DKIM_Verifier.Verifier ERROR DKIM_InternalError: DKIM_DNSERROR_SERVER_ERROR getKey moz-extension://4bae297f-09f6-41fb-b9e8-e6079512da68/content/background.mjs.js:49 verifier.mjs.js:984:8 handleException moz-extension://4bae297f-09f6-41fb-b9e8-e6079512da68/modules/dkim/verifier.mjs.js:984 processSignatures moz-extension://4bae297f-09f6-41fb-b9e8-e6079512da68/modules/dkim/verifier.mjs.js:1168 DKIM_Verifier.Verifier DEBUG Exception on DKIM-Signature 1 verifier.mjs.js:1169:9 DKIM_Verifier.SignRules DEBUG shouldBeSigned: Object { shouldBeSigned: false, sdid: (2) […], foundRule: true, hideFail: false } signRules.mjs.js:374:7 DKIM_Verifier.AuthVerifier DEBUG authResult: Object { version: "2.1", dkim: (1) […] } AuthVerifier.mjs.js:173:7 ![Capture](https://user-images.githubusercontent.com/24633960/101617994-ab6f8680-3a11-11eb-892c-d36201b74c52.JPG)
lieser commented 3 years ago

sorry for the late response.

Do you still have the problem? The logs show Lookup failed with rcode 5, which means the DNS server (Google's 8.8.8.8 in your case) actively refused the connection. Must admit I have no idea why this is/was happening. If you still have the problem, I would sugest to try a different DNS server (e,g, CloudFlare's 1.1.1.1).

silvershadowcc commented 3 years ago

I have the same problem. After changing one of the settings (and changing it back) it works for a while. Then it starts displaying the error for every email. Tried both Google's 8.8.8.8 and CloudFlare's 1.1.1.1 So I don't think that has anything to do with it.

lieser commented 3 years ago

Are you also seeing Lookup failed with rcode 5, or something else? I still thing that this is not an error in the add-on, and instead some strange networking problem.

But the add-ons behavior in regards to retrying is definitely not helping in this case, and needs improvements. Currently, if there is an error with a server, the server is marked as "dead", and not retied until the DNS settings change, or Thunderbird is restarted (disabling and enabling the add-on should work too).

For now I suggest the following workarounds:

To enable dns.jsdns.autoResetServerAlive, do the following:

  1. Open the add-ons manager
  2. Click the gear in the top right and select "Debug Add-ons"
  3. Click the "Inspect" button of the DKIM Verifier (a tab should open with something similar as described at https://extensionworkshop.com/documentation/develop/debugging/)
  4. In the top, click on Console
  5. In the console, enter and execute await (await import("../modules/preferences.mjs.js")).default.setValue("dns.jsdns.autoResetServerAlive", true)
lieser commented 3 years ago

I attached a test version to https://github.com/lieser/dkim_verifier/issues/199#issuecomment-774676192 there the key storing feature should work also in TB 78.

tadpole2077 commented 3 years ago

Thanks lieser for the 4.0.0pre4.1 drop. I'm also experiencing the same fault as above, fault behavior invokes after a set amount of hours typically >3 hours on my sys. Restart of thunderbird resolves it, as you mention above DNS server marked as 'dead' is likely a root cause key factor, not sure if this is caused by a temp network outage that then triggers the dead DNS marker.

I've installed the new test 4.1 package now, and will report back on finding over the next 24 / 48 hours.

tadpole2077 commented 3 years ago

was working fine since yesterday with latest 4.1 codeset until about an hour ago, now all mail Dkim checks fail.. below is your (lieser) github mail response to me.
Its an rcode 2 fault from the _getKeyFromDNS() call. Likely on process restart the issue with no longer occur - until next x triggers it.

image

lieser commented 3 years ago

Note that I did not change anything in 4.0.0pre4.1 to improve the underlying issue. It only includes the Caching of the DKIM keys feature, that should greatly reduce the impact of this issue.

Regarding the log you showed: The interesting pars are happening in the experiments part of the add-on. Unfortunately the logs for that (or code) do not appear in the add-ons debugger.

You will have to open the normal Error Console from Thunderbird to see them (or the debugger for Thunderbird itself, but this is not needed).

tadpole2077 commented 3 years ago

Thunderbird (browser) Error Console isnt showing any additional info that i can tell(snap below). Any other options? I'll try and walk though the code.. My thunderbird crashed and restarted with same error still occurring on all mail DKIM checks, so still able to extract some debug data.

image

lieser commented 3 years ago

The log you posted shows JSDNS: no DNS Server alive. Meaning at that point the add-on was already in the state there all further DNS queries will fail. All this messages from JSDNS is something you did not see before.

What error caused your DNS server to be marked as disabled should be shown further up the log. See e.g. the image in the first post, there it shows Lookup failed with rcode 5.

tadpole2077 commented 3 years ago

I see that jsdns.js:86 log entry now. I've rebooted my desktop box this morning, DKIM plugin is working optimally at the moment, i'll keep the console log open today in the hope of capturing the additional jsdns fault log details at precisely the moment the DNS server is tagged as dead (likely assigning the rcode = 5)

Per your prior Reponses on this thread, once the DNSServer is marked as dead, no retry occurs - and either DNS setting charge or TB restart is required to reactive it again.
As a test scenario, is there a way for me to set the DNSServer as alive via console cmd line, to force a further check - purpose of this is to confirm that the problem is due to a prior tagging of the DNSServer as death, and not some prevailing issue within the plugin instance data store that would mark the DNSServer as death again.

tadpole2077 commented 3 years ago

No replication of the above error reported today, so i did a bit of experimenting.
Tried simulating a ISP disconnect, then opened a mail to triggered a DKIM process, resulting in "error connecting to DNS server".

Then reconnected ISP, got some more mail down, and triggered a DKIM process - same error shown "error connecting to DNS server". Likely this is a known limitation, no DNS server connection Retry, but still I would expect the DKIM process to attempt to reconnect to DNS or at a minimum to reattempt a DNS server connection after a set interval of time since last fail, maybe a timestamp stored with prior fail, and retry after x seconds, if attempting to connect to the DNS server after prior failure is a concern.

image

tadpole2077 commented 3 years ago

Today i captured the full sequence of the likely fault, or at least one scenario that causes it.. Internet connect was working fine - no issues with other sites or download of mail before opening the first one from reputable source reebok.com that then throws the rcode 2 fault,

DNS server reporting timeouts, resulting in dead server.

image

Expanding the Win Reg log entry and DNS servers entry... image

Not sure if this initial error on maxLogLevelPerf is related in any way. Console.maxLogLevelPref used with a non-existing pref: extensions.dkim_verifier.experiments.logging

tadpole2077 commented 3 years ago

restart and retest TB, after checking connection health is good, i can ping 8.8.8.8, 1.1.1.1, 1.0.0.1 Same issue highlighted the rcode:5 entries in log below, timed out error reported but i question if that maybe a false positive for timeout, instead some other earlier issue and then throwing the wrong error description.

image

tadpole2077 commented 3 years ago

if you can advise on how to debug the experiments js files, such as jsdns.js I believe i can track down the fault, and walk though code such as browser.jsdns.txt(). My hunch at this time is a timing issue on a async call, which may be using the browser.jsdns() before it is fully init() and configured, and possible causing a block on configureJsdns() - due to the flag jsdnsIsConfigured = true. Until i can walk though the experiments code its purely a guess.

lieser commented 3 years ago

Thanks for the offer of debugging, but currently I personally don't see a need for it.

The Console.maxLogLevelPref used with a non-existing pref: extensions.dkim_verifier.experiments.logging can be ignored.

Regarding rcode 5:

Are you sure that 192.168.42.129 is a DNS Server? Have yout tried that nslookup google.com 192.168.42.129 works?

Regarding the timeout:

The timeout is generated by Thunderbird's underlying network code, not the add-on. Is also comes as expected after 5 seconds, the timeout the add-on uses by default. I expected it is a short network issue, that you just did not notice.

If you didn't know, it is possible to make a continuous ping with ping -t 1.1.1.1, which helps detecting if there are indeed some short temporary network problems.

If you still want to debug:

For debugging experiments Thun Tools > Developer Tools > Developer Toolbox has to be used (Note that a windows in the background will open there you need to accept the connection request). grafik

The files will only show up after they there used once.

gxtu commented 3 years ago

I'm getting the same error (Error connecting to the DNS server) in v4.0.0pre4. This is an example of what I see in the developer toolbox: dkim

Manually running nslookup -type=TXT mail20150812._domainkey.mailbox.org 10.0.0.1 does seem to work though and results in:

Server:     10.0.0.1
Address:    10.0.0.1#53

Non-authoritative answer:
mail20150812._domainkey.mailbox.org text = "v=DKIM1; p=MIIBIjANBgkqhkiG9w" "0BAQEFAAOCAQ8AMIIBCgKCAQEAytTqbbblelvR6JIoHG" "1GHd0C4VIx4j8Br7EAZXijWoG34Lpa1vJCKYjjg0LsBsXPy5BHrqqyLYvm9GJ7dL3" "ZS+ec0o4co1F6j6DwTUs13aYd0Nni8qKWoY9uf" "qbn2dqdZJu+mg3bwkS5iqzG+t0CqHQNI/aIyXTkfyTsDM+aNl3r2p0bJXw4G5obNn2hneqfqD9x3z+0EwTwsnN04vlw68tBTpd8" "UcZJJXVGSN+Beq1DyElFYczrzkln8k5I9A3mEeqHAYlbF5hCRCkVw2Pp7SmyTQeV+JHcnHOSE4zpV4SUsECKSz1En6ix0+kI23DUpBF84ofMiGMe9P0YMupu0QIDAQAB"

Authoritative answers can be found from:

Edit: This appears to be a problem with JSDNS: jsdns

When I switch the resolver to libunbound instead of JSDNS in the addon's preferences, everything works fine.

tadpole2077 commented 3 years ago

Great feedback, tks! Sorry about the delay in response, got pulled into some firefighting(not my code). I didnt find the JSDNS.jsm.js as its not loaded/displayed in the TB debugger until After the first DKim pass though unless breakpoints have already added . O the joys of debugging JS with many async call and callback fn's firing off in the background, can be difficult to trace and catch..

Regarding 192.168.42.129 :

CMD: ipconfig/all

Ethernet adapter Ethernet 2:

Connection-specific DNS Suffix . : Description . . . . . . . . . . . : SAMSUNG Mobile USB Remote NDIS Network Device Physical Address. . . . . . . . . : 02-2B-7B-7D-14-6B DHCP Enabled. . . . . . . . . . . : Yes Autoconfiguration Enabled . . . . : Yes IPv4 Address. . . . . . . . . . . : 192.168.42.149(Preferred) Subnet Mask . . . . . . . . . . . : 255.255.255.0 Lease Obtained. . . . . . . . . . : Friday 19 February 2021 17:51:01 Lease Expires . . . . . . . . . . : Friday 19 February 2021 20:42:23 Default Gateway . . . . . . . . . : 192.168.42.129 DHCP Server . . . . . . . . . . . : 192.168.42.129 DNS Servers . . . . . . . . . . . : 1.1.1.1 1.0.0.1 NetBIOS over Tcpip. . . . . . . . : Enabled

In the entries for the DNS servers stored on my registry at location: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces

Conclusion:

=========================== Walk though of Code, on load of TB, opening first email with DKIM sig.

Conclusion: Cross referenced with the actual Register data and its all good, no missing keys and all the DHCPNameServer and NameServer IP's are extracted as expected. So no issues on the pull of DNCP and NameServer IP's that May map to live DNS public server. The 2xDHCPNameServer mappings will likely fail on DNS request calls - unless they pass-though those requests.

============================== I've started DNS ping logging, and certainly i'm getting Ping timeout on 1.1.1.1 avg of around 50 per hour. I need to do some more tests on this if its an issue with my local wireless connection to local router, or on the router to ISP via fiber, or on the DNS server itself. I'll also continue code familiarisation, tracing calls on dns call timeouts to eval the existing process, fallback process and retry.

One question for you, on these calls to the DNS servers (4 registered IP's in my case), the expected action is to call each one in sequence, on fail(timeout etc) of one DNS server, trigger a query to the next DNS IP in the array?

lieser commented 3 years ago

@gxtu

Can you please repeat your test with nslookup -type=TXT "-set vc" mail20150812._domainkey.mailbox.org 10.0.0.? The "-set vc" part should make nslookup use TCP (and not UDP), which is also what JSDNS is using.

If it still works with nslookup using TCP, do you by chance have access to the logs of the DNS server?

I also just now did a network capture with Wireshark, and then comparing it to nslookup noticed the following: The JSDNS resolver is currently using a static (i.e. not changing) Transaction ID (TXID) (in RFC 1035 only named ID). Maybe this is causing some DNS servers to actively refuse the query with RCODE 5. If there is someone more experienced with DNS who could confirm this, that would be nice.

If the current behavior is something you can easily reproduce, I will also try to provide you with a testing version that is not using a static TXID, so you can just try out if this is causing problems.

@tadpole2077

Yes, that code is definitely not the most pleasant to debug. Especially since I my extensions to the original code didn't help readability (the code would definitely profit from a refactoring)

on these calls to the DNS servers (4 registered IP's in my case), the expected action is to call each one in sequence, on fail(timeout etc) of one DNS server, trigger a query to the next DNS IP in the array?

Your observations are correct. If there is an error, the DNS server is marked as dead, and the next DNS server is tried. If all DNS servers are marked as dead, the current default behavior is that all following DNS queries fail, until the DNS servers are reconfigured. As I described in https://github.com/lieser/dkim_verifier/issues/230#issuecomment-774522316, this can be changed via the dns.jsdns.autoResetServerAlive setting.

If dns.jsdns.autoResetServerAlive is set to true, the query that sets the last DNS server to dead still fails. But the next query should start to try out all DNS servers again. With your unstable connection, I would recommend to change this value.

And again, with your temporary network issues, the add-ons behavior may not be the best user experience. But it is currently unfortunately totally expected.

gxtu commented 3 years ago

Can you please repeat your test with nslookup -type=TXT "-set vc" mail20150812._domainkey.mailbox.org 10.0.0.? The "-set vc" part should make nslookup use TCP (and not UDP), which is also what JSDNS is using.

The command I had to use is slightly different, but I think we've found the problem.

$ nslookup -type=TXT -vc mail20150812._domainkey.mailbox.org 10.0.0.1 
Server:     10.0.0.1
Address:    10.0.0.1#53

** server can't find mail20150812._domainkey.mailbox.org: REFUSED

Looks like this is a problem with just my local DNS server, since others appear to be working just fine:

Server:     1.1.1.1
Address:    1.1.1.1#53

Non-authoritative answer:
mail20150812._domainkey.mailbox.org text = "v=DKIM1; p=MIIBIjANBgkqhkiG9w" "0BAQEFAAOCAQ8AMIIBCgKCAQEAytTqbbblelvR6JIoHG" "1GHd0C4VIx4j8Br7EAZXijWoG34Lpa1vJCKYjjg0LsBsXPy5BHrqqyLYvm9GJ7dL3" "ZS+ec0o4co1F6j6DwTUs13aYd0Nni8qKWoY9uf" "qbn2dqdZJu+mg3bwkS5iqzG+t0CqHQNI/aIyXTkfyTsDM+aNl3r2p0bJXw4G5obNn2hneqfqD9x3z+0EwTwsnN04vlw68tBTpd8" "UcZJJXVGSN+Beq1DyElFYczrzkln8k5I9A3mEeqHAYlbF5hCRCkVw2Pp7SmyTQeV+JHcnHOSE4zpV4SUsECKSz1En6ix0+kI23DUpBF84ofMiGMe9P0YMupu0QIDAQAB"

Authoritative answers can be found from:

My router is running OpenWrt with dnsmasq, but the log only shows:

query[TXT] mail20150812._domainkey.mailbox.org from 10.0.0.10
daemon.info dnsmasq[1454]: 532 10.0.0.10/51005 config error is REFUSED

Seems like this is a problem on my side then. Thanks for your help and sorry for bothering you with it!

lieser commented 3 years ago

As there has been no response since a while I guess the issue as been resolved for you all (or at least a workaround found). Note that I created a separate issue for exposing and/or changing the dns.jsdns.autoResetServerAlive value (#269).

If anyone still has problems, please create a new issue, as this one is getting hard to follow.