opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

Unbound DNSBL download unable to resolve DNS at boot, and previously downloaded BL file does not load. Removing 85-dnsbl helps. #6523

Closed Chaskel closed 1 year ago

Chaskel commented 1 year ago

Describe the bug

Notice unbound blocklist: https://adaway.org/hosts.txt (exclude: 0 block: 0)
Notice unbound blocklist download: 0 total lines downloaded for https://adaway.org/hosts.txt
Error unbound blocklist download : unable to download file from https://adaway.org/hosts.txt (error : HTTPSConnectionPool(host='adaway.org', port=443): Max retries exceeded with url: /hosts.txt (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x8027cf640>: Failed to establish a new connection: [Errno 8] Name does not resolve')))

NOTE: After startup of OPNsense, OPNsense diagnostic tools and client systems do not show any DNS problems. It appears only Unbound DNSBL has problems during boot time.

Manually restarting the Unbound service (e.g. the restart service button on the Blocklist page) does not appear to initiate a download of the list (I do not see messages such as those listed above).

If I disable Blocklist/Apply, then Enable Blocklist/Apply, it appears to trigger getting data:

Notice unbound blocklist parsing done in 0.58 seconds (7355 records)
Notice unbound blocklist: https://adaway.org/hosts.txt (exclude: 2 block: 7355)
Notice unbound blocklist download: 11782 total lines downloaded for https://adaway.org/hosts.txt
Notice unbound blocklist download : exclude domains matching ^(?![a-zA-Z_\d]).|.localhost$

NOTE: Even though the data seems to be retrieved, it appears it is not active until I then restart the service* (e.g. restart service button on Blocklist page).

*It also seems as though I need to go through the disable/enable steps then restart service an additional time to have everything fully work. I am not sure if it is always just one time, but I do know that doing the entire process once does not usually get everything working.

To Reproduce

Steps to reproduce the behavior:

  1. Configure DNS-related items:
     Services->Unbound DNS->Blocklist - "AdAway List" selected and all other fields empty.
     Services->Unbound DNS->DNS over TLS - 2 IPv4 and 2 IPv6 servers defined, all 4 using port 853 (1.1.1.1 / 1.0.0.1 / 2606:4700:4700::1111 / 2606:4700:4700::1001).
     Services->Unbound DNS->General - DNSSEC support enabled.
     System->Settings->General - No DNS servers manually defined.
     System->Settings->General - Allow DNS server list to be overridden by DHCP/PPP on WAN is enabled.
  2. Reboot OPNsense
  3. Check Unbound log to see if it was able to successfully download the BL file.

Expected behavior

Describe alternatives you considered

Additional context

Thoughts:

Environment

OPNsense 23.1.5_4-amd64 / FreeBSD 13.1-RELEASE-p7 / OpenSSL 1.1.1t 7 Feb 2023
VNOPN Micro Firewall Appliance with 4 Intel 2.5GbE Intel i225 NIC Ports, Intel N3700 Quad Core, Support AES-NI, 8GB DDR3

kulikov-a commented 1 year ago

Hi, one more thought: the 'requests' library doesn't retry by default. Adding a retry plan (like https://github.com/kulikov-a/core/commit/c6697655b760ca831b1e0f09d5828ccaa4a35c21) (ref. https://forum.opnsense.org/index.php?topic=32327.0) may help if it's a matter of some brief timing overlap. But it may make the blocklist download task run for a very long time if there is a real lack of connectivity.
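
A minimal Python sketch of the retry idea above, assuming a hypothetical zero-argument `fetch` callable that wraps the actual download (for example a `requests.get` call) and raises `OSError` on failure. It is not the code from the linked commit; the names `fetch_with_retry`, `attempts`, `delay` and `max_total` are made up for illustration. The `max_total` budget is one possible answer to the concern about a very long task run when there is no connectivity at all.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=2.0, max_total=30.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until it succeeds, attempts run out, or the time budget is spent.

    fetch is a hypothetical zero-argument callable that returns the downloaded
    text or raises OSError (requests' exceptions derive from OSError).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    start = clock()
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as exc:  # includes name resolution failures
            last_error = exc
        # Exponential backoff: delay, 2*delay, 4*delay, ...
        wait = delay * (2 ** attempt)
        # Give up when this was the last attempt or the next wait would
        # push the total run time past the budget.
        if attempt == attempts - 1 or clock() - start + wait > max_total:
            break
        sleep(wait)
    raise last_error
```

With `requests` itself, a similar effect is usually obtained by mounting an `HTTPAdapter` whose `max_retries` is a urllib3 `Retry` object; the loop above only makes the trade-off between retry count and total run time explicit.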

AdSchellevis commented 1 year ago

@kulikov-a I'm doubting the download should be in the boot flow to be honest, which is the main cause of this issue in my opinion. I'll need to discuss this internally; scheduling a download after a delay could also be an option.

fichtner commented 1 year ago

The previous design intentionally kept the pre-reboot setting in the staging area, also because of volatile /var MFS which is no longer present. Not sure when it was lost.

kulikov-a commented 1 year ago

@AdSchellevis Hi!)

> I'm doubting the download should be in the boot flow

Fair enough. But there is another thought: technically this situation (problems with name resolution or connection) can occur at other times too, not only at boot. Is it generally correct to overwrite the cache file with empty data in such cases? Or is it better to somehow track _uri_reader exceptions/errors and leave the list untouched in some cases?

AdSchellevis commented 1 year ago

@kulikov-a well, it would likely be better to keep the previous situation when no files can be downloaded.
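
One way to "keep the previous situation" can be sketched in Python as follows. This is an illustration under assumptions, not the change that was made in OPNsense: `update_blocklist`, `target_path` and `domains` are hypothetical names, and the one-domain-per-line file layout is invented for the example. The idea is simply to skip the write when nothing was downloaded, and otherwise to replace the file atomically so a failed run never leaves a truncated list behind.

```python
import os
import tempfile


def update_blocklist(target_path, domains):
    """Replace target_path with the new domains, unless the new set is empty.

    Returns True when the file was replaced, False when the old file was kept.
    """
    if not domains:
        # Nothing was downloaded (e.g. DNS was not ready): keep the old list.
        return False
    directory = os.path.dirname(os.path.abspath(target_path))
    # Write to a temporary file in the same directory first.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".dnsbl-")
    try:
        with os.fdopen(fd, "w") as handle:
            for domain in sorted(domains):
                handle.write(domain + "\n")
        # Atomic rename on the same filesystem.
        os.replace(tmp_path, target_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True
```

A caller would pass the merged result of all configured lists; an empty result then means "no files could be downloaded" and leaves the previously written list in place.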

kulikov-a commented 1 year ago

@AdSchellevis got it, thanks! (So reusing expired cache files can be useful on network problems, but not right after a reboot, since the whole cache is gone. No universal solution comes to my mind except moving the cache out of /tmp.)
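
The expired-cache reuse mentioned here could look roughly like the Python sketch below. It is not the logic in default_bl.py: `cached_fetch`, `fetch`, `cache_dir` and the `ttl` value are all assumptions made for illustration. A fresh cache entry is used as-is, an expired one triggers a download, and if that download fails the expired copy is returned instead of nothing.

```python
import hashlib
import os
import time


def cached_fetch(url, fetch, cache_dir, ttl=3600, now=time.time):
    """Return list content for url: fresh cache first, then a download,
    then an expired cache entry as a last resort.

    fetch is a hypothetical callable taking the URL and returning text,
    raising OSError on failure. ttl is an arbitrary example value in seconds.
    """
    name = hashlib.sha256(url.encode()).hexdigest()
    path = os.path.join(cache_dir, name)
    has_cache = os.path.isfile(path)
    if has_cache and now() - os.path.getmtime(path) < ttl:
        with open(path) as handle:
            return handle.read()
    try:
        content = fetch(url)
    except OSError:
        if has_cache:
            # Download failed: reuse the expired copy rather than return nothing.
            with open(path) as handle:
                return handle.read()
        raise
    with open(path, "w") as handle:
        handle.write(content)
    return content
```

As noted in the comment above, this only helps after a reboot if `cache_dir` lives somewhere that survives it, i.e. not under /tmp.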

fichtner commented 1 year ago

The cache is restored unless the user doesn’t want it. 😉

kulikov-a commented 1 year ago

not unbound cache - block-list content cache ;) https://github.com/opnsense/core/blob/368e7ac15e660a4e427ac77b9aad70d99771aeee/src/opnsense/scripts/unbound/blocklists/default_bl.py#L101-L118

fichtner commented 1 year ago

Ok, /tmp is obviously cleared on boot.

AdSchellevis commented 1 year ago

We discussed this internally. At the moment the best option seems to be to remove the syshook causing the download to be performed on boot, as it is only relevant after a reinstall or configuration import.

The downside might be that after an import, the user will need to download manually, but that's the case for most components at the moment and we don't have a hook for that.

I'll remove the file and close this in the next commit.

Chaskel commented 1 year ago

Thank you for the update. In case the following idea is of use: it may help to mention on the DNSBL configuration page that a cron job needs to be created manually. That is covered in the online documentation, but when I originally configured the options (using only the GUI as my guide), it was not immediately clear to me that this step was needed. While researching the problem described in this GitHub issue, it seemed to me that it wasn't always clear to others either.

I suspect this could apply to other items that may require a cron job, but thought I would mention it in case it helps.

Thank you again.