travisghansen / hass-opnsense

OPNsense integration with Home Assistant

configuration problem 0.1.17 #124

Closed homonto closed 6 months ago

homonto commented 6 months ago

I uninstalled the previous version a week ago after an OPNsense upgrade. Seeing that a new version is available, I installed it again, but I am not able to configure the integration and have no idea why. How can I debug it?

  1. OPNsense version: OPNsense 24.1.1-amd64
  2. plugin on OPNsense: os-homeassistant-maxit (installed) | 1.0
  3. in HA: 0.1.17

Screenshot 2024-02-12 at 09 24 17

travisghansen commented 6 months ago

I’ll need the hass logs

jamesahendry commented 6 months ago

I assume the API key/ Secret are the same, you can resolve fw.local and you've restarted HA since installing the HACS integration?

homonto commented 6 months ago

I assume the API key/ Secret are the same, you can resolve fw.local and you've restarted HA since installing the HACS integration?

All assumptions correct: I use the same credentials as before, and they have full rights since I used them for the OPNsense backup. I restarted HA of course, but did not restart OPNsense; that would be too much, right? ;-)

homonto commented 6 months ago

some extras from HA log:

2024-02-12 09:39:29.406 ERROR (SyncWorker_39) [custom_components.opnsense.pyopnsense] Unexpected get_system_info error err=gaierror(-3, 'Try again'), type(err)=<class 'socket.gaierror'>
2024-02-12 09:39:29.407 ERROR (MainThread) [custom_components.opnsense.config_flow] Unexpected err=gaierror(-3, 'Try again'), type(err)=<class 'socket.gaierror'>

travisghansen commented 6 months ago

That's usually something fundamentally going on with the network and not really related to the integration. Perhaps uncheck the verify ssl box?

https://github.com/travisghansen/hass-opnsense/issues/19

alexdelprete commented 6 months ago

gai stands for getaddrinfo(), so I'd check DNS. Try with IP address first, to confirm.

Once confirmed, go in HA shell and try an nslookup, to check if HA can resolve the hostname correctly.
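
The same check can be made from Python, which is closer to what the integration does than nslookup is, since the logged `gaierror` comes from Python's `socket.getaddrinfo()`. The following is a minimal sketch, not the integration's code; the helper names `resolve_host` and `describe_resolution` are made up for illustration, and the `resolver` parameter exists only so the logic can be exercised without a network.

```python
import socket


def resolve_host(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the sorted, unique addresses for hostname.

    Raises socket.gaierror when name resolution fails, which is the
    same exception type that appears in the Home Assistant log.
    """
    infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


def describe_resolution(hostname, port=443, resolver=socket.getaddrinfo):
    """Return a one-line, human-readable resolution result."""
    try:
        addresses = resolve_host(hostname, port, resolver)
    except socket.gaierror as err:
        return f"{hostname}: name resolution failed ({err.errno}, {err.strerror})"
    return f"{hostname}: resolved to {', '.join(addresses)}"
```

Calling `describe_resolution("fw.local")` and then `describe_resolution("192.168.1.1")` from the same environment that runs Home Assistant should show whether only the hostname path is broken.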

homonto commented 6 months ago

from HA ssh shell:

Screenshot 2024-02-12 at 10 46 35

but still:

Screenshot 2024-02-12 at 10 53 25

and HA log:

2024-02-12 10:53:13.313 ERROR (MainThread) [custom_components.opnsense.config_flow] Unexpected err=AbortFlow('Flow aborted: already_in_progress'), type(err)=<class 'homeassistant.data_entry_flow.AbortFlow'>

alexdelprete commented 6 months ago

can you do dig fw.local please?

also, can you check HA debug log again, when using IP address in the URL?

did you uncheck verify ssl in config box?

homonto commented 6 months ago

can you do dig fw.local please?

also, can you check HA debug log again, when using IP address in the URL?

dig fw.local not working from HA but dig 192.168.1.1 works

Screenshot 2024-02-12 at 10 58 00

alexdelprete commented 6 months ago

dig fw.local not working from HA

like I suspected. You're having dns resolution issues. The DNS server is 172.30.32.3. Are you using HassOS?

alexdelprete commented 6 months ago

Read this long thread about the problems of using HAOS and its CoreDNS configuration with .local addresses: https://community.home-assistant.io/t/local-dns/178108

So let's concentrate on https://192.168.1.1. Use that and uncheck ssl cert verification. Then check HA logs and post the error here.

homonto commented 6 months ago

I am seriously lost:

  1. My HA is in a VM on Proxmox, and has been for the last 2 years already.
  2. This plugin was working fine until... today (I uninstalled it after the last OPNsense upgrade, which killed the plugin).
  3. My HA network settings:

Screenshot 2024-02-12 at 11 59 16

and from cli on HA:

Screenshot 2024-02-12 at 11 59 48

ping from HA works with both: fw.local and 192.168.1.1

Screenshot 2024-02-12 at 12 00 33

but from the same HA dig and nslookup to fw.local is not working:

Screenshot 2024-02-12 at 12 02 03

trying to add https://192.168.1.1 in plugin config shows this HA error in log:

2024-02-12 12:03:22.528 ERROR (MainThread) [custom_components.opnsense.config_flow] Unexpected err=AbortFlow('Flow aborted: already_in_progress'), type(err)=<class 'homeassistant.data_entry_flow.AbortFlow'>

what am I doing wrong? I understand that this IP 172... is local IP of one of HA containers and it is not any of my machines ;-)

I am open for anything as long as you have enough patience guys ;-) (and highly appreciated)

alexdelprete commented 6 months ago

  1. To troubleshoot DNS issues you don't use ping, but you use nslookup and dig. And you have name resolution issues in your setup. This is one of the reasons that I dropped HAOS 2y ago and went to container installation on Proxmox. I manage the DNS, not HAOS, which has serious issues in that regard that devs always struggled to acknowledge. ;)

  2. If you had the integration working, why are you configuring it from scratch? You should've disabled it instead of deleting it. So now you simply needed to re-enable it after the update.

  3. Using https://192.168.1.1 is not throwing the gai errors you had, so the problem was indeed the DNS resolution. Now you have another issue: HA can't connect to https://192.168.1.1. Did you try http instead of https? Enable http on OPNsense and try with http first. If it works, we'll see why https doesn't work.

homonto commented 6 months ago

  1. To troubleshoot DNS issues you don't use ping, but you use nslookup and dig. And you have name resolution issues in your setup. This is one of the reasons that I dropped HAOS 2y ago and went to container installation on Proxmox. I manage the DNS, not HAOS, which has serious issues in that regard that devs always struggled to acknowledge. ;)

I got your point

  2. If you had the integration working, why are you configuring it from scratch? You should've disabled it instead of deleting it. So now you simply needed to re-enable it after the update.

now, only now, I understood why my kids don't like me when I ask them... "WHY DID YOU DO THAT?" ;-)

  3. Using https://192.168.1.1 is not throwing the gai errors you had, so the problem was indeed the DNS resolution. Now you have another issue: HA can't connect to https://192.168.1.1. Did you try http instead of https? Enable http on OPNsense and try with http first. If it works, we'll see why https doesn't work.

enabled http instead of https on opnsense but nothing much changed

Screenshot 2024-02-12 at 12 44 59

homonto commented 6 months ago

I found this in /etc/resolv.conf:

search local.hass.io
nameserver 172.30.32.3

so I changed it to:

nameserver 192.168.100.31
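
To confirm which resolver a given resolv.conf actually points at, the `nameserver` lines can be extracted programmatically. This is a small sketch under the assumption that the file follows the usual resolv.conf layout (one directive per line, comments starting with `#` or `;`); `parse_nameservers` is a hypothetical helper name, and the sample strings are the two file contents quoted above.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# The two configurations quoted in this comment.
before = "search local.hass.io\nnameserver 172.30.32.3\n"
after = "nameserver 192.168.100.31\n"

print(parse_nameservers(before))  # ['172.30.32.3']
print(parse_nameservers(after))   # ['192.168.100.31']
```

On a live system the same function could be fed the text of /etc/resolv.conf from the environment that runs Home Assistant.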

now dig and nslookup is like this:

Screenshot 2024-02-12 at 13 23 50

but still error trying to configure it:

2024-02-12 13:19:18.444 ERROR (SyncWorker_37) [custom_components.opnsense.pyopnsense] Unexpected get_system_info error err=gaierror(-3, 'Try again'), type(err)=<class 'socket.gaierror'>
2024-02-12 13:19:18.445 ERROR (MainThread) [custom_components.opnsense.config_flow] Unexpected err=gaierror(-3, 'Try again'), type(err)=<class 'socket.gaierror'>

buenni86 commented 6 months ago

I think I am having the same issue. I don't think that it is a DNS issue. Will try to get logs later.

@homonto can you try running a curl command on HA to see if you can get data?

curl -k -u API_KEY:API_SECRET https://OPNSENSE_IP_OR_HOSTNAME/api/interfaces/overview/interfacesInfo

I can get the interface info, but I am unable to get the integration working.
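
For anyone who prefers to run the same test from Python, here is a rough standard-library equivalent of that curl call. It is a sketch, not the integration's client: the function names are invented for illustration, `verify_ssl=False` is meant to mirror curl's `-k`, and the Basic auth header mirrors `-u API_KEY:API_SECRET`.

```python
import base64
import json
import ssl
import urllib.request


def build_request(base_url, api_key, api_secret):
    """Build a GET request with HTTP Basic auth, like `curl -u KEY:SECRET`."""
    url = base_url.rstrip("/") + "/api/interfaces/overview/interfacesInfo"
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def insecure_context():
    """Return a TLS context that skips certificate checks, like `curl -k`."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_interfaces(base_url, api_key, api_secret, verify_ssl=True, timeout=10):
    """Fetch and decode the interface overview from the OPNsense REST API."""
    request = build_request(base_url, api_key, api_secret)
    context = None if verify_ssl else insecure_context()
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return json.load(response)
```

A call such as `fetch_interfaces("https://OPNSENSE_IP_OR_HOSTNAME", "API_KEY", "API_SECRET", verify_ssl=False)` should return the same JSON that curl prints. Note that this only exercises the REST API; the traceback later in this thread shows the integration also calling xmlrpc.php, so a successful result here does not prove the integration can work.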

homonto commented 6 months ago

using curl with https://fw.local from HA terminal it gave me.... ;-)

It gave me everything. I deleted the picture because I think it contained some passwords ;-)

{"total":14,"rowCount":14,"current":1,"rows":[{"flags":["up","broadcast","running","simplex","multicast"],"capabilities":["rxcsum","txcsum","vlan_mtu","vlan_hwtagging","jumbo_mtu","vlan_hwcsum","tso4","tso6","lro","wol_ucast","wol_mcast","wol_magic","vlan_hwtso","netmap","rxcsum_ipv6","txcsum_ipv6","nomap"],"options":["rxcsum","txcsum","vlan_mtu","vlan_hwtagging","jumbo_mtu","vlan_hwcsum","tso4","tso6","lro","wol_magic","vlan_hwtso","rxcsum_ipv6","txcsum_ipv6","nomap"],"macaddr":"60:be:b4:0b:62:14","ipv4":[],"ipv6":[],"supported_media":["autoselect","2500Base-T","1000baseT","1000baseT full-duplex","100baseTX full-duplex","100baseTX","10baseT\/UTP full-duplex","10baseT\/UTP"],"is_physical":true,"device":"igc0","mtu":"1500","media":"1000baseT <full-duplex>","media_raw":"Ethernet autoselect (1000baseT <full-duplex>)","status":"up","identifier":"","description":"Unassigned Interface"},{"flags":["up","broadcast","running","simplex","multicast"],"capabilities":["rxcsum","txcsum","vlan_mtu","vlan_hwtagging","jumbo_mtu","vlan_hwcsum","tso4","tso6","lro","wol_ucast","wol_mcast","wol_magic","vlan_hwtso","netmap","rxcsum_ipv6","txcsum_ipv6","nomap"],"options":["rxcsum","txcsum","vlan_mtu","vlan_hwtagging","jumbo_mtu","vlan_hwcsum","tso4","tso6","lro","wol_magic","vlan_hwtso","rxcsum_ipv6","txcsum_ipv6","nomap"],"macaddr":"60:be:b4:0b:62:15","supported_media":["autoselect","2500Base-T","1000baseT","1000baseT full-duplex","100baseTX full-duplex","100baseTX","10baseT\/UTP full-duplex","10baseT\/UTP"],"is_physical":true,"device":"igc1","mtu":"1500","media":"2500Base-T <full-duplex>","media_raw":"Ethernet autoselect (2500Base-T 
<full-duplex>)","status":"up","routes":["192.168.1.0\/24"],"config":{"if":"igc1","descr":"Servers_1","enable":"1","lock":"1","spoofmac":"","ipaddr":"192.168.1.1","subnet":"24","identifier":"lan"},"identifier":"lan","description":"Servers_1","enabled":true,"link_type":"static","ipv4":[{"ipaddr":"192.168.1.1\/24"}],"vlan_tag":null,"gateways":[]},{"flags":["up","broadcast","running

and so on

buenni86 commented 6 months ago

@homonto Looks like you can talk to your OPNsense. Can you check if your home-assistant.log has the same error as mine?

Here are the logs from home-assistant.log

xmlrpc.client.ProtocolError: <ProtocolError for API_KEY:API_SECRET@OPNSENSE_IP/xmlrpc.php: 500 Internal Server Error>
2024-02-12 14:55:21.989 ERROR (SyncWorker_30) [custom_components.opnsense.pyopnsense] Unexpected get_telemetry error err=<ProtocolError for API_KEY:API_SECRET@OPNSENSE_IP/xmlrpc.php: 500 Internal Server Error>, type(err)=<class 'xmlrpc.client.ProtocolError'>
2024-02-12 14:55:21.989 ERROR (MainThread) [custom_components.opnsense] Unexpected error fetching OPNsense OPNsense state data: <ProtocolError for API_KEY:API_SECRET@OPNSENSE_IP/xmlrpc.php: 500 Internal Server Error>
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 313, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 269, in _async_update_data
    return await self.update_method()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/opnsense/__init__.py", line 97, in async_update_data
    await hass.async_add_executor_job(lambda: data.update())
  File "/usr/local/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/opnsense/__init__.py", line 97, in <lambda>
    await hass.async_add_executor_job(lambda: data.update())
                                              ^^^^^^^^^^^^^
  File "/config/custom_components/opnsense/__init__.py", line 316, in update
    self._state["telemetry"] = self._get_telemetry()
                               ^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/opnsense/__init__.py", line 225, in inner
    response = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/opnsense/__init__.py", line 249, in _get_telemetry
    return self._client.get_telemetry()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/opnsense/pyopnsense/__init__.py", line 101, in inner
    raise err
  File "/config/custom_components/opnsense/pyopnsense/__init__.py", line 98, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/opnsense/pyopnsense/__init__.py", line 1037, in get_telemetry
    data = self._exec_php(script)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/opnsense/pyopnsense/__init__.py", line 88, in inner
    response = func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/opnsense/pyopnsense/__init__.py", line 131, in _exec_php
    response = self._get_proxy().opnsense.exec_php(script)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/xmlrpc/client.py", line 1122, in __call__
    return self.__send(self.__name, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/xmlrpc/client.py", line 1461, in __request
    response = self.__transport.request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/xmlrpc/client.py", line 1166, in request
    return self.single_request(host, handler, request_body, verbose)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/xmlrpc/client.py", line 1196, in single_request
    raise ProtocolError(
xmlrpc.client.ProtocolError: <ProtocolError for API_KEY:API_SECRET@OPNSENSE_IP/xmlrpc.php: 500 Internal Server Error>

and this is from OPNsense crash report:

PHP Fatal error:  Uncaught TypeError: Cannot access offset of type string on string in /usr/local/etc/inc/xmlrpc/hass.inc(12) : eval()'d code:26
Stack trace:
#0 /usr/local/etc/inc/xmlrpc/hass.inc(12) : eval()'d code(60): interfaces_api()
#1 /usr/local/etc/inc/xmlrpc/hass.inc(12): eval()
#2 /usr/local/opnsense/contrib/IXR/IXR_Library.php(446): exec_php_xmlrpc('\nini_set('displ...')
#3 /usr/local/opnsense/contrib/IXR/IXR_Library.php(384): IXR_Server->call('opnsense.exec_p...', '\nini_set('displ...')
#4 /usr/local/opnsense/contrib/IXR/IXR_Library.php(357): IXR_Server->serve('<?xml version='...')
#5 /usr/local/etc/inc/xmlrpc.inc(67): IXR_Server->__construct(Array)
#6 /usr/local/www/xmlrpc.php(104): XMLRPCServer->start()
#7 {main}
  thrown in /usr/local/etc/inc/xmlrpc/hass.inc(12) : eval()'d code on line 26

@travisghansen I hope these logs can help to figure out where the problem is

homonto commented 6 months ago

from GUI the only error on HA I have as above:

2024-02-12 13:19:18.444 ERROR (SyncWorker_37) [custom_components.opnsense.pyopnsense] Unexpected get_system_info error err=gaierror(-3, 'Try again'), type(err)=<class 'socket.gaierror'>
2024-02-12 13:19:18.445 ERROR (MainThread) [custom_components.opnsense.config_flow] Unexpected err=gaierror(-3, 'Try again'), type(err)=<class 'socket.gaierror'>

buenni86 commented 6 months ago

Can you check /config/home-assistant.log

homonto commented 6 months ago

Can you check /config/home-assistant.log

that was from there

buenni86 commented 6 months ago

Then I guess we don't have the same error ;) But both of us can access the API with curl, while the integration does not work. I'll have to look into it later.

alexdelprete commented 6 months ago

Here are the logs from home-assistant.log

Your 500 Internal Server Error is a permission problem with the API key you are using. You need to use a full admin account. OPNsense 24.1 changed granular permissions for some security standard. I solved this by creating a full-admin hass user and generating the key associated with that user.

using curl with https://fw.local from HA terminal it gave me.... ;-)

You are using curl with -k, which skips the certificate check. I asked you before: did you uncheck that setting when configuring the integration? Also: did you test with http?

You ask for help but don't follow the instructions. :D

homonto commented 6 months ago

I asked you before: did you uncheck that setting when configuring the integration? Also: did you test with http?

yes, and I showed the screen for it as well - just under your comment ;-)

Screenshot 2024-02-12 at 15 56 57

alexdelprete commented 6 months ago

Sorry, I missed that post. :)

Well, you showed me a screenshot for http, so verify ssl has no effect. The message is "already configured", which means it finds the old setup in the HA registry. And that means it is communicating, because it gets the deviceid from OPNsense and checks whether it is already present in the HA device registry.

Since opnsense doesn't provide via XMLRPC/REST a unique deviceid, the integration creates one randomly and writes it in OPNsense, in the file /conf/hassid. So what I would do in your case is RENAME that file in OPNsense, so the integration creates a new one, and it shouldn't tell you that it is already configured in HA.
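
To make the mechanism concrete, here is a purely illustrative sketch of the "read the ID if the file exists, otherwise generate and store one" pattern, run against a local file. It is not the integration's actual code (the real file lives on the firewall and is handled remotely), and the function name and the use of `uuid4` for the ID format are assumptions.

```python
import uuid
from pathlib import Path


def get_or_create_device_id(path):
    """Return the ID stored at path, creating a random one if it is missing."""
    path = Path(path)
    if path.exists():
        existing = path.read_text().strip()
        if existing:
            # An ID was stored earlier: reuse it so the device stays the same.
            return existing
    # No usable file: generate a fresh random ID and persist it.
    device_id = uuid.uuid4().hex
    path.write_text(device_id)
    return device_id
```

With this pattern, renaming or removing the file makes the next call produce a different ID, which is why HA would then treat the firewall as a new, not yet configured device.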

Let me know...;)

PS: you still have the DNS issue...and that is more important than opnsense integration, you need to fix that asap. Then we'll also try to understand why it doesn't work in https.

buenni86 commented 6 months ago

Your 500 Internal Server Error is a permission problem with the API key you are using. You need to use a full admin account. OPNsense 24.1 changed granular permissions for some security standard. I solved this by creating a full-admin hass user and generating the key associated with that user.

Ok, I created a new user with admin rights, but still the error is the same (500 internal server error)

homonto commented 6 months ago

Sorry, I missed that post. :)

nobody is perfect and this is not a problem ;-)

Since opnsense doesn't provide via XMLRPC/REST a unique deviceid, the integration creates one randomly and writes it in OPNsense, in the file /conf/hassid. So what I would do in your case is RENAME that file in OPNsense, so the integration creates a new one, and it shouldn't tell you that it is already configured in HA.

removing this file allowed me to configure under: https://192.168.1.1

Let me know...;)

done - thank you ;-)

PS: you still have the DNS issue...and that is more important than opnsense integration, you need to fix that asap. Then we'll also try to understand why it doesn't work in https.

as I wrote also somewhere above, I reconfigured on HA /etc/resolv.conf from 172.... to my DNS and since then dig and nslookup from HA works properly:

Screenshot 2024-02-12 at 16 24 42

alexdelprete commented 6 months ago

removing this file allowed me to configure under: https://192.168.1.1

Good to know. Now try with the hostname, if you fixed DNS. ;)

alexdelprete commented 6 months ago

Ok, I created a new user with admin rights, but still the error is the same (500 internal server error)

500 internal server problem is 99% related to the user/api/key. You can search past issues, I'm pretty confident you should concentrate on that.

Make sure the user is in the admins group, and that the Effective Privileges section of the user page shows the privileges as inherited from admins. If you edit those privileges, don't check anything: let the user inherit without adding any privilege (all checkboxes unchecked).

image

To make sure you unchecked all privileges, edit and check the filter checkbox, you should see no records, like this:

image

alexdelprete commented 6 months ago

This is the OPNsense 24.1 note regarding users/security:

image

homonto commented 6 months ago

removing this file allowed me to configure under: https://192.168.1.1

Good to know. Now try with the hostname, if you fixed DNS. ;)

first I tried with fw.local but ... you know - I would proudly announce it if it worked so, step by step:

1st try: https://fw.local

Screenshot 2024-02-12 at 16 34 04

2nd, without SSL verification:

Screenshot 2024-02-12 at 16 34 33

however as I showed you, from HA dig fw.local and nslookup work now after changing DNS in resolv.conf

alexdelprete commented 6 months ago

however as I showed you, from HA dig fw.local and nslookup work now after changing DNS in resolv.conf

If it works with the IP, then communication is not the issue. If the error is a gaierror, it means Python's getaddrinfo() fails. And it's not a coincidence.
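
The exception type alone already says a lot about where to look. Below is a minimal sketch that maps the exception classes seen in this thread to a rough category; the function name and the category labels are made up for illustration, and the mapping is a rule of thumb rather than anything the integration implements.

```python
import socket
import ssl
import xmlrpc.client


def classify_error(err):
    """Map an exception to a rough troubleshooting category."""
    if isinstance(err, socket.gaierror):
        # getaddrinfo() failed: the hostname could not be resolved.
        return "dns"
    if isinstance(err, ssl.SSLCertVerificationError):
        # TLS handshake reached the server but the certificate was rejected.
        return "certificate"
    if isinstance(err, xmlrpc.client.ProtocolError):
        # The server answered with an HTTP error status, e.g. 500.
        return f"http {err.errcode}"
    if isinstance(err, (ConnectionError, TimeoutError)):
        # Name resolved, but the TCP connection failed or timed out.
        return "connection"
    return "other"


print(classify_error(socket.gaierror(-3, "Try again")))  # dns
```

An `AbortFlow` such as `already_in_progress` falls into "other" here: it comes from Home Assistant's config flow, not from the network.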

HAOS dns sucks. You will have random issues also in other functionalities with time. Always remember an old System Admin saying: 95% of the time, when something strange happens and you can't find the cause, it's DNS. :)

DNS is the most important service, often overlooked because generally it works, but when it doesn't you can get crazy.

BTW: I remember from 2y ago with HAOS that the DNS config files were overwritten whenever you updated HAOS. So even when fixed manually, you had to do it again at every upgrade. HAOS simply sucks in that regard. :)

homonto commented 6 months ago

one more lesson I learned today: when you suggested to delete /conf/hassid and I succeeded configuring, I found out such status...

Screenshot 2024-02-12 at 16 31 44

so, I did NOT delete the integration in the first place when it stopped working (week or so ago) - I disabled it (probably after reading here that patch is on the way). But today, seeing 0.1.17 announced I could not find it in my HA (disabled and filtered out from lovelace) so I removed it from HACS, installed again and I was trying to configure it again and again... probably enabling it would have been enough... - oh life ;-)

Thing is: until I succeeded with configuration I was not able to see it - filtered out ;-)

homonto commented 6 months ago

Always remember an old System Admin saying: 95% of the time, when something strange happens and you can't find the cause, it's DNS. :)

I am 100% with you from my experience as well... if you asked me if it worked before over IP or hostname - no idea - I created certs just few weeks ago - until then I used always IP

homonto commented 6 months ago

@alexdelprete thank you, mate!

alexdelprete commented 6 months ago

how do they say: "all's well that ends well!". :)

Glad to have been helpful.