openwrt / telephony

The telephony packages feed

siproxd: init requires that interface_inbound and interface_outbound be set #801

Open: Azureit opened 1 year ago

Azureit commented 1 year ago

siproxd will not work unless if_inbound and if_outbound are defined in the config file. This change reports a meaningful error when interface_inbound or interface_outbound is not set.

Maintainer: @micmac1

Signed-off-by: Azureit azurite@mailfence.com
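The check this PR asks for could look roughly like the sketch below. It assumes the two option values have already been read from the siproxd UCI section (the real init script would use config_get for that; here they are simply passed as arguments), and the helper name check_interfaces is hypothetical. It names every missing option in a single message and returns a failure status instead of exiting.

```shell
#!/bin/sh
# Hypothetical helper: verify that both interface options are set and,
# if not, name the missing ones in the error message.
# $1 = value of interface_inbound, $2 = value of interface_outbound
check_interfaces() {
	local missing=""

	[ -n "$1" ] || missing="$missing interface_inbound"
	[ -n "$2" ] || missing="$missing interface_outbound"

	if [ -n "$missing" ]; then
		# Report a meaningful error instead of letting siproxd fail later
		echo "siproxd: required option(s) not set:$missing" >&2
		return 1
	fi
	return 0
}
```

For example, check_interfaces lan wan succeeds, while check_interfaces "" wan prints "siproxd: required option(s) not set: interface_inbound" and returns 1. On a router, logger could be used instead of stderr so the message shows up in logread.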

micmac1 commented 1 year ago

Hello @Azureit,

Thanks for bringing this up. I'll look into it more thoroughly in a little while. What caught my attention so far:

a) The "exit 1" in "die()". Since die() is a function, it should return rather than exit. Calling exit would terminate the whole init process, which means procd can't keep track. See also [1].
b) Commit subjects should follow the convention "package: short description".
c) Signed-off-by needs a valid mail address and a full name.

I'd also like to ask @guidosarducci to take a look at this as he kindly provided the procd support some time ago.

[1] https://github.com/openwrt/telephony/pull/790

Kind regards, Seb

micmac1 commented 1 year ago

I added the changes to my siproxd init script and ran it with the default config. It ran the ubus calls, and each one waited around 30 seconds:

ubus -t 30 wait_for network.interface network.lan
<30 seconds waiting around>
ubus -t 30 wait_for network.interface network.wan
<30 seconds waiting around>

I suppose that is unexpected, since both interfaces are up:


ifstatus lan
{
    "up": true,
<snip>
ifstatus wan
{
    "up": true,

Please review this again.

Kind regards, Seb

Azureit commented 1 year ago

I adapted that ubus command from here, but when I run it over SSH it also times out after 30 seconds. With ubus list I noticed that network.lan doesn't exist; the object is called network.interface.lan, and ubus -t 30 wait_for network.interface.lan works as expected, returning immediately because lan is active. So we need to use ubus -t 30 wait_for network.interface.lan to check whether the interface is valid. The open question is whether ubus wait_for waits for the interface to be up or just confirms that the interface exists. I guess we could loop on ubus call network.interface.lan status until up=true. ubus wait_for would be best if it supported waiting for a status variable to change, in our case up=true.
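A sketch of the polling idea, assuming ubus call network.interface.<name> status returns pretty-printed JSON with an "up" field, as in the ifstatus output earlier in this thread. As far as I understand, ubus wait_for only waits for the listed objects to be registered, not for the interface to come up, which is why a separate loop is needed. The function name wait_iface_up and its arguments are hypothetical, and the grep match is deliberately crude.

```shell
#!/bin/sh
# Sketch: poll netifd until a logical interface reports "up": true.
# $1 = logical interface name (e.g. lan), $2 = timeout in seconds (default 30)
wait_iface_up() {
	local iface="$1" timeout="${2:-30}" elapsed=0

	while [ "$elapsed" -lt "$timeout" ]; do
		# Crude textual match on the JSON returned by
		# "ubus call network.interface.<name> status"
		if ubus call "network.interface.$iface" status 2>/dev/null |
			grep -q '"up": *true'; then
			return 0
		fi
		sleep 1
		elapsed=$((elapsed + 1))
	done

	echo "interface '$iface' not up after ${timeout}s" >&2
	return 1
}
```

Inside the init script this would be used as wait_iface_up lan 30 || return 1. On a real router, jsonfilter -e '@.up' or network_is_up from /lib/functions/network.sh would likely be more robust than the grep.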

Azureit commented 1 year ago

@micmac1 I have finally rewritten setup_networks() to handle the WireGuard interface correctly. The problem was that the WireGuard interface is initialized twice (I don't know why), so ubus -t 30 wait_for network.interface.wg0 returns after the first initialization, and before the next command runs, the WireGuard interface logs "Interface 'wg0' has lost the connection". That made the lookup of the .l3_device device name fail, so the script now retries that lookup. What do you think?
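The retry described above might be sketched like this, assuming the status JSON contains an "l3_device" field (as far as I know, the same field network_get_device reads). The helper get_l3_device and its default retry count are hypothetical; this is not the code from the PR.

```shell
#!/bin/sh
# Sketch: look up the layer-3 device of a logical interface, retrying
# because the interface may briefly drop (e.g. a WireGuard interface set
# up twice) between "ubus wait_for" and the lookup.
# $1 = logical interface name, $2 = number of attempts (default 5)
# Prints the device name and returns 0 on success, returns 1 otherwise.
get_l3_device() {
	local iface="$1" tries="${2:-5}" dev

	while [ "$tries" -gt 0 ]; do
		# Pull the value of "l3_device" out of the status JSON (crude sed match)
		dev=$(ubus call "network.interface.$iface" status 2>/dev/null |
			sed -n 's/.*"l3_device": *"\([^"]*\)".*/\1/p')
		if [ -n "$dev" ]; then
			echo "$dev"
			return 0
		fi
		tries=$((tries - 1))
		# Only wait if another attempt is left
		[ "$tries" -gt 0 ] && sleep 1
	done
	return 1
}
```

Usage would be something like dev=$(get_l3_device wg0) || return 1. A bounded number of attempts keeps the init script from hanging if the interface never comes back.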

Azureit commented 1 year ago

@micmac1 What is the proper way to tell procd that there was a problem during siproxd init and abort it? Simply replacing exit 1 with return doesn't work; I tested it and siproxd still runs.
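One possible answer, sketched under the assumption of the usual rc.common procd mode, where siproxd is only launched if start_service opens an instance: die() returns 1, and each caller propagates that with || return 1, so start_service leaves before procd_open_instance. The procd_* helpers come from /lib/functions/procd.sh on OpenWrt; passing the interface names as arguments and the /usr/sbin/siproxd path are assumptions made for the sketch.

```shell
#!/bin/sh
# Sketch of the "return early from start_service" pattern for procd init scripts.

die() {
	# Log and hand a failure status back. "return" only leaves die() itself,
	# so every caller has to propagate it (e.g. with "|| return 1").
	echo "siproxd: $*" >&2
	return 1
}

start_service() {
	# Hypothetical: the real script would read these with config_get.
	local inbound="$1" outbound="$2"

	# On error, leave start_service before any instance is registered
	[ -n "$inbound" ] || die "interface_inbound is not set" || return 1
	[ -n "$outbound" ] || die "interface_outbound is not set" || return 1

	procd_open_instance
	procd_set_param command /usr/sbin/siproxd   # path assumed, arguments omitted
	procd_close_instance
}
```

This also explains the observed behaviour: with a bare return inside die(), only die() ends; start_service then carries on and registers the siproxd instance anyway.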

guidosarducci commented 9 months ago

@micmac1 @Azureit I missed this the first time around, so I'm only now having a look, but I'm having some trouble following along.

How can I reproduce the issue you see? I see #800, which suggests some kind of race problem, but it lacks a log file showing any details of the network startup and siproxd initialization. I also don't see the relevant siproxd or network config details. Could you explain https://github.com/openwrt/telephony/pull/801#issue-1533026805 a bit more clearly?

I tried reproducing it with a simple setup, but I don't see the siproxd error from #800. Using the current OpenWrt 23.05.2, I created a wg interface called wg_voip from LuCI, put option interface_outbound wg_voip into the default siproxd config, and enabled some siproxd debugging. After rebooting I see:

Mon Nov 20 16:57:02 2023 daemon.notice netifd: Interface 'lan' is enabled
Mon Nov 20 16:57:02 2023 daemon.notice netifd: Interface 'lan' is setting up now
Mon Nov 20 16:57:02 2023 daemon.notice netifd: Interface 'lan' is now up
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'loopback' is enabled
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'loopback' is setting up now
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'loopback' is now up
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'wan' is enabled
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'wan6' is enabled
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'wg_voip' is setting up now
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Network device 'eth0' link is up
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Network device 'lo' link is up
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'loopback' has link connectivity
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Interface 'wg_voip' is now up
Mon Nov 20 16:57:03 2023 daemon.notice netifd: Network device 'wg_voip' link is up
Mon Nov 20 16:57:03 2023 daemon.notice siproxd[1]: siproxd.c:241 INFO:siproxd-0.8.4dev-none 2023-11-11T17:18:29 arm-openwrt-linux-gnu starting up
Mon Nov 20 16:57:03 2023 daemon.debug siproxd[1]: utils.c:293 running w/uid=0, euid=0, gid=0, egid=0
Mon Nov 20 16:57:03 2023 daemon.debug siproxd[1]: utils.c:329 changing uid/gid to nobody
Mon Nov 20 16:57:03 2023 daemon.debug siproxd[1]: utils.c:332 changed gid to 65534 - Ok
Mon Nov 20 16:57:03 2023 daemon.debug siproxd[1]: utils.c:336 changed egid to 65534 - Ok
Mon Nov 20 16:57:03 2023 daemon.debug siproxd[1]: utils.c:340 changed euid to 65534 - Ok
Mon Nov 20 16:57:03 2023 daemon.notice siproxd[1]: rtpproxy_relay.c:119 INFO:Current thread stacksize is 128 kB
Mon Nov 20 16:57:03 2023 daemon.notice siproxd[1]: sock.c:139 INFO:bound to port 5060
Mon Nov 20 16:57:03 2023 daemon.notice siproxd[1]: register.c:119 WARNING:registration file may be corrupt or URLMAP_SIZE has been resized
Mon Nov 20 16:57:03 2023 daemon.notice siproxd[1]: siproxd.c:351 INFO:siproxd-0.8.4dev-none 2023-11-11T17:18:29 arm-openwrt-linux-gnu started
Mon Nov 20 16:57:06 2023 daemon.notice netifd: Network device 'lan1' link is up
Mon Nov 20 16:57:06 2023 daemon.notice netifd: bridge 'br-lan' link is up
Mon Nov 20 16:57:06 2023 daemon.notice netifd: Interface 'lan' has link connectivity

So there is no error like: daemon.err siproxd[1]: utils.c:401 ERROR:Don't know what interface to look for - configuration error?

guidosarducci commented 9 months ago

Hmm, I did notice a strange behaviour with wg interfaces (and also our default wan!). See for example:

root@OpenWrt:~# (. /lib/functions/network.sh; for i in wan lan wg_voip; do for f in network_get_device network_get_physdev; do eval $f _dev $i; echo "$f($i),ret="$_dev,$?; done; done;)
network_get_device(wan),ret=,1
network_get_physdev(wan),ret=wan,0
network_get_device(lan),ret=br-lan,0
network_get_physdev(lan),ret=br-lan,0
network_get_device(wg_voip),ret=wg_voip,0
network_get_physdev(wg_voip),ret=,1

So network_get_physdev works with wan but not with wg_voip, network_get_device works with wg_voip but not with wan, and both work with lan (br-lan). It seems like the behaviour changes every time I look.

This commit addresses the problem I saw, but I'm not sure whether it applies to your issue. Worth a try, though: https://github.com/guidosarducci/telephony/commit/8a0e178b0a1ff23e67667ce590ba2ba4be49cb10
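Independently of the linked commit, the inconsistency above could be worked around with a fallback. As far as I can tell, network_get_device reads netifd's l3_device while network_get_physdev reads device, so trying one and then the other covers both the wan and wg_voip cases from the output above. get_iface_device is a hypothetical helper, not what the commit does.

```shell
#!/bin/sh
# Sketch: resolve the Linux device for a logical interface, preferring
# network_get_device and falling back to network_get_physdev.
# Both helpers come from /lib/functions/network.sh on OpenWrt and take
# (destination variable name, logical interface name).
# $1 = name of the variable to set, $2 = logical interface name
get_iface_device() {
	network_get_device "$1" "$2" && return 0
	network_get_physdev "$1" "$2"
}
```

On a router this would be used as: . /lib/functions/network.sh; get_iface_device dev wg_voip && echo "$dev"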