Closed fnarfbargle closed 4 years ago
The weird thing is that ipc.link()
actually cannot return nil
, except when there's an out of memory issue: https://github.com/openwrt/luci/blob/master/libs/luci-lib-ip/src/ip.c#L1498
The only condition under which link()
does not return at least an empty table is if (!msg || !cb)
Yep, understand that and have already trawled through that code. I put some basic instrumentation in and logged several thousand calls per day, with at most one failure. The very peculiar thing is it will fail on the first call (always wlan0) and immediately succeed on the second. So call exactly the same thing twice and the first one fails but the second succeeds.
So by putting that debug in place I never see a failure backtrace. I only see the "is nil wlan0" as it will fail on the debug line and the next test passes.
As I said in the previous issue, I'm fairly sure it's not a memory related issue :
root@WRT32x:~# free total used free shared buffers cached Mem: 510936 226216 284720 548 4060 12632 -/+ buffers/cache: 209524 301412
This box literally does nothing other than shunt packets between the wireless and wired LAN. It's a very expensive AP.
It has me stuffed, but as long as I have a web tab with the status overview up it happens pretty much every day.
I'll put some instrumentation in ip.so to find out which allocation is failing and compile up a new image to test.
Debug printing from lua is easy enough. Is there a reference for getting a debug print out of the linked library you could point me towards?
It should be enough to use fprintf(stderr, "msg=%p, cb=%p\n", msg, cb);
in the library to make the output appear in logread
.
I suspect the problem can be "fixed" with a change like this:
diff --git a/libs/luci-lib-ip/src/ip.c b/libs/luci-lib-ip/src/ip.c
index 854a0c09c..7cf3cd444 100644
--- a/libs/luci-lib-ip/src/ip.c
+++ b/libs/luci-lib-ip/src/ip.c
@@ -1517,8 +1517,10 @@ static int link_get(lua_State *L)
struct nl_cb *cb = nl_cb_alloc(NL_CB_DEFAULT);
struct ifinfomsg ifm = { .ifi_index = if_nametoindex(dev) };
+ lua_newtable(L);
+
if (!msg || !cb)
- return 0;
+ return 1;
nlmsg_append(msg, &ifm, sizeof(ifm), 0);
@@ -1526,8 +1528,6 @@ static int link_get(lua_State *L)
nl_cb_set(cb, NL_CB_FINISH, NL_CB_CUSTOM, cb_done, &st);
nl_cb_err(cb, NL_CB_CUSTOM, cb_error, &st);
- lua_newtable(L);
-
nl_send_auto_complete(sock, msg);
while (st.pending > 0)
But Ideally I'd like to understand the underlying problem first. Maybe its not the amount of free memory but memory fragmentation, or some bad caching / allocation behaviour in libnl-tiny.
Ok, failure to allocate isn't the problem.
I put the fprintf in and got the lua error 3 times last night, but never once triggered the print in ip.c. I re-compiled this morning with a hack to force the fprintf to print just to be sure it would actually leave data in the logs and it did.
So the error is not due to a failure to allocate those structures.
Ok, the error is failure in nl_connect(). On failure it is returning -6 :
Fri Aug 3 16:16:33 2018 daemon.err uhttpd[2337]: no nl_connect -6 Fri Aug 3 16:16:33 2018 daemon.err uhttpd[2337]: ipc.link(ifname) is nil wlan0
In libnl, -6 is NLE_EXIST "Object exists". So the socket is created, nl_connect returns "Object exists", this uses _error() to push 3 elements onto the lua stack with the first element being 'nil', which may explain ipc.link(ifname) returning nil. I assume I should be able to pull the error number and description off the Lua stack also. Is this the right way to return an error to lua, as it doesn't seem to be handled?
Plenty of instances splattered across google of similar errors using libnl. Some bletcherous examples of workarounds using loops and waits. I'll keep poking.
Yeah, one common pattern for returning errors in Lua library function is nil, errno, errstr
or nil, errstr
.
The fault is in the calling Lua code of network.lua
. It simply assumed that the call can never fail.
Amusingly I updated the WRT1900ACv2 to OpenWrt 18.06.1 r7258-5eb055306f (from the repositories, not compiled locally) a couple of weeks ago and I'm periodically seeing exactly the same error at the same location in the logs.
That router is actually working as a router and is considerably more busy with a completely different configuration, but precisely the same error. Odd that I'm apparently the only person seeing this.
I suppose I'd better take a look at the lua->C call chain and figure out how to gracefully fail that call.
I just encountered this error about 20 minutes ago, on a R7800 running r9019-0e8d5ff (git-19.011.54860-47d9205)
Failed to execute template dispatcher target for entry '/admin/status/overview'.
The called action terminated with an exception:
/usr/lib/lua/luci/template.lua:97: Failed to execute template 'admin_status/index'.
A runtime error occurred: /usr/lib/lua/luci/model/network.lua:203: attempt to index a nil value
stack traceback:
/usr/lib/lua/luci/model/network.lua:203: in function '_wifi_iwinfo_by_ifname'
/usr/lib/lua/luci/model/network.lua:1664: in function '__init__'
/usr/lib/lua/luci/util.lua:65: in function 'wifinet'
/usr/lib/lua/luci/model/network.lua:1598: in function 'callback'
/usr/lib/lua/luci/model/uci.lua:269: in function 'foreach'
/usr/lib/lua/luci/model/network.lua:1595: in function 'get_wifinets'
/usr/lib/lua/luci/tools/status.lua:147: in function 'wifi_networks'
[string "/usr/lib/lua/luci/view/admin_status/index.h..."]:45: in main chunk
stack traceback:
[C]: in function 'error'
/usr/lib/lua/luci/template.lua:97: in function </usr/lib/lua/luci/template.lua:85>
(tail call): ?
/usr/lib/lua/luci/dispatcher.lua:849: in function </usr/lib/lua/luci/dispatcher.lua:848>
Line 203 is the above mentioned local phy_only = force_phy_only or (ipc.link(ifname).type ~= 1)
Memory is also not an issue on my device, taken shortly after discovery.
root@OpenWrt:~# free
total used free shared buff/cache available
Mem: 477372 121768 341596 1416 14008 325228
Swap: 0 0 0
Just to add another potential data point in the event it has any relation:
Wed Mar 13 10:35:47 2019 daemon.err uhttpd[1418]: luci: accepted login on /admin/network/wireless for root from 192.168.1.12
Wed Mar 13 10:56:57 2019 daemon.err uhttpd[1418]: Failed to execute call dispatcher target for entry '/admin/network/wireless_status/radio0.network1,radio1.network1'.
Wed Mar 13 10:56:57 2019 daemon.err uhttpd[1418]: The called action terminated with an exception:
Wed Mar 13 10:56:57 2019 daemon.err uhttpd[1418]: /usr/lib/lua/luci/model/network.lua:203: attempt to index a nil value
Wed Mar 13 10:56:57 2019 daemon.err uhttpd[1418]: stack traceback:
Wed Mar 13 10:56:57 2019 daemon.err uhttpd[1418]: /usr/lib/lua/luci/model/network.lua:203: in function '_wifi_iwinfo_by_ifname'
Wed Mar 13 10:56:57 2019 daemon.err uhttpd[1418]: /usr/lib/lua/luci/model/network.lua:1664: in function '__init__'
Wed Mar 13 10:56:57 2019 daemon.err uhttpd[1418]: /usr/lib/lua/luci/util.lua:65: in function </usr/lib/lua/luci/util.lua:61>
Wed Mar 13 10:56:57 2019 daemon.err uhttpd[1418]: (tail call): ?
Wed Mar 13 10:56:57 2019 daemon.err uhttpd[1418]: /usr/lib/lua/luci/tools/status.lua:185: in function 'wifi_network'
Wed Mar 13 10:56:57 2019 daemon.err uhttpd[1418]: /usr/lib/lua/luci/controller/admin/network.lua:277: in function </usr/lib/lua/luci/controller/admin/network.lua:270>
Wed Mar 13 10:56:57 2019 daemon.err uhttpd[1418]: (tail call): ?
Wed Mar 13 10:57:17 2019 daemon.err uhttpd[1418]: Failed to execute call dispatcher target for entry '/admin/network/wireless_status/radio0.network1,radio1.network1'.
Wed Mar 13 10:57:17 2019 daemon.err uhttpd[1418]: The called action terminated with an exception:
Wed Mar 13 10:57:17 2019 daemon.err uhttpd[1418]: /usr/lib/lua/luci/model/network.lua:203: attempt to index a nil value
Wed Mar 13 10:57:17 2019 daemon.err uhttpd[1418]: stack traceback:
Wed Mar 13 10:57:17 2019 daemon.err uhttpd[1418]: /usr/lib/lua/luci/model/network.lua:203: in function '_wifi_iwinfo_by_ifname'
Wed Mar 13 10:57:17 2019 daemon.err uhttpd[1418]: /usr/lib/lua/luci/model/network.lua:1664: in function '__init__'
Wed Mar 13 10:57:17 2019 daemon.err uhttpd[1418]: /usr/lib/lua/luci/util.lua:65: in function </usr/lib/lua/luci/util.lua:61>
Wed Mar 13 10:57:17 2019 daemon.err uhttpd[1418]: (tail call): ?
Wed Mar 13 10:57:17 2019 daemon.err uhttpd[1418]: /usr/lib/lua/luci/tools/status.lua:185: in function 'wifi_network'
Wed Mar 13 10:57:17 2019 daemon.err uhttpd[1418]: /usr/lib/lua/luci/controller/admin/network.lua:277: in function </usr/lib/lua/luci/controller/admin/network.lua:270>
Wed Mar 13 10:57:17 2019 daemon.err uhttpd[1418]: (tail call): ?
LuCI Master (git-19.058.07653) / OpenWrt SNAPSHOT r9456
This issue has been bugging me for a while since my monitoring system alerts me on errors from the router.
Adding the latest message in my system log as well:
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: The called action terminated with an exception:
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: /usr/lib/lua/luci/template.lua:97: Failed to execute template 'admin_status/index'.
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: A runtime error occured: /usr/lib/lua/luci/model/network.lua:203: attempt to index a nil value
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: stack traceback:
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: /usr/lib/lua/luci/model/network.lua:203: in function '_wifi_iwinfo_by_ifname'
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: /usr/lib/lua/luci/model/network.lua:1657: in function '__init__'
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: /usr/lib/lua/luci/util.lua:65: in function 'wifinet'
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: /usr/lib/lua/luci/model/network.lua:1591: in function 'callback'
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: /usr/lib/lua/luci/model/uci.lua:269: in function 'foreach'
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: /usr/lib/lua/luci/model/network.lua:1588: in function 'get_wifinets'
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: /usr/lib/lua/luci/tools/status.lua:147: in function 'wifi_networks'
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: [string "/usr/lib/lua/luci/view/admin_status/index.h..."]:50: in main chunk
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: stack traceback:
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: [C]: in function 'error'
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: /usr/lib/lua/luci/template.lua:97: in function </usr/lib/lua/luci/template.lua:85>
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: (tail call): ?
Wed Mar 20 20:12:55 2019 daemon.err uhttpd[1617]: /usr/lib/lua/luci/dispatcher.lua:827: in function </usr/lib/lua/luci/dispatcher.lua:826>
Linksys WRT1900AC OpenWrt 18.06.1 r7258-5eb055306f / LuCI openwrt-18.06 branch (git-19.051.55698-76cf653)
I just ran into this issue while running OpenWrt SNAPSHOT r9019-0e8d5ff0fc / LuCI Master (git-19.079.18950-11e64f8) on MikroTik RouterBOARD 750Gr3.
Mon May 20 20:21:39 2019 daemon.err uhttpd[2018]: Failed to execute call dispatcher target for entry '/admin/network/iface_status/lan,vpn,wan'.
Mon May 20 20:21:39 2019 daemon.err uhttpd[2018]: The called action terminated with an exception:
Mon May 20 20:21:39 2019 daemon.err uhttpd[2018]: /usr/lib/lua/luci/controller/admin/network.lua:215: attempt to call method 'is_auto' (a nil value)
Mon May 20 20:21:39 2019 daemon.err uhttpd[2018]: stack traceback:
Mon May 20 20:21:39 2019 daemon.err uhttpd[2018]: /usr/lib/lua/luci/controller/admin/network.lua:215: in function </usr/lib/lua/luci/controller/admin/network.lua:187>
Mon May 20 20:21:39 2019 daemon.err uhttpd[2018]: (tail call): ?
Master / 19.07 do not use this code anymore
For whatever reason I am unable to re-open : https://github.com/openwrt/luci/issues/1649 I am again running a recent git head build : commit 2a598bbaa3f75b7051c2453a6ccf706191cf2153 Author: Rafał Miłecki rafal@milecki.pl Date: Fri Jul 27 21:54:08 2018 +0200
I have tracked this down to luci.ip returning nil for the the ipc.link(ifname) call.
I inserted debugging code in network.lua prior to the failing line (local phy_only = force_phy_only or (ipc.link(ifname).type ~= 1)) :
And got : Tue Jul 31 18:59:46 2018 daemon.err uhttpd[2337]: ipc.link(ifname) is nil wlan0
The subsequent call (ipc.link(ifname).type ~= 1) then succeeded.
So for whatever reason on this device, very intermittently (like once a day) that call will fail and return nil, which network.lua then attempts to index with the .type call and it fails.
I attached the requested network interface dump to that report, but as I've been unable to re-open it I'll reference it here : https://github.com/openwrt/luci/files/1897666/network.interface.dump.txt
I'm now not sure where to go next. Anyone have any ideas?