prplfoundation / prplMesh

This repository moved to https://gitlab.com/prpl-foundation/prplmesh/prplMesh

[BUG] Certification: prplmesh fails to start after some time #1470

Closed: rmelotte closed this issue 4 years ago

rmelotte commented 4 years ago

See for example this job: https://gitlab.com/prpl-foundation/prplMesh/-/jobs/604999730

It's also happening in today's "nightly" tests (https://gitlab.com/prpl-foundation/prplMesh/-/jobs/606694852). The first tests passed, but from MAP-4.6.2_ETH_FH5GH onward, all tests failed with:

2020-06-23 03:51:58.098 - ERROR - TCP client socket error - [Errno 10061] Connection refused

In other words, prplMesh is not listening on the UCC port.
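Errno 10061 is the Windows WSAECONNREFUSED code: the target host rejected the TCP handshake, which is what a client normally sees when nothing is listening on that port. One way to confirm this on the device is to probe the port directly. The following is a minimal sketch, assuming the probe runs on the device itself (hence 127.0.0.1) and that the caller supplies the UCC port number; probe_tcp_port and ProbeResult are hypothetical names, not part of prplMesh.

```cpp
#include <arpa/inet.h>
#include <cerrno>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Outcome of a single TCP connection attempt (hypothetical helper type).
enum class ProbeResult { Listening, Refused, OtherError };

// Try to connect to 127.0.0.1:port. ECONNREFUSED means no process is
// listening there (Windows reports the same condition as error 10061).
ProbeResult probe_tcp_port(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return ProbeResult::OtherError;
    }

    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    int rc  = connect(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr));
    int err = errno; // save errno before close() can overwrite it
    close(fd);

    if (rc == 0) {
        return ProbeResult::Listening;
    }
    return err == ECONNREFUSED ? ProbeResult::Refused : ProbeResult::OtherError;
}
```

If the probe returns Refused while the agent is supposed to be up, the process that owns the UCC listener either never reached that point or has exited.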

This is the stdout output when starting prplMesh:

/opt/prplmesh/scripts/prplmesh_utils.sh: start
prplmesh_framework_init - starting local_bus and ieee1905_transport processes...
/opt/prplmesh/scripts/prplmesh_utils.sh: line 97: ebtables: not found
prplmesh_agent_start - start beerocks_agent process...
json_object_from_file: error opening file /tmp/share/logging.conf: No such file or directory
10:56:06 logger.cpp[144]: Configuration file does not exist
json_object_from_file: error opening file /tmp/share/logging.conf: No such file or directory
10:56:06 logger.cpp[144]: Configuration file does not exist
2020-06-14 10:56:06,470 DEBUG [default] [root@unknown-host] [int beerocks::bpl::cfg_get_all_prplmesh_wifi_interfaces(beerocks::bpl::BPL_WLAN_IFACE*, int*)] [prplmesh-1.4.0-86cbf8de/framework/platform/bpl/uci/cfg/bpl_cfg.cpp:393] cfg_get_all_prplmesh_wifi_interfaces: failed to get wifi interface for radio2 or radio2.hostap_iface doesn't exist
2020-06-14 10:56:06,471 DEBUG [default] [root@unknown-host] [int main(int, char**)] [prplmesh-1.4.0-86cbf8de/agent/src/beerocks/slave/beerocks_slave_main.cpp:513] radio0.hostap_iface=wlan0
2020-06-14 10:56:06,472 DEBUG [default] [root@unknown-host] [int main(int, char**)] [prplmesh-1.4.0-86cbf8de/agent/src/beerocks/slave/beerocks_slave_main.cpp:513] radio1.hostap_iface=wlan2
2020-06-14 10:56:06,473 DEBUG [default] [root@unknown-host] [static void beerocks::os_utils::kill_pid(const string&, const string&)] [prplmesh-1.4.0-86cbf8de/common/beerocks/bcl/source/beerocks_os_utils.cpp:94] kill_pid SIGTERM pid=9325

The kill_pid SIGTERM pid=9325 line seems to indicate that another prplMesh process is still running when we run /opt/prplmesh/scripts/prplmesh_utils.sh start.

When a process cannot be stopped with SIGTERM, beerocks::os_utils::kill_pid is supposed to wait for 15 seconds and then send it a SIGKILL. In this case, however, that never happens, and we never reach the point where the slaves are started.
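The expected behavior (SIGTERM, a bounded wait, then SIGKILL) can be sketched as follows. This is not the beerocks::os_utils::kill_pid implementation: terminate_with_escalation and process_gone are hypothetical names, and the sketch takes a pid_t and a caller-supplied timeout directly instead of the two string arguments the real function takes.

```cpp
#include <cerrno>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>

// True once the process no longer exists. If pid is our own child, reap it
// with waitpid(); otherwise a zombie would still satisfy kill(pid, 0).
static bool process_gone(pid_t pid)
{
    int status = 0;
    pid_t r    = waitpid(pid, &status, WNOHANG);
    if (r == pid) {
        return true; // our child exited and has been reaped
    }
    if (r == 0) {
        return false; // our child, still running
    }
    // waitpid() failed (e.g. ECHILD: not our child): ask the kernel directly.
    return kill(pid, 0) == -1 && errno == ESRCH;
}

// Send SIGTERM, poll until `timeout` expires, then escalate to SIGKILL.
// Returns true once the process is confirmed gone.
bool terminate_with_escalation(pid_t pid, std::chrono::milliseconds timeout)
{
    using namespace std::chrono;

    if (kill(pid, SIGTERM) == -1 && errno == ESRCH) {
        return true; // nothing to stop
    }

    const auto deadline = steady_clock::now() + timeout;
    while (steady_clock::now() < deadline) {
        if (process_gone(pid)) {
            return true;
        }
        std::this_thread::sleep_for(milliseconds(100));
    }

    // SIGTERM was ignored or not handled in time: force termination.
    kill(pid, SIGKILL);

    // SIGKILL cannot be caught, but delivery is asynchronous: poll briefly.
    for (int i = 0; i < 50; ++i) {
        if (process_gone(pid)) {
            return true;
        }
        std::this_thread::sleep_for(milliseconds(100));
    }
    return false;
}
```

With the 15-second wait described above, a call would look like terminate_with_escalation(pid, std::chrono::seconds(15)). The sketch polls again after SIGKILL and reports failure instead of assuming success, because a process blocked in uninterruptible sleep does not exit until the blocking call returns.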

arnout commented 4 years ago

After analysis, split into two issues: #1472 and #1473.