smart-edge-open / converged-edge-experience-kits

Source code for experience kits with Ansible-based deployment.
Apache License 2.0

Unable to start edge nodes (e.g. after a reboot) #8

Closed. idirect-dev closed this issue 4 years ago.

idirect-dev commented 4 years ago

I can't find instructions on how to start edge nodes that are not running. I deployed the controller and edge nodes with the onpremise deployment scripts and enrolled a number of nodes with the controller (to the point where I can edit their network interfaces). However, if the edge node services stop running for any reason (including a reboot), any attempt to list the interfaces returns a 500 error in the GUI, with no clue as to how to address the issue. On the edge node itself I can see that no Docker services are running, and I cannot find any documentation or scripts to get them running again. For the controller, at least, I see I can use make all-up from the /opt/openness directory.

Unfortunately, I'm not sure how to go about debugging or resolving this issue.

idirect-dev commented 4 years ago

Some further information: I can get some services running by issuing docker-compose start (from /opt/edgenode/):

[root@mec-nomadic edgenode]# docker-compose start
WARNING: The VER variable is not set. Defaulting to a blank string.
WARNING: The REMOTE_SYSLOG_IP variable is not set. Defaulting to a blank string.
WARNING: The NTS_MEM_MB_S0 variable is not set. Defaulting to a blank string.
WARNING: The NTS_MEM_MB_S1 variable is not set. Defaulting to a blank string.
WARNING: The OVS_BRIDGE_NAME variable is not set. Defaulting to a blank string.
WARNING: The OVSE variable is not set. Defaulting to a blank string.
Starting interfaceservice ... failed
Starting eaa ... done
Starting edgednssvr ... error
Starting nts ... done
Starting appliance ... done
Starting syslog-ng ... failed

ERROR: for edgednssvr Cannot start service edgednssvr: driver failed programming external connectivity on endpoint mec-app-edgednssvr (087f664f46826db2554e5215aa78b508bc6d0edef6a404be92aaf395a86a5ca2): listen udp 192.168.122.128:53: bind: cannot assign requested address
[root@mec-nomadic edgenode]#
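The output above suggests two things worth checking before retrying: docker-compose found no value for several variables, and edgednssvr tried to bind UDP port 53 on 192.168.122.128, which "cannot assign requested address" usually means is not configured on any interface of the host. A minimal bash sketch of a pre-flight check, assuming the variable names and IP from this output are the relevant ones for your deployment; missing_vars and addr_assigned are hypothetical helper names, not part of the experience kits:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers for /opt/edgenode (not part of the kits).
# Assumption: the variable names and IP passed in come from the
# docker-compose warnings and the edgednssvr error in your own output.

# Print each variable name (from the arguments) that is unset or empty
# in the current shell.
missing_vars() {
  local name
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then  # indirect expansion: value of $name
      printf '%s\n' "$name"
    fi
  done
}

# Succeed if the given IPv4 address is configured on some local interface.
# `ip -o -4 addr show` prints one line per address; field 4 is ADDR/PREFIX.
addr_assigned() {
  ip -o -4 addr show | awk '{print $4}' | cut -d/ -f1 | grep -qxF "$1"
}
```

For example, run missing_vars VER REMOTE_SYSLOG_IP NTS_MEM_MB_S0 NTS_MEM_MB_S1 OVS_BRIDGE_NAME OVSE to list the empty ones, and addr_assigned 192.168.122.128 || echo "address not configured" to check the DNS bind address. docker-compose also reads a .env file in the project directory, so missing values can be exported before docker-compose start or added there; which values the deployment scripts originally used is not shown in this thread.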

However, this only seems to get appliance:1.0 and eaa:1.0 running; nts starts but stops shortly afterwards.

I am unsure how to proceed.

anurag-ness commented 4 years ago

Hello idirect-dev,

It looks like some containers did not start. I have found that restarting them manually works, so it's worth trying if you haven't already. Some steps are below (I found these in some old logs; they are only for reference, and in your version they may be different):

On the edge node, you can find the stopped containers with the following (sudo privileges needed): docker ps -a | grep Exited

....
87f2cab62c74        appliance:1.0              "./entrypoint.sh"        2 days ago          Exited (128) 9 minutes ago                       edgenode_appliance_1
c7980841738d        nts:1.0                    "/root/entrypoint.sh"    2 days ago          Exited (143) 4 hours ago                         nts
afd8186c1d7e        edgednssvr:1.0             "/bin/sh -c './edged…"   2 days ago          Exited (0) 4 hours ago                           mec-app-edgednssvr
f9b6349a6406        balabit/syslog-ng:3.19.1   "/usr/sbin/syslog-ng…"   2 days ago          Exited (0) 9 minutes ago                         edgenode_syslog-ng_1

You can then restart them by container ID (the first column): docker restart c7980841738d 87f2cab62c74 afd8186c1d7e f9b6349a6406
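Copying IDs by hand works, but docker ps can filter on container state itself. A minimal bash sketch, where restart_exited is a hypothetical helper name, that restarts every container currently in the exited state by using docker ps -aq --filter status=exited instead of grep:

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart every container Docker reports as "exited".
restart_exited() {
  local -a ids
  # -a: include stopped containers, -q: print IDs only,
  # --filter status=exited: only containers in the exited state.
  mapfile -t ids < <(docker ps -aq --filter status=exited)
  if [ "${#ids[@]}" -eq 0 ]; then
    echo "no exited containers"
    return 0
  fi
  # One argument per container ID, e.g. `docker restart c798... 87f2...`.
  docker restart "${ids[@]}"
}
```

This restarts every exited container on the host, not only the edge node ones; if other stopped containers should stay down, restart by ID as above instead.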

Because of dependencies between the containers, restarting them out of order can leave some of them stopped; in that case, rerun the restart for those that didn't start correctly.
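Instead of rerunning the restart by hand, it can be repeated in passes until nothing is left in the exited state. A bash sketch, assuming a short pause between passes is enough for dependencies to come up; restart_until_up, MAX_PASSES and PAUSE_SECONDS are hypothetical names and defaults, not taken from the deployment scripts:

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart exited containers in repeated passes so a
# container whose dependency was not up yet gets another chance.
# MAX_PASSES and PAUSE_SECONDS are illustrative defaults (assumptions).
restart_until_up() {
  local max_passes=${MAX_PASSES:-5} pause=${PAUSE_SECONDS:-10}
  local pass
  local -a ids
  for ((pass = 1; ; pass++)); do
    mapfile -t ids < <(docker ps -aq --filter status=exited)
    if [ "${#ids[@]}" -eq 0 ]; then
      echo "all containers running after $((pass - 1)) restart pass(es)"
      return 0
    fi
    if [ "$pass" -gt "$max_passes" ]; then
      echo "${#ids[@]} container(s) still exited after $max_passes passes" >&2
      docker ps -a --filter status=exited >&2
      return 1
    fi
    echo "pass $pass: restarting ${#ids[@]} container(s)"
    docker restart "${ids[@]}" >/dev/null
    # Wait so containers that start and then exit shortly afterwards
    # show up as exited again on the next check.
    sleep "$pause"
  done
}
```

The pause matters for containers like nts earlier in this thread, which started and then stopped. A container that fails for a configuration reason (such as the edgednssvr bind error) will not come up this way; the loop gives up after MAX_PASSES and lists what is still exited.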