motioneye-project / motioneyeos

A Video Surveillance OS For Single-board Computers

MotionEyeOS doesn't start after a power cycle if network is down #2258

Closed lmarceg closed 4 years ago

lmarceg commented 4 years ago

Hi, I have a Raspberry Pi 3B running apcupsd (I will call it RB) and another Raspberry Pi 3B running motionEyeOS (I will call it ME). In the apcupsd doshutdown script, I SSH into ME, stop the motionEye server and power it off; I also shut down RB and kill the power to the USB. When power comes back, RB power-cycles fine and I can SSH into it. ME, however, has the red light on and the amber light flashing (so it did come up, in a way), but I cannot SSH into it and the server does not respond. If I unplug and replug the power, it starts working fine. I see two boot entries (the failed one and the successful one), which are copied here:

---- booting motionEyeOS dev20190820 ----
 * Detecting disk device: /dev/mmcblk0
 * Mounting filesystems: done
 * Loading kernel modules: done
 * Setting hostname: done
 * Configuring CPU frequency: done
 * Starting syslogd: done
 * Starting throttle watcher: done
 * Starting eudev: done
 * Starting watchdog: done
 * Starting rngd: done
 * Configuring wired network: no link
 * Setting current date using http: failed
 * Starting http date updater: done
 * Starting crond: done
 * Starting sshd: done
 * Starting proftpd: done
 * Starting smbd: done
 * Starting nmbd: done
 * Starting motioneye: done
 * Executing user init script: bin boot data dev etc home lib lib32 libexec linuxrc lost+found media mnt opt proc root run sbin sys tmp usr var
bin boot data dev etc home lib lib32 libexec linuxrc lost+found media mnt opt proc root run sbin sys tmp usr var
done
---- booting motionEyeOS dev20190820 ----
 * Detecting disk device: /dev/mmcblk0
 * Mounting filesystems: done
 * Loading kernel modules: done
 * Setting hostname: done
 * Configuring CPU frequency: done
 * Starting syslogd: done
 * Starting throttle watcher: done
 * Starting eudev: done
 * Starting watchdog: done
 * Starting rngd: done
 * Configuring wired network: dhcp
 * Setting current date using http: Mon Jan 13 12:24:47 CET 2020
 * Starting http date updater: done
 * Starting crond: done
 * Starting sshd: done
 * Starting proftpd: done
 * Starting smbd: done
 * Starting nmbd: done
 * Starting motioneye: done
 * Executing user init script: bin boot data dev etc home lib lib32 libexec linuxrc lost+found media mnt opt proc root run sbin sys tmp usr var
bin boot data dev etc home lib lib32 libexec linuxrc lost+found media mnt opt proc root run sbin sys tmp usr var
done
 # Interface eth0 has IP address 192.168.1.132/24
 # Default gateway is 192.168.1.1
 # DNS server address is 213.205.32.70

As you can see, in the first boot attempt ME fails to get an IP address. This is expected because my router is still coming up; in the second attempt, everything is OK and I can access the system.

Can this be the issue? I am using DHCP and I am telling the router to assign a specific IP Address, but maybe if I configure a static IP Address on the ME, this could solve the issue?

Thanks for helping!

ccrisan commented 4 years ago

@lmarceg motionEyeOS is supposed to reboot if it fails to obtain a network connection, unless OS_NETWORKLESS is true. Have you tweaked your /data/etc/os.conf?

lmarceg commented 4 years ago

Yes @ccrisan, I have tweaked it because if, for any reason, my Internet goes down because my ISP has some issues, the Raspberry starts rebooting, and this is not what I want, as a panic reboot will not make my Internet come back. I mean, I do not believe a reboot is a way to solve such network problems. But what happens if I configure a static IP Address? Will I overcome the issue? Or can I launch some script after boot that just waits for an IP address, without having to reboot? Anyway, I was checking /data/etc/os.conf and it looks like this:

OS_DEBUG="false"
OS_PRERELEASES="false"
OS_TTY_LOGIN="tty1"
OS_ETH="eth0"
OS_WLAN="wlan0"
OS_PPP="ppp0"
OS_NETWORKLESS="false"
OS_COUNTRY="GB"
OS_FIRMWARE_METHOD="github"
OS_FIRMWARE_REPO="ccrisan/motioneyeos"
OS_FIRMWARE_USERNAME=""
OS_FIRMWARE_PASSWORD=""

What I changed is the /data/etc/watch.conf

LINK_WATCH="false"
LINK_WATCH_TIMEOUT=20

IP_WATCH="false"
IP_WATCH_TIMEOUT=20

#NETWATCH_HOST=www.google.com
NETWATCH_PORT=80
NETWATCH_RETRIES=3
NETWATCH_TIMEOUT=5
NETWATCH_INTERVAL=20
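
The idea of launching a script after boot that just waits for an IP address, rather than rebooting, could look roughly like the sketch below. It is only an illustration and not part of motionEyeOS: the function name wait_for_ip and the timeout value are assumptions, and it relies on the ip command, which the S40network script also calls.

```shell
#!/bin/sh
# Sketch only: wait_for_ip is a hypothetical helper, not a motionEyeOS function.
# Usage: wait_for_ip IFACE TIMEOUT_SECONDS
# Returns 0 as soon as IFACE has an IPv4 address, 1 if the timeout expires.
wait_for_ip() {
    iface=$1
    timeout=$2
    count=0
    # "ip -4 addr show" prints an "inet a.b.c.d/nn" line once an address is set
    while ! ip -4 addr show dev "$iface" 2>/dev/null | grep -q 'inet '; do
        if [ "$count" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        count=$((count + 1))
    done
    return 0
}
```

Called as, for example, wait_for_ip eth0 300, it polls once per second and gives up after five minutes without ever triggering a reboot.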

ccrisan commented 4 years ago

> I mean, I do not believe a reboot is a way to solve such network problems.

Nope, it is not, but neither is setting OS_NETWORKLESS=true. Having no Internet connectivity is different from having no network connection at all. The former is something that motionEyeOS can live (and boot) with, while the latter is generally a no-go.

Internet connection is needed mainly for setting a correct date/time, which should be taken care of by ntp as soon as the Internet connection is back, so no need to reboot.

Local network connection is very unlikely to be restored by itself without a reboot (to reset possible misbehaving drivers) or a hardware intervention.
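
To tell the two situations apart, the state of the local link can be checked on its own, independently of Internet access. The following is a minimal sketch, assuming the usual Linux sysfs layout under /sys/class/net; the function name link_state and the NET_SYSFS override are hypothetical and exist only for illustration. It covers the local link side only and says nothing about Internet reachability.

```shell
#!/bin/sh
# Sketch only: link_state and NET_SYSFS are hypothetical names.
# NET_SYSFS can be pointed at another directory for testing.
NET_SYSFS="${NET_SYSFS:-/sys/class/net}"

# Usage: link_state IFACE
# Prints "no device", "no link" or "link up".
link_state() {
    iface=$1
    if [ ! -d "${NET_SYSFS}/${iface}" ]; then
        # the driver has not created the interface
        echo "no device"
    elif [ "$(cat "${NET_SYSFS}/${iface}/carrier" 2>/dev/null)" = "1" ]; then
        # a cable is plugged in and the peer is up
        echo "link up"
    else
        # interface exists but is down or has no carrier
        echo "no link"
    fi
}
```

For example, link_state eth0 printing "no link" while the router is still booting would correspond to the failed boot entry shown earlier in the thread.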

> But what happens if I configure a static IP Address? Will I overcome the issue?

Configuring static IP probably won't fix your problem, unless your local DHCP server is somehow down while your Internet connection is down, which should rarely be the case.

lmarceg commented 4 years ago

Thanks for your comments. Based on what you wrote and after looking closely at the /etc/init.d/S40network script, which is invoked during boot, I see that if I set OS_NETWORKLESS=true, the script exits right after setting up the loopback interface and the hostname, and no other interface is brought up. This is not what I want. I also see that the script exits whenever the link is down, and this is my issue: during boot, when power has just come back, my router is booting too, but it is slower than the Raspberry and is not providing any connectivity yet.

So I have edited the script (only the Ethernet part), which now looks like this:

start_eth() {
    msg_begin "Configuring wired network"

    # wait for driver
    w=3
    count=0
    while ! ifconfig ${OS_ETH} >/dev/null 2>&1; do
        sleep 1
        count=$((${count} + 1))
        if [[ ${count} -ge ${w} ]]; then
            msg_done "no device"
            return 1
        fi
    done

    # bring it up
    ifconfig ${OS_ETH} up

#    # wait for operstate
#    w=3
#    count=0
#    while [[ "$(cat /sys/class/net/${OS_ETH}/operstate 2>&1)" == "unknown" ]]; do
#        sleep 1
#        count=$((${count} + 1))
#        if [[ ${count} -ge ${w} ]]; then
#            msg_done "no link"
#            return 1
#        fi
#    done

#    # wait for link
#    test "${LINK_WATCH}" == "true" || LINK_NEGO_TIMEOUT=5
#    count=0
#    while [[ "$(cat /sys/class/net/${OS_ETH}/carrier 2>&1)" != "1" ]]; do
#        sleep 1
#        count=$((${count} + 1))
#        if [[ ${count} -ge ${LINK_NEGO_TIMEOUT} ]]; then
#            msg_done "no link"
#            return 1
#        fi
#    done

    if [[ -n "${mtu}" ]]; then
        ip link set mtu ${mtu} dev ${OS_ETH}
    fi

    if [[ -n "${STATIC_IP}" ]]; then
        msg_done ${STATIC_IP}
        ifconfig ${OS_ETH} ${STATIC_IP} up
        STATIC_IP="" # won't be used again
    else
        msg_done dhcp
        dhclient -cf "${DH_CONF}" ${OS_ETH}
    fi

    if [[ "${LINK_WATCH}" == "true" ]]; then
        watch_eth &
    fi

    if [[ "${IP_WATCH}" == "true" ]] && ip addr show dev ${OS_ETH} | grep inet &>/dev/null; then
        watch_ip ${OS_ETH} &
    fi
}

As you can see, I check whether I can bring eth0 up (if not, there must be a hardware problem somewhere), but then I do not care if the link is down (because most probably it is, and I need to continue anyway). I have configured a static IP address so that the script does not rely on DHCP: AFAIK, dhclient has a timeout of 60 seconds and I am not sure this is enough, so it is better to have a static configuration, which will always work. Maybe one could play with a different timeout.
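
On the point about playing with a different timeout: ISC dhclient reads a timeout statement (60 seconds by default) and a retry statement (5 minutes by default) from its configuration file. The sketch below appends both to a given file. The helper name set_dhclient_timeouts and the example values are assumptions, the real file would be whatever ${DH_CONF} points to in S40network, and whether that file is regenerated at boot has not been checked here.

```shell
#!/bin/sh
# Sketch only: set_dhclient_timeouts is a hypothetical helper.
# Usage: set_dhclient_timeouts FILE TIMEOUT_SECONDS RETRY_SECONDS
# Appends dhclient.conf "timeout" and "retry" statements to FILE.
set_dhclient_timeouts() {
    conf=$1
    # timeout: how long to wait for a DHCP server before giving up
    # retry: how long to wait before contacting a DHCP server again
    printf 'timeout %s;\nretry %s;\n' "$2" "$3" >> "$conf"
}
```

For example, set_dhclient_timeouts /tmp/dhclient-test.conf 300 30 would make the client wait up to five minutes for a lease and try again 30 seconds after giving up.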

I have then rebooted MotionEyeOS and also the router. And when the LAN came up again, a couple of minutes after, I could connect to the MotionEyeOS WebServer without any issues! Tah-dah!

I am not sure there is a smarter way, but I have solved all my issues. From my point of view, we can close the case unless there is a smarter way to achieve what I did manually (i.e. without changing the script, only by adjusting the configuration). Thanks, Luca

ccrisan commented 4 years ago

@lmarceg as I said before and as you have found out yourself as well, OS_NETWORKLESS=true is not a solution for you.

Nevertheless I still fail to understand your problem. Having a router or network equipment that boots slower than a motionEyeOS RPi is quite normal and expected. In such a case there are the following possible scenarios:

  1. Your RPi boots and cannot find your WiFi network (if you're on WiFi). It will reboot until it finds it and can connect to it.
  2. Your RPi boots and has a wired connection but can't obtain a configuration via DHCP, since your router hasn't fully started. It will reboot until it can obtain IP configuration via DHCP.
  3. Your RPi boots and connects to either WiFi or wired, successfully obtaining an IP configuration. It can't however access the Internet due to a lazy router. It will start doing its work with a wrong date (year 1970) and will continuously try to adjust date/time, using a hopefully available Internet connection.

Is there a fourth scenario that I missed? Or do you think that the current behavior in one of these scenarios is not good for you?

lmarceg commented 4 years ago

@ccrisan I am on a wired network and therefore point 2 could potentially apply, but I find it quite inconvenient to have to reboot just because a link is down. How many times will the Raspberry have to reboot before this link goes up? The entire day, if I am unlucky? From my point of view, there is simply no point. Moreover, the panic reboot will take place each time a link is down (because script 41something continuously checks the link status and IP addresses). To prevent this script from rebooting my Raspberry, I commented out all the watch.conf, and this is impacting script 40 (so your suggestion cannot be applied anymore to my case). With my simple workaround I don't have to reboot and everything is still working fine. There are no other scenarios, and I agree it is just a matter of points of view.

ccrisan commented 4 years ago

@lmarceg

> but I find it quite inconvenient to have to reboot just because a link is down

Why?

> How many times will the Raspberry have to reboot before this link goes up? The entire day, if I am unlucky?

Many times, but the reboot delay is gradually increased to 1 hour. So in your case, if network is not available for, say, 2 minutes, it'll probably reboot a couple of times.
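
A gradually increasing reboot delay with a one-hour cap can be pictured with the sketch below. This is not the actual motionEyeOS implementation: the function name reboot_delay, the linear growth and the 60-second step are all assumptions chosen for illustration; only the 1 hour maximum comes from the comment above.

```shell
#!/bin/sh
# Sketch only: hypothetical names and step size, not motionEyeOS code.
PANIC_DELAY_STEP=60      # assumed: seconds added per consecutive failed boot
PANIC_DELAY_MAX=3600     # cap of 1 hour, as stated in the comment above

# Usage: reboot_delay COUNT
# Prints the delay in seconds before the COUNT-th consecutive reboot.
reboot_delay() {
    delay=$(( $1 * PANIC_DELAY_STEP ))
    if [ "$delay" -gt "$PANIC_DELAY_MAX" ]; then
        delay=$PANIC_DELAY_MAX
    fi
    echo "$delay"
}
```

With these assumed values, the first two reboots would be delayed by 60 and 120 seconds, and the delay would stop growing once it reaches 3600 seconds.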

> Moreover, the panic reboot will take place each time a link is down

Yes, and in your case the link may go down due to an external problem, but there are cases where the ethernet/USB controller needs to be reset. I experienced it entering a weird hung state due to voltage issues and there's no other way of solving it than resetting the board.

> I commented out all the watch.conf, and this is impacting script 40

You can just adjust the corresponding variables in your watch.conf file if you want to get rid of this functionality. A firmware update would overwrite your changes to S40network.

While it may first seem odd, rebooting a Raspberry Pi board upon various types of failures is not bad:

lmarceg commented 4 years ago

Thanks @ccrisan for your comments. I will think about it and may reconsider my position.