syncloud / platform

Run popular services on your device with one click
https://syncloud.org
GNU General Public License v3.0

Syncloud platform and apps down #428

Closed: chegeiser closed this issue 5 years ago

chegeiser commented 5 years ago

I ran a restart from the platform restart option (Shutdown>Restart). When it restarted I couldn't get into the system, not even into the main Syncloud admin interface. There's no webserver error, so it seems the webserver is down. I went in through ssh and shut down, pulled the power for a few minutes, then rebooted. No luck getting in.

I've run systemctl status snap.platform.* and got this:

● snap.platform.uwsgi-api.service - Service for snap application platform.uwsgi-api
   Loaded: loaded (/etc/systemd/system/snap.platform.uwsgi-api.service; enabled)
   Active: inactive (dead)

Jan 01 00:04:42 odroid-n2 systemd[1]: Dependency failed for Service for snap application platform.uwsgi-api.

● snap.platform.nginx-api.service - Service for snap application platform.nginx-api
   Loaded: loaded (/etc/systemd/system/snap.platform.nginx-api.service; enabled)
   Active: inactive (dead)

Jan 01 00:04:42 odroid-n2 systemd[1]: Dependency failed for Service for snap application platform.nginx-api.

● snap.platform.nginx-public.service - Service for snap application platform.nginx-public
   Loaded: loaded (/etc/systemd/system/snap.platform.nginx-public.service; enabled)
   Active: inactive (dead)

Jan 01 00:04:42 odroid-n2 systemd[1]: Dependency failed for Service for snap application platform.nginx-public.

● snap.platform.backend.service - Service for snap application platform.backend
   Loaded: loaded (/etc/systemd/system/snap.platform.backend.service; enabled)
   Active: inactive (dead)

Jan 01 00:04:42 odroid-n2 systemd[1]: Dependency failed for Service for snap application platform.backend.

● snap.platform.nginx-internal.service - Service for snap application platform.nginx-internal
   Loaded: loaded (/etc/systemd/system/snap.platform.nginx-internal.service; enabled)
   Active: inactive (dead)

Jan 01 00:04:42 odroid-n2 systemd[1]: Dependency failed for Service for snap application platform.nginx-internal.

● snap.platform.uwsgi-internal.service - Service for snap application platform.uwsgi-internal
   Loaded: loaded (/etc/systemd/system/snap.platform.uwsgi-internal.service; enabled)
   Active: inactive (dead)

Jan 01 00:04:42 odroid-n2 systemd[1]: Dependency failed for Service for snap application platform.uwsgi-internal.

● snap.platform.openldap.service - Service for snap application platform.openldap
   Loaded: loaded (/etc/systemd/system/snap.platform.openldap.service; enabled)
   Active: inactive (dead)

Jan 01 00:04:42 odroid-n2 systemd[1]: Dependency failed for Service for snap application platform.openldap.

● snap.platform.uwsgi-public.service - Service for snap application platform.uwsgi-public
   Loaded: loaded (/etc/systemd/system/snap.platform.uwsgi-public.service; enabled)
   Active: inactive (dead)

I tried systemctl status and got this:

● odroid-n2
    State: degraded
     Jobs: 0 queued
   Failed: 9 units
    Since: Thu 1970-01-01 00:00:10 UTC; 49 years 9 months ago
   CGroup: /
           ├─1 /sbin/init
           └─system.slice
             ├─snap-syncthing-190629147.mount
             │ └─2060 /bin/mount -n /var/lib/snapd/snaps/syncthing_190629147.snap /snap/syncthing/190629147 -t squashfs -o nodev,ro,x-gdu.hide
             ├─snap-platform-19092245.mount
             │ └─2047 /bin/mount -n /var/lib/snapd/snaps/platform_19092245.snap /snap/platform/19092245 -t squashfs -o nodev,ro,x-gdu.hide
             ├─snapd.service
             │ └─2174 /usr/lib/snapd/snapd
             ├─dbus.service
             │ └─2165 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
             ├─snap-nextcloud-190712444.mount
             │ └─2037 /bin/mount -n /var/lib/snapd/snaps/nextcloud_190712444.snap /snap/nextcloud/190712444 -t squashfs -o nodev,ro,x-gdu.hide
             ├─snap-syncthing-190625132.mount
             │ └─2042 /bin/mount -n /var/lib/snapd/snaps/syncthing_190625132.snap /snap/syncthing/190625132 -t squashfs -o nodev,ro,x-gdu.hide
             ├─snap-platform-1907121131.mount
             │ └─2025 /bin/mount -n /var/lib/snapd/snaps/platform_1907121131.snap /snap/platform/1907121131 -t squashfs -o nodev,ro,x-gdu.hide
             ├─ssh.service
             │ ├─2176 /usr/sbin/sshd -D
             │ ├─2316 sshd: root@pts/0
             │ ├─2318 -bash
             │ ├─2491 systemctl status
             │ └─2492 pager
             ├─avahi-daemon.service
             │ ├─2162 avahi-daemon: running [odroid-n2.local
             │ └─2166 avahi-daemon: chroot helpe
             ├─system-serial\x2dgetty.slice
             │ └─serial-getty@ttyS0.service
             │   └─2190 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0 vt102
             ├─ntp.service
             │ └─2167 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 106:111
             ├─system-getty.slice
             │ └─getty@tty1.service
             │   └─2189 /sbin/agetty --noclear tty1 linux
             ├─snap-platform-1905051083.mount
             │ └─2016 /bin/mount -n /var/lib/snapd/snaps/platform_1905051083.snap /snap/platform/1905051083 -t squashfs -o nodev,ro,x-gdu.hide
             ├─systemd-logind.service
             │ └─2155 /lib/systemd/systemd-logind
             ├─snap-notes-19040763.mount
             │ └─2062 /bin/mount -n /var/lib/snapd/snaps/notes_19040763.snap /snap/notes/19040763 -t squashfs -o nodev,ro,x-gdu.hide
             ├─snap-nextcloud-1909118.mount
             │ └─2034 /bin/mount -n /var/lib/snapd/snaps/nextcloud_1909118.snap /snap/nextcloud/1909118 -t squashfs -o nodev,ro,x-gdu.hide
             ├─cron.service

And systemctl gives this:

  UNIT                                                                              LOAD   ACTIVE SUB       DESCRIPTION
  proc-sys-fs-binfmt_misc.automount                                                 loaded active waiting   Arbitrary Executable File Formats File System Automount Point
  sys-devices-platform-auge_sound-sound-card0.device                                loaded active plugged   /sys/devices/platform/auge_sound/sound/card0
  sys-devices-platform-ff3f0000.ethernet-net-eth0.device                            loaded active plugged   /sys/devices/platform/ff3f0000.ethernet/net/eth0
  sys-devices-platform-ff500000.dwc3-xhci\x2dhcd.0.auto-usb2-2\x2d1-2\x2d1.3-2\x2d1.3:1.0-host0-target0:0:0-0:0:0:0-block-sda-sda1.device loaded active plugged   BUP_Slim_BK 1
  sys-devices-platform-ff500000.dwc3-xhci\x2dhcd.0.auto-usb2-2\x2d1-2\x2d1.3-2\x2d1.3:1.0-host0-target0:0:0-0:0:0:0-block-sda.device loaded active plugged   BUP_Slim_BK
  sys-devices-platform-ffe07000.emmc-mmc_host-emmc-emmc:0001-block-mmcblk0-mmcblk0boot0.device loaded active plugged   /sys/devices/platform/ffe07000.emmc/mmc_host/emmc/emmc:0001/block/mmcb
  sys-devices-platform-ffe07000.emmc-mmc_host-emmc-emmc:0001-block-mmcblk0-mmcblk0boot1.device loaded active plugged   /sys/devices/platform/ffe07000.emmc/mmc_host/emmc/emmc:0001/block/mmcb
  sys-devices-platform-ffe07000.emmc-mmc_host-emmc-emmc:0001-block-mmcblk0-mmcblk0p1.device loaded active plugged   /sys/devices/platform/ffe07000.emmc/mmc_host/emmc/emmc:0001/block/mmcblk0
  sys-devices-platform-ffe07000.emmc-mmc_host-emmc-emmc:0001-block-mmcblk0-mmcblk0p2.device loaded active plugged   /sys/devices/platform/ffe07000.emmc/mmc_host/emmc/emmc:0001/block/mmcblk0
  sys-devices-platform-ffe07000.emmc-mmc_host-emmc-emmc:0001-block-mmcblk0-mmcblk0rpmb.device loaded active plugged   /sys/devices/platform/ffe07000.emmc/mmc_host/emmc/emmc:0001/block/mmcbl
  sys-devices-platform-ffe07000.emmc-mmc_host-emmc-emmc:0001-block-mmcblk0.device   loaded active plugged   /sys/devices/platform/ffe07000.emmc/mmc_host/emmc/emmc:0001/block/mmcblk0
  sys-devices-platform-soc-ff800000.aobus-ff803000.serial-tty-ttyS0.device          loaded active plugged   /sys/devices/platform/soc/ff800000.aobus/ff803000.serial/tty/ttyS0
  sys-devices-virtual-block-zram0.device                                            loaded active plugged   /sys/devices/virtual/block/zram0
  sys-module-configfs.device                                                        loaded active plugged   /sys/module/configfs
  sys-subsystem-net-devices-eth0.device                                             loaded active plugged   /sys/subsystem/net/devices/eth0
  -.mount                                                                           loaded active mounted   /
  dev-mqueue.mount                                                                  loaded active mounted   POSIX Message Queue File System
  opt-disk-external.mount                                                           loaded active mounted   External disk
● snap-nextcloud-190211412.mount                                                    loaded failed failed    Mount unit for nextcloud
● snap-nextcloud-190712444.mount                                                    loaded failed failed    Mount unit for nextcloud
● snap-nextcloud-1909118.mount                                                      loaded failed failed    Mount unit for nextcloud
● snap-notes-19040763.mount                                                         loaded failed failed    Mount unit for notes
● snap-platform-1905051083.mount                                                    loaded failed failed    Mount unit for platform
● snap-platform-1907121131.mount                                                    loaded failed failed    Mount unit for platform
● snap-platform-19092245.mount                                                      loaded failed failed    Mount unit for platform
● snap-syncthing-190625132.mount                                                    loaded failed failed    Mount unit for syncthing
● snap-syncthing-190629147.mount                                                    loaded failed failed    Mount unit for syncthing
  sys-kernel-config.mount                                                           loaded active mounted   Configuration File System
  sys-kernel-debug.mount                                                            loaded active mounted   Debug File System
  systemd-ask-password-console.path                                                 loaded active waiting   Dispatch Password Requests to Console Directory Watch
  systemd-ask-password-wall.path                                                    loaded active waiting   Forward Password Requests to Wall Directory Watch
  avahi-daemon.service                                                              loaded active running   Avahi mDNS/DNS-SD Stack
  cron.service                                                                      loaded active running   Regular background program processing daemon
  dbus.service                                                                      loaded active running   D-Bus System Message Bus
  getty@tty1.service                                                                loaded active running   Getty on tty1
  kmod-static-nodes.service                                                         loaded active exited    Create list of required static device nodes for the current kernel
  networking.service                                                                loaded active running   LSB: Raise network interfaces.
  ntp.service                                                                       loaded active running   LSB: Start NTP daemon
  rc-local.service                                                                  loaded active exited    /etc/rc.local Compatibility
  rsyslog.service                                                                   loaded active running   System Logging Service
  serial-getty@ttyS0.service                                                        loaded active running   Serial Getty on ttyS0
  snapd.service                                                                     loaded active running   Snappy daemon
  ssh.service                                                                       loaded active running   OpenBSD Secure Shell server
  systemd-journald.service                                                          loaded active running   Journal Service
  systemd-logind.service                                                            loaded active running   Login Service
  systemd-modules-load.service                                                      loaded active exited    Load Kernel Modules
  systemd-random-seed.service                                                       loaded active exited    Load/Save Random Seed
  systemd-remount-fs.service                                                        loaded active exited    Remount Root and Kernel File Systems

Any ideas how to get this running? This happened before, but I did at least have access to the admin page and was then able to restore the apps. Seems the reboot causes some issue.

Thanks, Che

cyberb commented 5 years ago

What device are you using? Could you run these commands:

journalctl
dmesg

chegeiser commented 5 years ago

Device is an Odroid-N2

Output of the two commands attached.

It looks like there's a large number of login attempts from IP addresses in China (that's not me). Could this be enough to crash the system? Or is there something else going on?

I also notice that the timestamp in the journalctl output jumps from Jan 01 to Oct 10 (yesterday). Is this normal? Does the system start up and log as Jan 01, and then the clock/date adjusts once a particular service is started? This is at row 1235 in the journalctl output file.

Attachments: dmesg output.txt, journalctl output.txt

cyberb commented 5 years ago

Looks like systemd udev has some issues:

[  242.699946] INFO: task systemd-udevd:2015 blocked for more than 120 seconds.
[  242.701527]       Not tainted 4.9.162-22 #1
[  242.706281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.714189] systemd-udevd   D    0  2015   2007 0x00400001
[  242.719463] Call trace:
[  242.722383] [<ffffff80090867ec>] __switch_to+0x9c/0xc0
[  242.727465] [<ffffff8009bfa6a4>] __schedule+0x284/0x7e0
[  242.732781] [<ffffff8009bfac40>] schedule+0x40/0xa8
[  242.737774] [<ffffff8009bfb0d0>] schedule_preempt_disabled+0x28/0x40
[  242.744267] [<ffffff8009bfcc50>] __mutex_lock_slowpath+0x108/0x1e8
[  242.750593] [<ffffff8009bfcd90>] mutex_lock+0x60/0x78
[  242.755790] [<ffffff80095b233c>] lo_release+0x2c/0xd8
[  242.760986] [<ffffff800927fb2c>] __blkdev_put+0x234/0x270
[  242.766530] [<ffffff800927ffbc>] blkdev_put+0x54/0x148
[  242.771818] [<ffffff80092800dc>] blkdev_close+0x2c/0x40
[  242.777190] [<ffffff800923c570>] __fput+0xa8/0x1e8
[  242.782129] [<ffffff800923c728>] ____fput+0x20/0x30
[  242.787158] [<ffffff80090cb260>] task_work_run+0xd0/0x100
[  242.792704] [<ffffff800908ae14>] do_notify_resume+0xb4/0xc0
[  242.798424] [<ffffff8009083878>] work_pending+0x8/0x10

cyberb commented 5 years ago

Yes, it prevents apps from mounting:

Jan 01 00:01:41 odroid-n2 systemd[1]: snap-platform-1905051083.mount mounting timed out. Stopping.
Jan 01 00:01:42 odroid-n2 systemd[1]: snap-syncthing-190629147.mount mounting timed out. Stopping.
Jan 01 00:01:42 odroid-n2 systemd[1]: snap-syncthing-190625132.mount mounting timed out. Stopping.
Jan 01 00:01:42 odroid-n2 systemd[1]: snap-nextcloud-1909118.mount mounting timed out. Stopping.
Jan 01 00:01:42 odroid-n2 systemd[1]: snap-notes-19040763.mount mounting timed out. Stopping.
Jan 01 00:01:42 odroid-n2 systemd[1]: snap-nextcloud-190712444.mount mounting timed out. Stopping.
Jan 01 00:01:42 odroid-n2 systemd[1]: snap-platform-19092245.mount mounting timed out. Stopping.
Jan 01 00:01:42 odroid-n2 systemd[1]: snap-platform-1907121131.mount mounting timed out. Stopping.
Jan 01 00:01:42 odroid-n2 systemd[1]: snap-nextcloud-190211412.mount mounting timed out. Stopping.
Jan 01 00:03:12 odroid-n2 systemd[1]: snap-platform-1905051083.mount mounting timed out. Killing.
Jan 01 00:03:12 odroid-n2 systemd[1]: snap-syncthing-190629147.mount mounting timed out. Killing.
Jan 01 00:03:12 odroid-n2 systemd[1]: snap-syncthing-190625132.mount mounting timed out. Killing.
Jan 01 00:03:12 odroid-n2 systemd[1]: snap-nextcloud-1909118.mount mounting timed out. Killing.
Jan 01 00:03:12 odroid-n2 systemd[1]: snap-notes-19040763.mount mounting timed out. Killing.
Jan 01 00:03:12 odroid-n2 systemd[1]: snap-nextcloud-190712444.mount mounting timed out. Killing.
Jan 01 00:03:12 odroid-n2 systemd[1]: snap-platform-19092245.mount mounting timed out. Killing.
Jan 01 00:03:12 odroid-n2 systemd[1]: snap-platform-1907121131.mount mounting timed out. Killing.
Jan 01 00:03:12 odroid-n2 systemd[1]: snap-nextcloud-190211412.mount mounting timed out. Killing.
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: worker [2015] /devices/virtual/block/loop6 timeout; kill it
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: seq 3017 '/devices/virtual/block/loop6' killed
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: worker [2018] /devices/virtual/block/loop7 timeout; kill it
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: seq 3018 '/devices/virtual/block/loop7' killed
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: worker [2022] /devices/virtual/block/loop4 timeout; kill it
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: seq 3015 '/devices/virtual/block/loop4' killed
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: worker [2023] /devices/virtual/block/loop0 timeout; kill it
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: seq 3011 '/devices/virtual/block/loop0' killed
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: worker [2028] /devices/virtual/block/loop1 timeout; kill it
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: seq 3012 '/devices/virtual/block/loop1' killed
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: worker [2029] /devices/virtual/block/loop5 timeout; kill it
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: seq 3016 '/devices/virtual/block/loop5' killed
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: worker [2030] /devices/virtual/block/loop2 timeout; kill it
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: seq 3013 '/devices/virtual/block/loop2' killed
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: worker [2031] /devices/virtual/block/loop3 timeout; kill it
Jan 01 00:03:12 odroid-n2 systemd-udevd[2007]: seq 3014 '/devices/virtual/block/loop3' killed
Jan 01 00:04:02 odroid-n2 kernel: INFO: task systemd-udevd:2015 blocked for more than 120 seconds.
Jan 01 00:04:02 odroid-n2 kernel:       Not tainted 4.9.162-22 #1
Jan 01 00:04:02 odroid-n2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 01 00:04:02 odroid-n2 kernel: systemd-udevd   D    0  2015   2007 0x00400001

cyberb commented 5 years ago

Did you power off/on the device in the end?

cyberb commented 5 years ago

It looks like not all devices support software reboot.

cyberb commented 5 years ago

Sorry, you did actually power it off. Strange, I am not sure how it can get stuck like this unless the SD card (or whatever type of storage it uses) is corrupted.

cyberb commented 5 years ago

I am actually OK with removing the shutdown/reboot buttons, as I do not really know how to handle this properly on various devices, and the situation you are showing is a really bad one to be in.

chegeiser commented 5 years ago

So the only way to shut down would be ssh, correct?

Any ideas for fixes? Or is this really going to require a fresh image?

chegeiser commented 5 years ago

I went ahead and re-imaged, and then restored from the backup created through Syncloud Settings. Everything is back as it was. Going forward I'll be sure to use SSH to shut down and/or reboot instead of the built-in Shutdown/Reboot.

cyberb commented 5 years ago

Syncloud runs these commands on the device:

Reboot:

shutdown -r now

Shutdown:

shutdown now

https://github.com/syncloud/platform/blob/master/src/syncloud_platform/control/power.py

I do not think running them with ssh makes any difference.
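
For illustration only, here is a minimal Python sketch of how a UI power action could map onto the two commands quoted above. It is not the contents of power.py: the names POWER_COMMANDS, power_command and run_power_action are hypothetical, and the dry_run flag is an assumption added so the sketch can be tried without actually rebooting a device.

```python
import subprocess

# The two command lines are the ones quoted in this thread.
# The mapping and the function names below are hypothetical.
POWER_COMMANDS = {
    "reboot": ["shutdown", "-r", "now"],
    "shutdown": ["shutdown", "now"],
}


def power_command(action):
    """Return the argv list for a power action, or raise on an unknown one."""
    try:
        return list(POWER_COMMANDS[action])
    except KeyError:
        raise ValueError("unknown power action: %s" % action)


def run_power_action(action, dry_run=True):
    """Run the command for `action`; with dry_run=True only return it."""
    argv = power_command(action)
    if not dry_run:
        # Needs root; check=True raises CalledProcessError on a non-zero exit.
        subprocess.run(argv, check=True)
    return argv
```

Under these assumptions the UI button and an ssh session end up executing the same argv, which is consistent with the expectation that the entry point makes no difference.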

chegeiser commented 5 years ago

shutdown -r now works from an ssh terminal. Apps start up properly on reboot. Is this going to be in the next update whenever it comes out?

Thank you!

cyberb commented 5 years ago

This is what we always had. Can you try to do the same from the UI?

chegeiser commented 5 years ago

Tried via the UI and it worked. Don’t know what caused the original problem. May have been power loss. Will get a UPS set up to avoid issues there.