pducharme / UniFi-Video-Controller

Docker for Unifi-Video Controller (Ubiquiti Networks)

EMS segfault on startup after updating to latest version of docker image #56

Closed evil-dog closed 6 years ago

evil-dog commented 6 years ago

I am seeing a problem after updating to the latest version of the docker image from 2018-02-23.

I posted on the unifi-video forum about it and it seems a few others are having a similar issue and are also running your docker image.

From the logs, ems seems to be dying pretty much immediately on startup with a segfault:

1519448206.754 2018-02-23 23:56:46.754/EST: ERROR ems has quit with rc=139 in ems-service
1519448207.921 2018-02-23 23:56:47.921/EST: ERROR ems has quit with rc=139 in ems-service
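
For anyone decoding that log line: assuming the reported rc follows the usual shell convention, where a status above 128 means 128 plus the number of the signal that killed the process, rc=139 corresponds to signal 11, SIGSEGV. A minimal sketch (the describe_rc helper name is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: explain a shell-style exit status.
# Statuses above 128 are read as "128 + signal number".
describe_rc() {
    rc="$1"
    if [ "$rc" -gt 128 ]; then
        sig=$((rc - 128))
        # kill -l N prints the name of signal N without the SIG prefix
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with status $rc"
    fi
}

describe_rc 139
```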

This makes unifi-video unable to connect to the cameras and fully function properly.

There is a post with full logs from one of the other users on that forum topic.

i8beef commented 6 years ago

Also seeing this. I tried rebooting cameras, rebuilding the container, resetting cameras, unmanaging, etc. Right now I can't see my cameras or add them back at all.

evil-dog commented 6 years ago

For now I ended up going back to the previous build. You'll need to clone this repository, check out the previous version and build your own image, but it will at least get your cameras working again.

kenperkins commented 6 years ago

Are you saying the older image wasn't tagged on Docker hub? Also seeing this this morning.

i8beef commented 6 years ago

Yes, there's only "latest" and "beta" right now. I put in a separate ticket a few minutes ago asking for the full version tagging, hopefully we can get that for next time ;-) See Issue #3

I'll be around today if you need logs, etc. Below is an excerpt from my NVR log:

1519494946.190 2018-02-24 12:55:46.190/EST: ERROR Cannot send EMS CLI Request {"command":"listConfig","parameters":{"_messageId":1173}}: Unable to establish websocket connection to send message in EmsCliApi-Executor
1519494946.190 2018-02-24 12:55:46.190/EST: ERROR HouseKeepingTask Error: Timeout executing request: 1173 in StreamManagementService-HouseKeeper

This seems slightly different from @evil-dog's log, but it looks like a pissed off EMS client as well. The only difference between my container and the default command currently published is that I apparently don't specify a user and group.

I am running this container on a Synology NAS if that matters.

fryfrog commented 6 years ago

Unfortunately, neither @pducharme nor myself figured out how to get tagging to work so there aren't any tags. I think all that is needed is to push a tag to github, but haven't ever tried.

I'm seeing this issue too, had to roll back to 3.9.0. Have a test instance spun up to try and troubleshoot, but haven't figured out anything beyond what has already been figured out.

kenperkins commented 6 years ago

Tagging is actually really easy. You just need to tag it when you build, i.e. docker build -t pducharme/unifi-video-controller:1.2.3.4 . (the repository name has to be lowercase).

fryfrog commented 6 years ago

But docker hub is doing the building? How do you do it that way? For that, I think you just push a tag w/ the hash of the commit needed and docker hub does magic.

fryfrog commented 6 years ago

Oh, I get it. And the directions are in issue #3.

kenperkins commented 6 years ago

Also, there's this (emphasis mine):

https://docs.docker.com/docker-hub/builds/#use-the-build-settings-page

Automated build repositories rely on the integration with your code repository to build. However, you can also push already-built images to these repositories using the docker push command.
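
Putting the two suggestions above together (tag at build time, then docker push), a manual release could look roughly like this. It is only a sketch: the publish_tags function and the RUN dry-run switch are invented for illustration, and it assumes you are logged in to Docker Hub with push rights to the repository. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: publish one build under a version tag and as "latest".
# RUN defaults to "echo" (dry run); call with RUN= (empty) to really execute.
publish_tags() {
    repo="$1"      # e.g. pducharme/unifi-video-controller (must be lowercase)
    version="$2"   # e.g. 3.9.0
    ${RUN-echo} docker build -t "$repo:$version" .
    ${RUN-echo} docker tag "$repo:$version" "$repo:latest"
    ${RUN-echo} docker push "$repo:$version"
    ${RUN-echo} docker push "$repo:latest"
}

publish_tags pducharme/unifi-video-controller 3.9.0
```

With a version tag pushed alongside latest, anyone hit by a bad release can pin the older tag instead of rebuilding the image themselves.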

pducharme commented 6 years ago

Ok guys. I don't really know how to solve this. Should I just rebuild the docker "latest" as 3.9.0? cc @fryfrog @Indemnity83 @evil-dog @i8beef @kenperkins

i8beef commented 6 years ago

That has my vote. We have a broken package here, and anyone downloading and trying this is going to hit it. Best to back it out before the problem spreads.

I wish I knew more about Docker publishing, I'd help you figure out how to get this tagged up... at the very least, I hope we can get a 3.9.0 tag in there before an attempt is made to move latest to a 3.9.2 build again.

harihoudini commented 6 years ago

Just wanted to let you guys know that it actually works fine for me (3.9.2). No issues. Came here from the Ubnt forums. @fryfrog asked me about my docker run command over there. Running on Unraid. I'm not too savvy on Linux, so please don't ask me for too much technical stuff, but if you give me clear instructions on my logs, etc., then I can probably post any information that you may need on a working version.

pducharme commented 6 years ago

Ok. I replaced 3.9.2 with 3.9.0. That triggers a build of "latest" on Docker Hub. It should take a few minutes to compile.

I would be OK if someone could tell me how to create a new tag in GitHub and have it automatically built on Docker Hub. (I do NOT build anything with command lines; it's all done when I save a Dockerfile in GitHub, so I don't know how it could work.)

pducharme commented 6 years ago

@harihoudini You can see your docker run command on the Unraid Docker page. I think it's in the logs of the docker (?). I know you can see it if you "Update" a docker; it will show the run command when it finishes updating.

harihoudini commented 6 years ago

@pducharme Yeah, I pasted it already for @fryfrog. Here is my run command. Your 3.9.2 works like a charm for me, no issues. I don't know why, because I'm not that technical with Linux and dockers:

root@localhost:# /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run -d --name="UniFi-Video" --net="bridge" --privileged="true" -e TZ="America/New_York" -e HOST_OS="unRAID" -p 6666:6666/tcp -p 7080:7080/tcp -p 7442:7442/tcp -p 7443:7443/tcp -p 7445:7445/tcp -p 7446:7446/tcp -p 7447:7447/tcp -v "/mnt/cache/appdata/unifi-video-logs/":"/var/log/unifi-video":rw -v "/mnt/cache/appdata/unifi-video/":"/var/lib/unifi-video":rw -v "/mnt/disks/Unifi_NVR/":"/videos":rw pducharme/unifi-video-controller

6911f631df2d1697971cd05802177c5bd7fa377764ee07c686b0257176cb55a8

i8beef commented 6 years ago

3.9.0 latest back to working. Thank you for that.

My docker command is completely different, more what is in the actual docs here:

docker run -d --name UniFiVideo --cap-add SYS_ADMIN --cap-add DAC_READ_SEARCH --security-opt apparmor:unconfined -p 7442:7442 -p 7443:7443 -p 7444:7444 -p 7445:7445 -p 7446:7446 -p 7447:7447 -p 7080:7080 -p 6666:6666 -p 1935:1935 -v /volume1/docker/UniFiVideo/data:/var/lib/unifi-video -v /volume1/docker/UniFiVideo/videos:/usr/lib/unifi-video/data/videos -v /volume1/docker/UniFiVideo/logs:/var/log/unifi-video -e TZ=America/New_York pducharme/unifi-video-controller:latest

Note for /var/log/unifi-video in 3.9.2, I had to remove the volume and ran into #58. I suspect @harihoudini will find that his logs are no longer updating right now, as that's what I found before removing that mount.

harihoudini commented 6 years ago

@i8beef - When I check the logs from within the app (recording, error, connection, etc.), they all seem to be updating fine (last log entry was a few minutes ago). Let me know which logs may not be working, out of curiosity.

i8beef commented 6 years ago

Yeah, my logs in app looked alright too, but I don't think I saw /mnt/cache/appdata/unifi-video-logs/ updating anymore. In the new version they symlink /var/log/unifi-video inside the container to a location that would be in the /var/lib/unifi-video volume mount. I'm not actually sure who wins in that situation, the volume mount, the symlink, etc? Also it looked like the symlink was created owned by root, which unifi-video (user running the app) can't write through... but all this is tracked more in #58, it's kind of a side issue I think to the EMS issues we were all seeing, so I don't want to derail us with it.

milosivanovic commented 6 years ago

@harihoudini since you say that 3.9.2 works for you, what version of docker are you running? Paste the output of docker version.

I am using the latest:

eclipse ~ # docker version
Client:
 Version:       18.02.0-ce
 API version:   1.36
 Go version:    go1.9.3
 Git commit:    fc4de44
 Built:         Sun Feb 11 03:00:53 2018
 OS/Arch:       linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.02.0-ce
  API version:  1.36 (minimum version 1.12)
  Go version:   go1.9.3
  Git commit:   fc4de44
  Built:        Sun Feb 11 03:00:18 2018
  OS/Arch:      linux/amd64
  Experimental: false

Yesterday, I tried running the entire container with UID/GID 0 privileges and --privileged to see if that could be at all related, but I continued running into the same type of crash. So that means your docker run command (which contains --privileged) is likely not the reason why it works for you and not for the rest of us who are using specific --cap-add instead of --privileged.

harihoudini commented 6 years ago

@milosivanovic My docker version:

Client:
 Version:      17.09.1-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:        Thu Dec 7 22:21:47 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.1-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   19e2cf6
 Built:        Thu Dec 7 22:28:28 2017
 OS/Arch:      linux/amd64
 Experimental: false

fryfrog commented 6 years ago

The only thing from @harihoudini's command that looks much different from mine is --privileged="true", but using that on my own test instance didn't stop the ems service from seg faulting. Neither did cleaning up permissions or pointing the log volume at the right place. :/

sudo docker run --rm --name unifi-video-test --privileged=true --cap-add DAC_READ_SEARCH --cap-add SYS_ADMIN -v /data/unifi-video-test/data:/var/lib/unifi-video -v /data/unifi-video-test/videos:/usr/lib/unifi-video/data/videos -v /data/unifi-video-test/logs:/var/log/unifi-video -e PUID=985 -e PGID=985 -e TZ=America/Los_Angeles -e DEBUG=1 pducharme/unifi-video-controller
0 ✓ fryfrog@apollo ~ $ sudo docker version
Client:
 Version:       18.02.0-ce
 API version:   1.36
 Go version:    go1.9.4
 Git commit:    fc4de447b5
 Built:         Tue Feb 13 15:28:01 2018
 OS/Arch:       linux/amd64
 Experimental:  false
 Orchestrator:  swarm

Server:
 Engine:
  Version:      18.02.0-ce
  API version:  1.36 (minimum version 1.12)
  Go version:   go1.9.4
  Git commit:   fc4de447b5
  Built:        Tue Feb 13 15:28:34 2018
  OS/Arch:      linux/amd64
  Experimental: false

opticpow commented 6 years ago

Hi all, I was working OK with 3.9.2, and since the downgrade to 3.9.0 (I use watchtower to auto-upgrade) my cameras don't connect. I was sure they updated their firmware when they went to 3.9.2. My cameras are all G3s.

Anyone else seeing this issue? Does anyone have the SHA for the 3.9.2 image?

My Run command:

docker run \
        --name unifi-video \
        --cap-add SYS_ADMIN \
        --cap-add DAC_READ_SEARCH \
        -p 7442:7442 \
        -p 7443:7443 \
        -p 7445:7445 \
        -p 7446:7446 \
        -p 7447:7447 \
        -p 7080:7080 \
        -p 6666:6666 \
        -v /tank/docker/unifi-video/data:/var/lib/unifi-video \
        -v /tank/docker/unifi-video/buffer:/usr/lib/unifi-video/data/videos \
        -v /tank/docker/unifi-video/logs:/var/log/unifi-video \
        -e TZ=Australia/Sydney \
        -e PUID=1001 \
        -e PGID=1001 \
        -e DEBUG=1 \
        pducharme/unifi-video-controller

docker version:

Client:
 Version:      17.06.2-ce
 API version:  1.30
 Go version:   go1.8.3
 Git commit:   cec0b72
 Built:        Tue Sep  5 19:59:06 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.06.2-ce
 API version:  1.30 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   cec0b72
 Built:        Tue Sep  5 20:00:25 2017
 OS/Arch:      linux/amd64
 Experimental: false

fryfrog commented 6 years ago

Yeah, 3.9.2 came w/ a camera firmware update. You can either find a way to run 3.9.2 again or downgrade your cameras.

fryfrog commented 6 years ago

@pducharme: I think you really need to figure out tagging and push 3.9.2 to latest, but have a tag for both 3.9.0 and 3.9.2. That way the people this is working for (how!?) can continue to function as can the ones it isn't working for.

harihoudini commented 6 years ago

@opticpow I guess you can do another manual pull but use this in the wget command to get back to 3.9.2. I'm glad I don't do auto docker updates, as my 3.9.2 was working fine.

wget -q -O unifi-video.deb https://dl.ubnt.com//firmwares/ufv/v3.9.2/unifi-video.Ubuntu16.04_amd64.v3.9.2.deb && \

harihoudini commented 6 years ago

@fryfrog - do you think it has to do with Unraid users using a particular docker version that it seems to work for? Just guessing since @opticpow seems to have a similar docker version to me (17) while others seem to be on 18

opticpow commented 6 years ago

@harihoudini I'm running on CentOS. The Docker version just happened to be whatever was installed at the time.

opticpow commented 6 years ago

@harihoudini can you give me the SHA256 code of that image so I can pull it?

harihoudini commented 6 years ago

@opticpow - sorry mate, I just looked at the wget command for the docker and compared it with the direct link URL at ubnt. Neither offers a SHA256 code. I have no idea where to get it.

opticpow commented 6 years ago

@harihoudini if you're still running that docker image, the following will give it to you:

docker inspect --format='{{index .RepoDigests 0}}' pducharme/unifi-video-controller

harihoudini commented 6 years ago

@opticpow ok here it is:

pducharme/unifi-video-controller@sha256:a88dc2cea20d79af0d8d57a8a796f9d046ede160876c26bbf5ae8e6c32349a99

opticpow commented 6 years ago

Ok, for those following along at home who want to "rollback" to the previous docker image with 3.9.2:

Grab the previous image if you have purged it (thanks @harihoudini):

docker image pull pducharme/unifi-video-controller@sha256:a88dc2cea20d79af0d8d57a8a796f9d046ede160876c26bbf5ae8e6c32349a99

Now find the image id:

docker image list | grep pducharme/unifi-video-controller
pducharme/unifi-video-controller   latest              7b40e8492969        9 hours ago         807MB
pducharme/unifi-video-controller   <none>              89dbfa082aca        36 hours ago        813MB

Add a new tag for ease of reference:

docker tag 89dbfa082aca pducharme/unifi-video-controller:3.9.2

docker image list | grep pducharme/unifi-video-controller
pducharme/unifi-video-controller   latest              7b40e8492969        9 hours ago         807MB
pducharme/unifi-video-controller   3.9.2               89dbfa082aca        36 hours ago        813MB

Now do a docker run with the new tag reference instead of latest:

docker run \
        --name unifi-video \
        --cap-add SYS_ADMIN \
        --cap-add DAC_READ_SEARCH \
        -p 7442:7442 \
        -p 7443:7443 \
        -p 7445:7445 \
        -p 7446:7446 \
        -p 7447:7447 \
        -p 7080:7080 \
        -p 6666:6666 \
        -v /tank/docker/unifi-video/data:/var/lib/unifi-video \
        -v /tank/docker/unifi-video/buffer:/usr/lib/unifi-video/data/videos \
        -v /tank/docker/unifi-video/logs:/var/log/unifi-video \
        -e TZ=Australia/Sydney \
        -e PUID=1001 \
        -e PGID=1001 \
        -e DEBUG=1 \
        pducharme/unifi-video-controller:3.9.2

Opticpow

fryfrog commented 6 years ago

Has anyone made any progress on troubleshooting this? I can't even find the ems binary or how it is run to try and reproduce the error with better output or logging. Nor can I find a core dump or anything. :/

evil-dog commented 6 years ago

@fryfrog The EMS binary is /usr/lib/unifi-video/bin/evostreamms

fryfrog commented 6 years ago

Dang, how did you know that? Do you have any idea how to tell what arguments it launches with?

Edit: Looks like config dir is ./conf/evostream/

fryfrog commented 6 years ago

The only difference in the config looks fine, related to the move from /var/log/unifi-video to /var/lib/unifi-video/

--- /data/unifi-video/data/evostream/config.lua 2018-02-24 06:45:10.653265495 -0800
+++ /data/unifi-video-test/data/evostream/config.lua    2018-02-26 09:26:31.618778115 -0800
@@ -50,7 +50,7 @@
                        name="file appender",
                        type="file",
                        level=6,
-                       fileName="/var/log/unifi-video/evostream",
+                       fileName="/var/lib/unifi-video/logs/evostream",
                        newLineCharacters="\n",
                        fileHistorySize=10,
                        fileLength=1024*1024,

evil-dog commented 6 years ago

One of the posts on the forum thread showed the dmesg log entry for it. Plus while trying to search around online I saw it referenced as EMS in the evostreamms forums.

As for the command line, looks like it just points to config.lua in the dir you already referenced.

You can usually find out args by running ps -Af as root to see the full command line used to execute a program.

fryfrog commented 6 years ago

2018-02-26 18:13:56.760 UTC 0::0::Unable to open file /usr/lib/unifi-video/conf/evostream/pushPullSetup.xml

^ This file is on my 3.9.0, but isn't in my 3.9.2. :/

evil-dog commented 6 years ago

That file is not in the deb file for 3.9.2. So either it was missed in packaging up the deb, or it was removed as not needed.

Can you copy it from the 3.9.0 to the 3.9.2 and see if ems still crashes?

fryfrog commented 6 years ago

Yeah, still crashing. Must be like you said or a file that evostream creates. Still poking around.

fryfrog commented 6 years ago

Here is an strace: https://ptpb.pw/FDgu

And this is the log from an strace'd run: https://ptpb.pw/zS5v

evil-dog commented 6 years ago

Actually, it looks like the pushPullSetup.xml is generated by ems. Looking at my file from my working 3.9.0 setup, there is an entry for each of my cameras.

Plus, I went back and looked at the 3.9.0 deb file, and it's not there either.

fryfrog commented 6 years ago

Dang, I was doing all my testing as root instead of unifi-video user. It doesn't seg fault when run manually as the right user.

fryfrog commented 6 years ago

So running the damn thing manually looks like it is working in my test instance. I'd need to update my real instance and try running it after it seg faults to prove it working for sure though.

Also, it seg faults when I run it as root... but not as unifi-video user. For the life of me, I can't find where/what starts it up to confirm how it gets run. Could it be starting as root? :/

evil-dog commented 6 years ago

So looking at the process tree on my working unifi-video, ems is started by the main Java process for unifi-video, the one that is running as the unifi-video user.

Is there a way to monitor the process tree so we can see if the evostream that is being spawned in 3.9.2 is running as root or as unifi-video? I think it's happening too fast to see by hand.

fryfrog commented 6 years ago

That is exactly what I was trying to think of a good way to do. Maybe a looping ps logging to a file? :/
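
A rough sketch of that idea: instead of calling ps in a loop it scans /proc directly (Linux only), which amounts to the same thing but samples a bit faster. The watch_proc name, its arguments and the log format are all made up here, and it would need to run as root inside the container to see every process.

```shell
#!/bin/sh
# watch_proc PATTERN LOGFILE SAMPLES
# Scan /proc SAMPLES times and append the owner, PID and command line of
# every process whose command line contains PATTERN to LOGFILE.
watch_proc() {
    pattern="$1"; logfile="$2"; samples="$3"
    i=0
    while [ "$i" -lt "$samples" ]; do
        for dir in /proc/[0-9]*; do
            # Processes can vanish mid-scan, so skip anything unreadable.
            args=$({ tr '\0' ' ' < "$dir/cmdline"; } 2>/dev/null) || continue
            case "$args" in
                *"$pattern"*)
                    owner=$(stat -c %U "$dir" 2>/dev/null) || continue
                    echo "$(date +%T) user=$owner pid=${dir#/proc/} args=$args" >> "$logfile"
                    ;;
            esac
        done
        i=$((i + 1))
    done
}
```

Started just before restarting the service, something like watch_proc evostreamms /tmp/evo-ps.log 2000 should catch even a short-lived process at least once, along with the user it ran as.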

fryfrog commented 6 years ago

root@unifi-video:/# ps aux | grep evo
unifi-v+ 171 9.1 0.2 598116 40560 ? Sl Feb25 113:44 bin/evostreamms /usr/lib/unifi-video/conf/evostream/config.lua

On a different docker image's thread (that also doesn't work for me), a user posted a working evostream ps and it is running as unifi-video user too, so it probably isn't starting as root. :/

https://community.ubnt.com/t5/UniFi-Video/Running-Unifi-Video-3-1-5-in-Docker-on-Synology/m-p/2258999#M98088

fryfrog commented 6 years ago

I made evostreamms a bash file that sends $@ and id to files; it confirms it only runs w/ the args we know and the user we expect.

0 ✓ fryfrog@apollo /data/unifi-video-test/data $ cat id.txt
uid=985(unifi-video) gid=985(unifi-video) groups=985(unifi-video)
0 ✓ fryfrog@apollo /data/unifi-video-test/data $ cat args.txt
/usr/lib/unifi-video/conf/evostream/config.lua
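
For anyone who wants to repeat this check, a wrapper along these lines should do it. This is a guess at the approach, not the exact script used above: make_wrapper and the .real suffix are made up, and unlike a pure logging stub it hands off to the original binary afterwards.

```shell
#!/bin/sh
# make_wrapper BIN OUTDIR
# Rename BIN to BIN.real and put a small script in its place that records
# who runs it and with which arguments, then execs the real binary.
# OUTDIR must be writable by whichever user ends up launching BIN.
make_wrapper() {
    bin="$1"; outdir="$2"
    mv "$bin" "$bin.real"
    cat > "$bin" <<EOF
#!/bin/sh
id > "$outdir/id.txt"
echo "\$@" > "$outdir/args.txt"
exec "$bin.real" "\$@"
EOF
    chmod +x "$bin"
}
```

Inside the container that would be something like make_wrapper /usr/lib/unifi-video/bin/evostreamms /var/lib/unifi-video, after which id.txt and args.txt show up in the data volume on the next start.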

fryfrog commented 6 years ago

I just manually installed the fucking tzdata package in my 3.9.2 container and it seems to start up right. I'm going to build a container that adds that package and see what happens.

evil-dog commented 6 years ago

I've seen other dockers that require mounting the host's localtime onto the container's localtime.

Maybe that's a better solution than installing another package into the container?

Or, is it that unifi-video is looking at the other tzdata files?
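
One quick way to tell these apart is to check whether the zone named in TZ actually exists inside the container. A small sketch, assuming the conventional /usr/share/zoneinfo location that the tzdata package populates on Debian and Ubuntu; the tz_available helper is made up for illustration:

```shell
#!/bin/sh
# tz_available ZONE [ZONEINFO_DIR]
# Succeeds if a zone file for ZONE exists under ZONEINFO_DIR
# (default: /usr/share/zoneinfo, where tzdata installs its files).
tz_available() {
    zone="$1"
    dir="${2:-/usr/share/zoneinfo}"
    [ -n "$zone" ] && [ -f "$dir/$zone" ]
}

if tz_available "${TZ:-America/New_York}"; then
    echo "zone data found"
else
    echo "zone data missing; tzdata may not be installed"
fi
```

If the zone file is missing, mounting only the host's /etc/localtime would not supply it, since that mount provides a single file rather than the whole zone database.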