scottlamb / moonfire-nvr

Moonfire NVR, a security camera network video recorder

[Feature] mp4 live stream and standard authentication #203

Open robin-thoni opened 2 years ago

robin-thoni commented 2 years ago

Is it possible to add (AFAIK, there's no feature like that) an endpoint to copy the live stream, as an mp4 stream, so it can be embedded into another app (I'm thinking about Home Assistant)? Currently, the live stream is a websocket, which doesn't seem usable outside of Moonfire.

The second point is about authentication: pretty much the same as above, this is specific to Moonfire and difficult to integrate elsewhere. HTTP Digest, or even preferably OIDC (OAuth), would be really helpful. I know you said you don't want OAuth because you want it to work offline, but OAuth has no "online" requirement: one could simply spin up a Keycloak instance next to moonfire, and it would be 100% local.

What do you think?

Thanks

scottlamb commented 2 years ago

I'm open to it, but the devil's in the details. There's no "just an mp4 stream" for live viewing AFAIK. There are the browser APIs that client-side Javascript can use (notably MSE, although I sadly learned iPhone Safari doesn't support even that), there are wire protocols (RTSP, WebRTC, RTMP, HLS, MPEG-DASH), and there's less overlap between them than one might think. What Moonfire does today is use MSE from its Javascript, over a simple but custom WebSocket-based protocol. If there's a comparatively simple, at least as well-supported protocol that I've missed, I'd be happy to just switch to it. If there are other protocols folks would like it to support, I'm open to adding them. But progress on Moonfire NVR comes little by little, and there are other features I'd desperately like to have (analytics, audio, web-based config, traditional scrub-bar UI, etc.), so I can't promise it will happen quickly unless someone is able to roll up their sleeves and send me PRs.

Do you have anything more specific in mind? Is there a certain pre-existing Home Assistant integration that you were hoping to use? And would the http(s) requests to Moonfire be coming from the Home Assistant server or directly from the browser/app?

HTTP Digest ... would be really helpful.

HTTP Digest is possible but has the huge downside of requiring the server to store the passwords in plaintext (or a password-equivalent hash). I'd prefer HTTP Basic for that reason.
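To spell out why: in Digest auth the server must be able to compute `HA1 = MD5(username:realm:password)` to verify a response, so it has to store either the plaintext password or HA1 itself, which is password-equivalent for that realm. A minimal sketch of the legacy (non-`qop`) Digest computation; the usernames and nonces here are made up for illustration:

```python
import hashlib


def ha1(username: str, realm: str, password: str) -> str:
    # The server must keep this value (or the plaintext password) to
    # verify requests; anyone who steals it can authenticate as the user.
    return hashlib.md5(f"{username}:{realm}:{password}".encode()).hexdigest()


def digest_response(ha1_hex: str, method: str, uri: str, nonce: str) -> str:
    # Legacy RFC 2069-style response (modern qop=auth adds nc/cnonce fields).
    ha2 = hashlib.md5(f"{method}:{uri}".encode()).hexdigest()
    return hashlib.md5(f"{ha1_hex}:{nonce}:{ha2}".encode()).hexdigest()
```

With Basic, by contrast, the server only ever sees the plaintext at login time and can store a slow salted hash.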

or even preferably OIDC (OAuth) ... know you said you don't want OAuth because you want it to work offline, but OAuth has no "online" requirement: one could simply spin up a Keycloak instance next to moonfire, and it would be 100% local.

Yeah, that's also possible. I understood (possibly incorrectly) folks asking about it before as wanting to use major Internet OAuth2 providers, so the proposal of a local one is new to me. But in either case, it's certainly possible, just a matter of priorities.

robin-thoni commented 2 years ago

Yeah, I should probably have said mjpeg live stream.

Any integration within https://www.home-assistant.io/integrations/#camera. I guess MJPEG or anything that works with ffmpeg would work for me. Those integrations proxy the streams through HA, so the answer to your question is: HA will make the requests, effectively hiding the Moonfire credentials from the end user.

Yeah, you're right about Digest vs. Basic. You either store or send the password in plaintext. Plague or cholera...

Regarding OAuth, the most important rule about security is, IMHO, don't roll your own authn/authz. Keycloak will handle user registration/login/2FA/etc., and it has a built-in module for authorization, meaning you could just remove all that stuff from Moonfire and delegate it to Keycloak. This would tie Moonfire tightly to Keycloak, but it would save you a lot of work and issues.

scottlamb commented 2 years ago

Yeah, I should probably have said mjpeg live stream.

I'd prefer to avoid MJPEG. As compared to passing through the encoded H.264, transcoding frames to JPEG uses more server-side CPU, uses more bandwidth, has worse image quality, and is choppier. Also, the multipart/x-mixed-replace scheme doesn't have a way to add in audio. So I don't think adding MJPEG would be worth its weight.

Any integration within https://www.home-assistant.io/integrations/#camera.

It looks like Home Assistant's generic camera platform supports RTSP, and the RTSPtoWebRTC platform (apparently written by an old teammate of mine! small world!) can turn that into WebRTC (by proxying through a server written in Go).

If you want this to work right now, I suggest having HA talk to the cameras independently of Moonfire. All the cameras I've experimented with support several simultaneous RTSP clients. Mine commonly have several RTSP clients (including a couple/few Moonfire instances running for development/testing) and it's been fine.

That said, I see how there could be advantages to HA talking to Moonfire:

- binary switches and visual overlays for Moonfire's (future) analytics
- reducing the bandwidth to the cameras (important if they are wireless and/or remote)
- allowing a tighter firewall setup that better contains the (notoriously insecure) cameras
- avoiding duplicated camera config
- avoiding the need to run a separate RTSPtoWeb server
- showing events in HA's media library, as Frigate does

In the shiny future, Moonfire could proxy over RTSP and/or WebRTC and even have its own HA integration which (given only a Moonfire URL and credentials) makes all the cameras just work.

Regarding OAuth, the most important rule about security is, IMHO, don't roll your own authn/authz. ... you can just remove all that stuff from Moonfire and delegate it to Keycloak

I want this to be easy to setup and use, with as few moving parts as possible. E.g. I use SQLite rather than a database server, and I'd like to remove the need for a proxy server in front of Moonfire (#27). My instinct is that I don't want the user guide to say "first, go set up Keycloak". So I think if we do offer OAuth, it should be in addition to Moonfire's native auth.

Moonfire's auth isn't anything crazy. It's missing some things (notably 2FA), but I feel pretty good about what's there. I didn't roll my own crypto. It uses session cookies with plenty of entropy, the best-practice SameSite and Secure attributes, and a CSRF token; passwords are hashed with the decently modern scrypt algorithm; and session credentials are stored hashed as well, so that a database leak (e.g. access to an old backup) doesn't turn into long-term access to the system. It's always possible I made a critical error somewhere along the way, but the same is true when integrating with another system. I also have some expertise in this area: I worked on the Google Identity & Authentication team for several years.
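(Not Moonfire's actual code, which is Rust, but the pattern described above can be sketched with Python's standard library; all names here are invented for illustration:)

```python
import hashlib
import hmac
import secrets

# scrypt parameters: n=2**14, r=8, p=1 needs ~16 MiB, so raise maxmem.
SCRYPT = dict(n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32)


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash); only these are stored, never the password."""
    salt = secrets.token_bytes(16)
    return salt, hashlib.scrypt(password.encode(), salt=salt, **SCRYPT)


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT)
    return hmac.compare_digest(candidate, stored)  # constant-time compare


def new_session() -> tuple[bytes, bytes]:
    """Return (token for the cookie, hash for the database).

    A database leak exposes only the hash, which can't be replayed
    as a session cookie."""
    token = secrets.token_bytes(32)  # plenty of entropy
    return token, hashlib.sha256(token).digest()
```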

robin-thoni commented 1 year ago

I'm trying to integrate Moonfire into Home Assistant. My integration can connect to Moonfire, get the list of cameras and import them into Home Assistant. That part works great.

Now I'm having trouble with the video stream. I reverse-engineered the UI to understand the WebSocket protocol, and here's what I found you have to do:

I'm trying to play a file (or a FIFO) created that way in VLC, but it will only play a few frames, then hang on the last frame (the file keeps growing over time, so I know the WebSocket is still receiving data). The video stays in the "playing" state; it's not stopping or skipping to the next item. VLC logs the following repeatedly while hanging:

mp4 demux: Fragment sequence discontinuity detected 1 != 2

Do you have any idea what could be wrong? I can privately send you a sample file if you want to investigate.

Here is the relevant part of my PoC, in case it triggers something for you:

    def stream(self, camera_uuid: str, stream_init=True):

        cookies = self.session.cookies.get_dict()
        url = self.url(f"cameras/{camera_uuid}/sub/live.m4s").replace('https:', 'wss:').replace('http:', 'ws:')

        ws = websocket.WebSocket()
        self.streams[camera_uuid] = {'ws': ws, 'running': True}
        ws.connect(url, cookie="; ".join("%s=%s" % (k, v) for k, v in cookies.items()))

        def parse_part(data):
            # Split only on the FIRST blank line: the binary mp4 payload
            # can itself contain b'\r\n\r\n'.
            headers_b, data_b = data.split(b'\r\n\r\n', 1)
            headers = dict(h.split(': ', 1) for h in headers_b.decode('utf-8').split('\r\n'))
            return headers, data_b

        if stream_init:
            # The first part tells us which initialization segment to fetch.
            first_part = ws.recv()
            headers, data_b = parse_part(first_part)
            init_mp4 = self.stream_init(headers['X-Video-Sample-Entry-Id'])

            yield init_mp4
            yield data_b

        while self.streams[camera_uuid]['running']:
            headers, data_b = parse_part(ws.recv())
            yield data_b

        ws.close()

    # Caller: write the stream into a FIFO for VLC to read.
    while True:
        try:
            with open(fifo_path, 'wb') as fifo:
                stream = moonfire.stream(camera['uuid'])
                for part in stream:
                    print('part')
                    fifo.write(part)
        except BrokenPipeError:
            print('broken')

Then I just run

vlc /tmp/fifo.mp4

Chrome acts pretty much the same as VLC: it plays only a few frames, then hangs, even after recording for ~15 seconds.

If I manage to get it working on HA, then this issue won't be relevant to me anymore.

Thanks!

scottlamb commented 1 year ago

mp4 demux: Fragment sequence discontinuity detected 1 != 2

I just grepped for this in VLC code:

https://github.com/videolan/vlc/blob/3c9e8c2005f0f5d3810dde925a03d5d3038e8d15/modules/demux/mp4/mp4.c#L5217-L5221

It apparently reads the sequence number from the mfhd box.

https://github.com/videolan/vlc/blob/3c9e8c2005f0f5d3810dde925a03d5d3038e8d15/modules/demux/mp4/mp4.c#L1784-L1790

Moonfire always writes that as 1 right now:

https://github.com/scottlamb/moonfire-nvr/blob/a6bdf0bd808d6cde09fd8949ce00a51804fb1b65/server/src/mp4.rs#L1247-L1251

I can hack together something to make it increment on the WebSocket live stream and see if it helps.
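In case it helps verify what the server is actually sending while testing, the sequence number can be read straight out of a saved segment. A quick sketch (my own helper, not Moonfire code; handles only the common 32-bit box sizes):

```python
import struct


def mfhd_sequence_numbers(data: bytes):
    """Yield the mfhd sequence_number of each moof in fragmented-MP4 bytes."""

    def boxes(start, end):
        # Walk sibling boxes: 4-byte big-endian size, then 4-byte type.
        off = start
        while off + 8 <= end:
            size, typ = struct.unpack_from(">I4s", data, off)
            if size < 8:  # size==0/1 (to-EOF / 64-bit) not handled here
                return
            yield typ, off + 8, off + size
            off += size

    for typ, body_start, body_end in boxes(0, len(data)):
        if typ == b"moof":
            for ityp, istart, _ in boxes(body_start, body_end):
                if ityp == b"mfhd":
                    # mfhd is a FullBox: 1 byte version + 3 bytes flags,
                    # then the 32-bit sequence_number.
                    (seq,) = struct.unpack_from(">I", data, istart + 4)
                    yield seq
```

A fixed stream should yield 1, 2, 3, ... rather than 1, 1, 1, ...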

scottlamb commented 1 year ago

Does 28cd864 (on the increment-seq branch, not master) fix the problem?

scottlamb commented 1 year ago

Oh, we'll probably need to do the same thing for baseMediaDecodeTime also:

https://github.com/scottlamb/moonfire-nvr/blob/a6bdf0bd808d6cde09fd8949ce00a51804fb1b65/server/src/mp4.rs#L1275

robin-thoni commented 1 year ago

I tried to build with Docker according to your build instructions:

docker buildx build --load --tag=moonfire-nvr -f docker/Dockerfile .

But it's failing after 6-7 minutes:

...
docker buildx build --load --tag=moonfire-nvr -f docker/Dockerfile .
...
#19 1.086 test result: ok. 37 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.12s
#19 1.086 
#19 1.090 
#19 1.090 real  0m0.377s
#19 1.090 user  0m0.400s
#19 1.090 sys   0m0.182s
#19 1.090 + cargo build --profile=release-lto
#19 1.315     Finished release-lto [optimized + debuginfo] target(s) in 0.17s
#19 1.321 
#19 1.321 real  0m0.231s
#19 1.321 user  0m0.164s
#19 1.321 sys   0m0.064s
#19 1.321 + sudo install -m 755 /var/lib/moonfire-nvr/moonfire-nvr /usr/local/bin/moonfire-nvr
#19 1.325 install: cannot stat '/var/lib/moonfire-nvr/moonfire-nvr': No such file or directory

I don't have much time to debug that right now; I'll try again tomorrow, but if you have a solution in the meantime, that'd be great.

scottlamb commented 1 year ago

Oops! My workstation is down so I haven't tested this properly, but I think b3d39df (edit: 936c7d9; in particular, 7fe2284) fixes that problem. Again, on the increment-seq branch.

robin-thoni commented 1 year ago

It works, thanks for the quick fix! I'll try the sequence fix and report back

robin-thoni commented 1 year ago

I can confirm your commit fixes the issue! That will allow me to continue my HASS integration. Thanks for fixing it that quickly!

robin-thoni commented 1 year ago

VLC is happy, but now ffmpeg is complaining:

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55fd6e5d1300] Found duplicated MOOV Atom. Skipped it
    Last message repeated 15 times

Not sure if this is related?

scottlamb commented 1 year ago

I can confirm your commit fixes the issue!

Cool, glad that helped you keep moving. In general I'm not sure what sequence number to set when the caller can request chunks of basically arbitrary size, but at least for the /live.m4s endpoint, where the server decides the boundaries and keeps state, this should be fairly harmless. I'll give it a little test with the UI's live stream endpoint, then probably merge it to master as-is.

I'm a bit glad it wasn't necessary to set a non-zero baseMediaDecodeTime as that's more likely to cause problems for the current UI code structure.

Found duplicated MOOV Atom.

Hmm, only the initialization segment should have a MOOV atom in it. So if you're just appending one of those at the beginning of the file (which should be fine as long as you don't change camera parameters mid-stream), I don't know why you'd be seeing this message.

You could open the file with e.g. https://gpac.github.io/mp4box.js/test/filereader.html to inspect the structure.
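If a command-line check is handier than the in-browser inspector, here's a small sketch (`top_level_boxes` is just a name I made up) that lists the file's top-level boxes so you can count the moov atoms:

```python
import struct


def top_level_boxes(path):
    """Return the type of each top-level MP4 box in the file.

    More than one 'moov' means an initialization segment was
    appended more than once."""
    types = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            size, typ = struct.unpack(">I4s", header)
            if size == 1:  # 64-bit "largesize" follows the type field
                size = struct.unpack(">Q", f.read(8))[0]
                f.seek(size - 16, 1)
            elif size == 0:  # box extends to the end of the file
                f.seek(0, 2)
            else:
                f.seek(size - 8, 1)
            types.append(typ.decode("ascii", "replace"))
    return types
```

A well-formed capture should count exactly one `moov` (from the init segment), followed by repeating `moof`/`mdat` pairs.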

robin-thoni commented 1 year ago

This one is on me: when refactoring the code, I re-sent the init segment with each chunk...

Now the only ffmpeg complaint is

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x563d35170300] DTS 0 < 134908 out of order

once at the beginning.

VLC won't play the m3u8 generated with

    ffmpeg -i /tmp/fifo.mp4 -f hls -hls_time 4 -hls_playlist_type event stream.m3u8

stream.m3u8:

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:1
    #EXT-X-MEDIA-SEQUENCE:0
    #EXT-X-PLAYLIST-TYPE:EVENT
    #EXTINF:1.000000,
    stream0.ts
    #EXT-X-ENDLIST

    $ vlc stream.m3u8
    VLC media player 3.0.9.2 Vetinari (revision 3.0.9.2-0-gd4c1aefe4d)
    [0000557a887225b0] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.
    [00007fc5900015a0] cache_read stream error: cannot pre fill buffer
    [00007fc598001170] adaptive demux error: Failed to create demuxer (nil) Unknown

Then it just stays on the playlist view and won't display any frame at all.

For some reason, Home Assistant hangs when trying to download the m3u8 it is supposed to generate to play the live stream. HASS internally uses ffmpeg (and produces the same warning as above), so maybe the two are related. I simply pass back to Home Assistant the path to my FIFO (HASS accepts any string input ffmpeg can handle, like URLs and file names). This might be related to HASS internals. Not sure yet, still investigating.

scottlamb commented 1 year ago

I've been thinking this over. I suspect the problem is the timestamps. The structure of the media segments is set up for using with my HTML5 Media Source Extensions code, which doesn't care about their sequence numbers and is happy with their timestamps starting at 0 each time. When ffmpeg is reading a single file and sees fragments with baseMediaDecodeTime going back to 0, I can understand how it wouldn't be happy. I can try a small change on that branch to make them continue where they left off.
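If you want to experiment on the client side in the meantime, a rough, untested sketch of that idea: patch each media segment's tfdt before writing it to the FIFO, so the timestamps keep increasing. (My own helper, not Moonfire code; 32-bit box sizes only; `delta` is in the track's timescale units.)

```python
import struct


def shift_tfdt(segment: bytes, delta: int) -> bytes:
    """Return a copy of an fMP4 media segment with `delta` added to every
    tfdt baseMediaDecodeTime (handles tfdt versions 0 and 1)."""
    buf = bytearray(segment)

    def walk(start, end):
        off = start
        while off + 8 <= end:
            size, typ = struct.unpack_from(">I4s", buf, off)
            if size < 8:  # size==0/1 not handled in this sketch
                return
            if typ in (b"moof", b"traf"):
                walk(off + 8, off + size)  # recurse into container boxes
            elif typ == b"tfdt":
                version = buf[off + 8]  # FullBox: version byte, 3 flag bytes
                if version == 1:  # 64-bit baseMediaDecodeTime
                    (t,) = struct.unpack_from(">Q", buf, off + 12)
                    struct.pack_into(">Q", buf, off + 12, t + delta)
                else:  # version 0: 32-bit
                    (t,) = struct.unpack_from(">I", buf, off + 12)
                    struct.pack_into(">I", buf, off + 12, t + delta)
            off += size

    walk(0, len(buf))
    return bytes(buf)
```

The caller would accumulate each segment's duration and pass the running total as `delta` for the next one.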

To be honest, though, I'm not super optimistic about the never-ending-.mp4-file-over-a-FIFO approach working reliably in all cases. It's a neat idea! But I've never seen it done before, and none of the specs describe it, so if there's a problem I don't know if it's possible to solve in general. E.g., if we make up our own idea of how it should work and then send the ffmpeg folks a patch to make it work better, they may just say "don't do that" or ask to see the (nonexistent, AFAIK) spec first. I think it'd be totally valid from their perspective not to want to complicate code that has to handle more common/standardized scenarios.

So I can give this another try or two but if it doesn't work out, we may just have to bite the bullet and adopt a better-supported way. I mentioned some protocols up-thread (RTSP, WebRTC, RTMP, HLS, MPEG-DASH). There are also a couple other formats that I would expect to work reliably over a simple stream (TCP/FIFO/whatever): RTP or MPEG-TS streams. I'm a little confused about some finicky details of how to make HLS/MPEG-DASH work [1], but in general, all of these protocols are possibilities, it's "just" a matter of sufficient elbow grease.

[1] E.g. LL-HLS mandates some specific requirements about partial segments being of uniform duration that seem to be assuming we control the frame rate, position of IDR frames, etc., when in our use case the camera controls that and we just go along with it. I don't know what we're supposed to do if those parameters change mid-stream. But maybe players aren't super strict about stuff like this, it probably comes up rarely, etc., so we can probably get away with bending the rules a little.

robin-thoni commented 1 year ago

I do understand your point (non-standard stuff, etc.); I'm aware I'm doing hacky stuff, and my video/streaming skills are close to zero, as you probably realized :) I ran this simple test:

HASS =(RTSP)=> VLC =(FIFO)=> My Python PoC =(WebSocket)=> Moonfire =(RTSP)=> Camera

And it worked. Mostly. It was laggy (too much time buffering), the browser was showing a progress bar and time (so not a live stream), and it was a few seconds behind wall clock. So, apparently, VLC re-streaming the FIFO over RTSP made it work. Not sure it has any practical point, though.

From here, I can see 2 solutions:

I'd really love the first solution, but I understand it's more work for you. I definitely don't have the required skills to help with this, and not enough time to learn them for this single use case. So it's up to you :)