home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

CAMERA.RECORD from STREAM - Invalid Video Length property #34605

Closed GaryOkie closed 4 years ago

GaryOkie commented 4 years ago

The problem

MP4 recordings from the camera.record service are created with actual lengths fairly close to the requested duration+lookback (within a few seconds). However, the video length property stored in the file is way off: it is listed as just 1-4 seconds, regardless of the actual length.

The biggest issue is that (some*) web browsers ignore the actual length of the video and play only the much shorter listed length, nothing more. Other players (VLC, MS Media Player) also show the incorrect short length on the playback timeline, but continue on and play the entire video.

*Note: I normally use Firefox, where this issue was discovered. I just tried Chrome, and it will play back the entire video. Other players may also show a truncated timeline, but play on. Still, the video length property should match the actual recorded length.

Environment

Problem-relevant configuration.yaml

configuration.yaml

stream:
camera:
  - platform: ffmpeg 
    input: !secret rtsp
    name: "Amcrest Doorbell"

automation.yaml

    - service: script.video_record
      data_template:
        entity_id: camera.amcrest_doorbell
        filename: '{{ states("input_text.filename") }}'
        duration: 5
        lookback: 3

scripts.yaml

video_record:   
  sequence:
    - condition: state    # continue only if no previous recording already in progress  
      entity_id: input_boolean.video_recording
      state: 'off' 

    - service: input_boolean.turn_on   # turn on recording in progress flag
      entity_id: input_boolean.video_recording

    - service: camera.record
      data_template:
        entity_id: '{{ entity_id }}'
        filename: '{{ filename }}.mp4'
        duration: '{{ duration }}' 
        lookback: '{{ lookback }}' 

Traceback/Error logs

NO errors

Additional information

I ran a series of recording tests with durations from 1 to 10 seconds, all with a lookback of 3 seconds. As you can see, the expected length was reasonably close to the actual recording length, but the length recorded in the MP4 file header is nowhere close to reasonable for the longer recordings.

[image: lengths]

Shown here are the file properties of the 10+3 second MP4 recording: expected length 13 seconds, actual length 10 seconds, and the Video Length property reported as 2 seconds.

[image: file properties]

dshokouhi commented 4 years ago

So the docs do mention that you won't get exact values. Have you tried adjusting the settings higher to see if the clip comes out longer?

Both duration and lookback options are suggestions, but should be consistent per camera. The actual length of the recording may vary. It is suggested that you tweak these settings to fit your needs.
probot-home-assistant[bot] commented 4 years ago

Hey there @hunterjm, mind taking a look at this issue as it's been labeled with an integration (stream) you are listed as a codeowner for? Thanks!

GaryOkie commented 4 years ago

Thanks for the quick reply and suggestions. Yes, I am aware that I cannot expect exact durations, and I mentioned that the expected vs. actual durations are "reasonable" and generally within a few seconds. I have no problem with that.

There is a video length property stored in the MP4 file itself, as part of the header used for playback. This length is just flat-out wrong, and nothing I have found can bring it anywhere close to reality except for the shortest recordings.

Some kind of calculation in the code seems to be stuck between 1-4 seconds regardless of the actual recording length. I have not tried longer than 10+3 seconds, since my intention is to transmit the activity MP4 via Telegram. I've even looked at the code, but it's a major undertaking to unwind coming in cold.

hunterjm commented 4 years ago

This is a known quirk and is related to how I am manually concatenating the mpeg-ts segments (which have no header info due to being used in live streaming situations) into a single file and remuxing it to an MP4 container.

The trade-off here was valid MP4 headers vs. the resource consumption of preparing the recording, while still allowing for "lookback" to account for delays in automations being triggered due to prerequisites like image processing components for object detection.

Since most media players are able to play the file that is created with no issue anyway, I decided the trade-off was worth it, even if it's not technically "correct".

hunterjm commented 4 years ago

For reference, here is the relevant code: https://github.com/home-assistant/core/blob/dev/homeassistant/components/stream/recorder.py#L18-L39
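
Very roughly, the shape of that remux is something like the sketch below. This is only a simplified illustration with PyAV (the libav bindings the stream component is built on), not the actual recorder_save_worker; the function name and segment handling here are assumptions, and the real code does more timestamp bookkeeping and handles errors and edge cases.

import av

def remux_segments_to_mp4(segments, out_path):
    # segments: iterable of file-like objects, each one buffered MPEG-TS segment
    # from the same continuous stream (so timestamps already increase across them).
    output = av.open(out_path, mode="w", format="mp4")
    out_stream = None
    first_ts = None

    for segment in segments:
        source = av.open(segment, mode="r", format="mpegts")
        in_stream = source.streams.video[0]
        if out_stream is None:
            # Copy codec parameters from the first segment; nothing is re-encoded.
            out_stream = output.add_stream(template=in_stream)
        for packet in source.demux(in_stream):
            if packet.pts is None:
                continue  # skip flush packets that carry no timestamps
            if first_ts is None:
                first_ts = packet.dts if packet.dts is not None else packet.pts
            # Shift timestamps so the output starts near zero.
            packet.pts -= first_ts
            if packet.dts is not None:
                packet.dts -= first_ts
            packet.stream = out_stream
            output.mux(packet)
        source.close()

    output.close()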

GaryOkie commented 4 years ago

Thanks for the explanation, Jason. I understand the trade-off, given that "most" players overlook it except for the playback timeline.

I'll look at the code to see if I can understand where the "1-4" seconds video length comes from. I was wondering if the video length in the header could simply be set to the duration+lookback time and call it a day. Yeah, it wouldn't be completely accurate of course, but it would be a lot closer than it is now and not require any extra resource consumption.

hunterjm commented 4 years ago

The actual encoding is all low level libav (the library that powers ffmpeg) work. I’ve no idea how to manually manipulate the output headers.

GaryOkie commented 4 years ago

Oh, so it's a dead end and there's nothing we can do then to improve the video length?

EDIT: not ideal, but since my recordings are short, I'll experiment with ffmpeg remuxing the mp4 file immediately after it has been recorded to see if I can get a reasonably accurate, if not perfect, video length that way. Since camera.record runs asynchronously, coming up with automations that do this sort of thing in sequence is tricky, but I'll manage.

hunterjm commented 4 years ago

There is always something that can be done, but at this point it's an effort/reward trade-off. I found something suggesting that adding a flushing packet to the end might help. We can also validate that the frame pts/dts values are monotonically incrementing as we remux the segments, since an issue there could cause weirdness.

I don’t think it’s worth refactoring the entire thing to solve though.
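
If anyone wants to poke at the timestamp theory on their own recordings, a rough way is to walk the video packets and flag any dts that goes backwards, which is the same condition the libav muxer complains about. Hedged sketch using PyAV; the filename is just a placeholder.

import av

def check_monotonic_dts(path):
    # Print a warning for every packet whose dts is lower than the previous one.
    container = av.open(path)
    video = container.streams.video[0]
    previous = None
    for packet in container.demux(video):
        if packet.dts is None:
            continue  # flush packets carry no timestamps
        if previous is not None and packet.dts < previous:
            print(f"dts went backwards: {packet.dts} < {previous}")
        previous = packet.dts
    container.close()

check_monotonic_dts("recording.mp4")  # placeholder filename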

GaryOkie commented 4 years ago

I understand and agree. I should have added "within reason" to my comment. That flushing packet idea does sound worth trying though.

During those 10 back to back (non-simultaneous) sequential recordings for my video length test, one of them did spit out this error:

ERROR (recorder_save_worker) [libav.mp4] Application provided invalid, non monotonically increasing dts to muxer in stream 0: 44405280 >= 43708500

But I can't say I've seen that show up during normal day to day recordings.

And FYI - I'm making good progress with a workaround:

ffmpeg -i input.mp4 -c copy output.mp4

As I expected, this fixes the video length in the header and is really fast. Now to just work out the async sequencing in the automation...

EDIT: ...and it's working. Firefox and I are happy now.

@hunterjm - So shall I close this, or leave it open for a little while in case you plan on giving that packet flush a whirl down the loo?
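
For anyone wanting to borrow the workaround, one rough way to script the "wait for camera.record to finish, then remux" step is below. This is only a sketch: the size-polling logic, timeout, and paths are illustrative assumptions, not the exact setup I ended up with.

import os
import subprocess
import time

def remux_when_complete(src, dst, poll_seconds=2, timeout=120):
    # camera.record returns before the file is finished, so wait until the
    # file exists and its size stops changing, then stream-copy it.
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        size = os.path.getsize(src) if os.path.exists(src) else -1
        if size > 0 and size == last_size:
            break  # size stable -> the recorder is (probably) done writing
        last_size = size
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"{src} never finished writing")

    # Same as the manual workaround: no re-encode, just a fresh MP4 container.
    subprocess.run(["ffmpeg", "-y", "-i", src, "-c", "copy", dst], check=True)

# Example call (paths are made up):
# remux_when_complete("/config/www/doorbell.mp4", "/config/www/doorbell_fixed.mp4")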

hunterjm commented 4 years ago

@GaryOkie - I ran a few tests on my dev instance, and VLC is showing the correct duration... so I pulled up recordings from my production instance and ran ffprobe on them. They too are showing the correct duration:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'driveway_20200423_193422.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1iso6mp41
    encoder         : Lavf58.29.100
  Duration: 00:00:29.98, start: 0.000000, bitrate: 4840 kb/s
    Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuvj420p(pc, bt709), 1920x1080 [SAR 1:1 DAR 16:9], 4839 kb/s, 12.01 fps, 30 tbr, 12288 tbn, 24 tbc (default)
    Metadata:
      handler_name    : VideoHandler

I have my lookback set to 10 and duration at 20, matching a multiplier of the I-Frame interval of my camera almost exactly so that I get consistent recording lengths.

I then ran it through MP4Box and that gave me a little more info about the file being produced:

* Movie Info *
    Timescale 1000 - Duration 00:00:02.500
    1 track(s)
    Fragmented File: yes - duration 00:00:00.000
11 fragments - 0 SegmentIndexes
    File suitable for progressive download (moov before mdat)
    File Brand isom - version 512
    Created: UNKNOWN DATE   Modified: UNKNOWN DATE
File has no MPEG4 IOD/OD

iTunes Info:
    Encoder Software: Lavf58.29.100
1 UDTA types: meta (1) 

Track # 1 Info - TrackID 1 - TimeScale 12288 - Media Duration 00:00:02.500
Media Info: Language "und (und)" - Type "vide:avc1" - 30 samples
Fragmented track: 330 samples - Media Duration 00:00:27.483
Visual Track layout: x=0 y=0 width=1920 height=1080
MPEG-4 Config: Visual Stream - ObjectTypeIndication 0x21
AVC/H264 Video - Visual Size 1920 x 1080
    AVC Info: 1 SPS - 1 PPS - Profile Main @ Level 4.1
    NAL Unit length bits: 32
    Pixel Aspect Ratio 1:1 - Indicated track size 1920 x 1080
    SPS#1 hash: 93F040E857E027E6EFB591393FB4380B0309B107
    PPS#1 hash: E2C4336287B33A2D561825FC8AC3C9878F254511
Self-synchronized
    RFC6381 Codec Parameters: avc1.4d0029
    Average GOP length: 30 samples

What that says is that it's a fragmented track with 11 fragments (should be 12), which makes sense, because we are combining multiple mpeg-ts segments into a single file.

The packet flush did nothing, and the timestamps were fine, so I'm out of ideas unless there is a way to ask libav to create a non-fragmented file from multiple fragments; a quick Google search turned up nothing useful.
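
For completeness, the same duration check can be done from Python instead of ffprobe. Just a small snippet, assuming PyAV is installed; the filename is the one from the ffprobe output above.

import av

container = av.open("driveway_20200423_193422.mp4")
# container.duration is reported in AV_TIME_BASE (microsecond) units.
print(f"container duration: {container.duration / 1_000_000:.2f} s")
container.close()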

hunterjm commented 4 years ago

My recommendation would be to close this as a quirk in how Firefox's video decoder works. A quick Google search for "Firefox MP4 won't play" turns up tons of results from other sources reporting compatibility in every browser except FF. When I open the file in Chrome, I get the full correct timeline, so I was unable to reproduce that aspect.

McGiverGim commented 4 years ago

If it helps, I have the same problem. I store the videos and play them with the Windows player, not a browser. The length in the player appears as 10-12 seconds, but the videos are 30 seconds long. It's not a fatal problem; the video plays to the end, but I can't move the slider to a playback point past 10 seconds ;)

GaryOkie commented 4 years ago

Wow Jason, I just now saw that you were able to incorporate the latest PyAV lib to fix this!

I've been doing a lot more testing (thanks for the MP4Box tip!), but now I can just let it go, wait for the update, and then get rid of the extra ffmpeg remux workaround.

One question though: in your test above where you selected a 20 sec duration + 10 sec lookback, which closely matched your I-Frame interval... were your FPS and I-Frame interval 30?

My Amcrest Doorbell camera has these specs for FHD rtsp...

"Video" : {
--
"BitRate" : 768,
"BitRateControl"   : "CBR",
"Compression" :   "H.264",
"FPS" : 15,
"GOP" : 60,
"Height" : 1080,
"Pack" :   "DHAV",
"Profile" :   "High",
"Quality" : 4,
"Width" : 1920
},

It doesn't list the I-Frame interval here, but it's a good bet it is the same as the FPS (15). So what would be your advice (formula) to minimize fragmentation and get closer to the expected total durations?

EDIT: after further tests, I tried duration: 15, lookback: 5, and MP4Box reports ZERO fragmentation; the actual playback time as shown by VLC is also 15 seconds. Nice. (I also tried 20:10, same as your test, and got 9 fragments.) So it appears that lookback doesn't factor into the actual total duration or segmentation like I thought it would.

Something else quite interesting: the remuxed copy of the original MP4 is a LOT smaller. Apparently, the more fragmentation in the original, the bigger the difference in file size of the remuxed copy. Even in the latest test, which reported 0 fragmentation, the remuxed file was less than half the size (700K vs 1600K).

hunterjm commented 4 years ago

I have my cameras configured so the frame rate and I-Frame interval match, which means I get 1-second segments, and the recording will always match my selected values since the record service can only be adjusted in 1-second increments.

[image: Configuration]

That was a recent change; I used to have my I-Frame interval at 2x my frame rate for 2-second segments.

GaryOkie commented 4 years ago

Closing this since the fix is in for 0.109 (or 0.110 now). Thanks very much, Jason, for investigating the libav update and implementing it!

I am puzzled how your 10 sec lookback and 20 sec duration "matches a multiplier of the iFrame interval", which in your camera is 12. I would like to achieve the same result with a matching FPS/I-Frame of 15, where the lookback+duration time closely equals the actual recording time as it did in your example. Not a big deal. I'll keep experimenting.

hunterjm commented 4 years ago

I am puzzled how your 10 sec lookback and 20 sec duration "matches a multiplier of the iFrame interval", which in your camera is 12. I would like to achieve the same result with a matching FPS/I-Frame of 15, where the lookback+duration time closely equals the actual recording time as it did in your example. Not a big deal. I'll keep experimenting.

The math is simple: {frame_rate}/{keyframe_interval} = {seconds_between_keyframes}.

The logic for stream is:

1) Take live feed and break up into segments on keyframes
2) Keep latest 3 segments in memory
3) Build HLS manifest off of segments
4) Expose HLS manifest to be played in frontend

The logic for stream.record is also simple:

1) On trigger, stop removing segments based on segment length + lookback parameter
2) Once total # of segments reaches a total length of {duration} + {lookback} trigger the file write
3) Loop over all segments, concatenate them into a single file, and save it to provided location

Going back to the initial calculation, it can be re-declared as: {frame_rate}/{keyframe_interval} = {segment_length}

What this all means:

1) Lookback can't be > segment_length * 3
2) ({lookback} + {duration}) / {segment_length} must be a whole number if you want your recordings to be the expected length.
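
In rough pseudo-Python, that buffering and record flow looks something like this. It is a simplified illustration only, not the actual stream component code; the class and method names are invented for the example.

from collections import deque

class StreamBufferSketch:
    def __init__(self, segment_length, keep_segments=3):
        self.segment_length = segment_length         # seconds per keyframe-aligned segment
        self.segments = deque(maxlen=keep_segments)  # newest segments; feeds the HLS manifest
        self.recording = None                        # segments collected once a record is triggered
        self.record_target = None                    # total seconds to capture (duration + lookback)

    def add_segment(self, segment):
        # Called each time the live feed yields a new keyframe-aligned segment.
        self.segments.append(segment)
        if self.recording is not None:
            self.recording.append(segment)
            if len(self.recording) * self.segment_length >= self.record_target:
                self.save()

    def start_record(self, duration, lookback):
        # On trigger: seed with the buffered lookback segments, then keep
        # collecting until the total reaches duration + lookback.
        wanted = int(lookback / self.segment_length)
        self.recording = list(self.segments)[-wanted:] if wanted else []
        self.record_target = duration + lookback

    def save(self):
        # Here the real component concatenates self.recording into one MP4
        # (see recorder.py); omitted in this sketch.
        self.recording, self.record_target = None, None

With 1-second segments, a trigger of duration: 20, lookback: 10 seeds at most the 3 buffered segments and then keeps collecting until 30 seconds total has been captured.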

GaryOkie commented 4 years ago

That's very enlightening Jason, thanks!

So segment_length is 1 sec when FPS and keyframe interval are the same per the formula. (This matches your example above where 12/12=1 sec segment_length).

Your lookback (10) is then greater than (segment_length*3) so I must be mixing up terms?

Looking at my doorbell camera's video config, there is a Group of Pictures (GOP) value of 60 listed along with an FPS of 15. There is no I-Frame interval listed in the config. It's not clear how that fits into an optimal duration calculation, but it's all good once the video length header fix is in, fragments or no.

hunterjm commented 4 years ago

Hmm, you're right, but since we don't stop recording until the full duration + lookback total has been captured, I'm really only getting 3 seconds of lookback and 27 seconds of duration.

GaryOkie commented 4 years ago

@hunterjm - the math is simple, but the inner workings of stream record still aren't clear-cut to me.

I've confirmed the doorbell camera frame rate for FHD (1920x1080) is 15 and the I-Frame interval (aka GOP) is 60. So segment_length is 0.25. But looking at the code, it appears to be overriding that and setting a minimum segment_length of 0.5.

So with duration=10 and lookback=5, 15 / 0.5 seg_len = 30 secs expected length? But lookback can't exceed 0.5 * 3 = 1.5, so lookback is rounded up to 2 segments, I believe. That equals a 28-second duration?

I'm not seeing that. I'm seeing a duration of 14-15 seconds, so segment length looks to be 1, not 0.5. I inserted several _LOGGER.info statements trying to get a better handle on this, but it didn't clear things up all that well for me.

In preparation for the upcoming video length header fix, all I'm trying to do is define an optimal duration and lookback so actual video length is close and lookback is honored.

For what it's worth...

CAMERA Video:  1920x1080, FPS=15, iFrame Interval(GOP)=60,  Bitrate=768 (CBR)
CAMERA.RECORD  Duration: 10   Lookback: 5

2020-04-25 14:13:58 INFO (MainThread) [homeassistant.components.stream] duration: 10
2020-04-25 14:13:58 INFO (MainThread) [homeassistant.components.stream] lookback: 5
2020-04-25 14:13:58 INFO (MainThread) [homeassistant.components.stream] hls.target_duration: 2
2020-04-25 14:13:58 INFO (MainThread) [homeassistant.components.stream] hls.num_segments: 3
2020-04-25 14:13:58 INFO (MainThread) [homeassistant.components.stream] num_segments: 2

2020-04-25 14:13:58 INFO (MainThread) [homeassistant.components.stream] Wait for latest segment, then add the lookback...

2020-04-25 14:14:01 INFO (MainThread) [homeassistant.components.stream.recorder] RECORDER.py prepend self._segments:
       Segment(sequence=31, segment=<_io.BytesIO object at 0x7f8018b77e30>, duration=Fraction(4039, 1550)),
       Segment(sequence=32, segment=<_io.BytesIO object at 0x7f80171b67d0>, duration=Fraction(42367, 16000))]

2020-04-25 14:14:01 INFO (MainThread) [homeassistant.components.stream] last segment, add lookback: hls_get_segment:
       Segment(sequence=30, segment=<_io.BytesIO object at 0x7f8018a02c50>, duration=Fraction(3989, 1500)),
       Segment(sequence=31, segment=<_io.BytesIO object at 0x7f8018b77e30>, duration=Fraction(4039, 1550)),
       Segment(sequence=32, segment=<_io.BytesIO object at 0x7f80171b67d0>, duration=Fraction(42367, 16000))], maxlen=3)

2020-04-25 14:14:01 INFO (MainThread) [homeassistant.components.stream] subtract num_segments: 2

< 14 seconds later, recording completed >

2020-04-25 14:14:15 INFO (MainThread) [homeassistant.components.stream.recorder] RECORDER.py cleanup self._segments:
        Segment(sequence=31, segment=<_io.BytesIO object at 0x7f8018b77e30>, duration=Fraction(4039, 1550)),
        Segment(sequence=32, segment=<_io.BytesIO object at 0x7f80171b67d0>, duration=Fraction(42367, 16000)),
        Segment(sequence=33, segment=<_io.BytesIO object at 0x7f80119e4830>, duration=Fraction(10883, 4125)),
        Segment(sequence=34, segment=<_io.BytesIO object at 0x7f8011cdcad0>, duration=Fraction(2739, 1000))]
hunterjm commented 4 years ago

Sorry, I got the logic wrong above; it's {keyframe_interval}/{frame_rate}. So for your segments, the length would be 4 seconds. That means you can have a lookback of up to 12 seconds, and both values should be a multiple of 4.

Something like duration: 8 and lookback: 8 or duration: 12 and lookback: 4
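
Putting numbers to it for the doorbell camera, here is the arithmetic as a quick worked example (nothing beyond the values already in this thread):

keyframe_interval = 60                               # frames between keyframes (GOP)
frame_rate = 15                                      # frames per second

segment_length = keyframe_interval / frame_rate      # 60 / 15 = 4.0 second segments
max_lookback = 3 * segment_length                    # 3 buffered segments -> 12.0 seconds

def lines_up(duration, lookback):
    # True when the request fits whole segments and the lookback buffer.
    return lookback <= max_lookback and (duration + lookback) % segment_length == 0

print(lines_up(8, 8))    # True  -> 16 s total, both values multiples of 4
print(lines_up(12, 4))   # True  -> 16 s total, both values multiples of 4
print(lines_up(10, 5))   # False -> 15 s total is not a multiple of 4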

GaryOkie commented 4 years ago

Well, that segment length makes a lot more sense now. Thanks for the formula correction and parameter advice!