scottlamb / moonfire-nvr

Moonfire NVR, a security camera network video recorder

Server error 500 when fetching recording with gap #178

Closed clydebarrow closed 2 years ago

clydebarrow commented 2 years ago

Describe the bug: A specific API call to retrieve a recording causes a server failure.

To Reproduce: An API call to get recordings returns this:

{
  "recordings": [
    {
      "startTime90k": 147040591358403,
      "endTime90k": 147045161582867,
      "sampleFileBytes": 246671303,
      "videoSamples": 1269559,
      "videoSampleEntryId": 2,
      "startId": 15825,
      "openId": 61,
      "endId": 16671
    },
    {
      "startTime90k": 147045161895889,
      "endTime90k": 147047371051372,
      "sampleFileBytes": 127244410,
      "videoSamples": 613649,
      "videoSampleEntryId": 2,
      "startId": 16672,
      "openId": 61,
      "firstUncommitted": 17080,
      "endId": 17080,
      "growing": true
    }
  ],
  "videoSampleEntries": {
    "2": {
      "width": 640,
      "height": 480,
      "paspHSpacing": 126,
      "paspVSpacing": 95,
      "aspectWidth": 168,
      "aspectHeight": 95
    }
  }
}

So in that time interval there is a 3-second gap between the two recording entries. Issuing a call to view.mp4 that spans that gap returns a server 500 error (suggesting an uncaught exception):
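To illustrate, here's a sketch of how a client could locate such gaps from the /recordings response above (times are in 90 kHz units, so dividing by 90,000 gives seconds; the literal below is the response trimmed to the fields used here):

```python
# Detect gaps between consecutive entries in a /recordings response.
# Data is abridged from the response above; times are in 90 kHz units.
recordings = [
    {"startTime90k": 147040591358403, "endTime90k": 147045161582867,
     "startId": 15825, "endId": 16671},
    {"startTime90k": 147045161895889, "endTime90k": 147047371051372,
     "startId": 16672, "endId": 17080},
]

def gaps_seconds(recordings):
    """Yield (prev endId, next startId, gap in seconds) for each boundary."""
    for prev, nxt in zip(recordings, recordings[1:]):
        gap_90k = nxt["startTime90k"] - prev["endTime90k"]
        yield prev["endId"], nxt["startId"], gap_90k / 90_000

for end_id, start_id, gap in gaps_seconds(recordings):
    print(f"gap of {gap:.2f}s between recording {end_id} and recording {start_id}")
```

With the data above this reports a gap of roughly 3.5 seconds between recordings 16671 and 16672.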

http://localhost:8080/api/cameras/252dd5ce-3503-400f-a5e1-4d2de27907b2/sub/view.mp4?s=15825-17058%4061.4590000-6661260000

Return is:

< HTTP/1.1 500 Internal Server Error
Invalid argument: unable to append recording 2/16672 after recording 2/16671 with trailing zero

If this is genuinely not allowed, the error code should be in the 4xx range, but that's a minor issue. I see there is a TODO in api.md that probably refers to this:

TODO: error behavior on missing segment. It should be a 404, likely with an application/json body describing what portion if any (still) exists.
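As an aside, the %40 in the failing URL is just an encoded @. As I understand the layout from api.md, the s parameter encodes startId[-endId][@openId][.relStart-relEnd], with relative times in 90 kHz units; a quick decoder sketch:

```python
import re
from urllib.parse import unquote

def parse_s(s):
    """Parse a view.mp4 `s` parameter: startId[-endId][@openId][.relStart-relEnd].

    Layout as understood from api.md; relative times are in 90 kHz units.
    """
    s = unquote(s)  # %40 -> @
    m = re.fullmatch(r"(\d+)(?:-(\d+))?(?:@(\d+))?(?:\.(\d*)-(\d*))?", s)
    if m is None:
        raise ValueError(f"unparseable s parameter: {s!r}")
    parsed = {"startId": int(m.group(1))}
    if m.group(2): parsed["endId"] = int(m.group(2))
    if m.group(3): parsed["openId"] = int(m.group(3))
    if m.group(4): parsed["relStart90k"] = int(m.group(4))
    if m.group(5): parsed["relEnd90k"] = int(m.group(5))
    return parsed

print(parse_s("15825-17058%4061.4590000-6661260000"))
```

So the failing request asks for recordings 15825 through 17058 at open id 61, which spans the boundary between the two entries above.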

The real problem, though, is how to deal with this in the UI. The intention is to allow any given time range to be selectable for playback, across multiple cameras, kept in sync. Recording gaps won't necessarily be the same across cameras or even streams (in the example given, only the substream has a gap; the main stream from the same camera is uninterrupted), so there needs to be a way for playback to show no data during the gap while showing the valid data from other streams, resuming playback after the gap.

So far I can think of two options:

  1. The server fills in gaps with blank data. Not sure how difficult this is, especially since the server does no decode/encode.
  2. The UI requests segments from only one recording at a time for each stream, and sets reminders to itself to start the next recording stream at the appropriate time.

Any other ideas?

scottlamb commented 2 years ago

I think that TODO is describing a different case: the recording ids you're referring to no longer exist in the database, presumably because they've been rotated away. I'm not sure what HTTP status code it's giving now, but it should be a 404, and the UI might want structured information rather than having to redo the /recordings list.

(By the way: there's a related problem in which the recording gets deleted halfway through downloading the .mp4. The server doesn't open a given recording until it gets to its starting byte position within the .mp4, and it currently doesn't prevent recordings from being deleted between starting the request and getting to that position. If it gets to the position and discovers the file no longer exists, there aren't any good options left. There's no way to amend the HTTP response header. There's a concept of HTTP trailers, but I don't think browsers really support them. So it just drops the connection. Arguably it should instead keep track of what recordings are needed for requests in flight and prevent them from getting deleted, but that's not implemented.)

In the trailing zero case:

Invalid argument: unable to append recording 2/16672 after recording 2/16671 with trailing zero

I agree that it should at least be status code 400 rather than 5xx.

Here's why this error exists: there's a mismatch between how RTP works and how .mp4 works. The former just says "here's a video frame at this timestamp". The latter says "here's a video frame with this duration". When the server translates from RTP to .mp4, it uses the next frame's timestamp to calculate the current frame's duration. When recording ends (due to error, shutdown, whatever), there's no next frame. I chose to set the duration to 0 in that case rather than discard the frame or assume a certain frame rate. I pass along that 0 in the .mp4, but the file format (according to ISO/IEC 14496-12 section 8.6.1.1) only allows a zero duration for the last video frame of the file. So my code produces this error rather than a non-compliant .mp4.
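That translation can be sketched like this, using hypothetical frame timestamps in 90 kHz units: each frame's duration is the next frame's timestamp minus its own, and the final frame, having no successor, gets 0.

```python
def durations_90k(timestamps):
    """Compute per-frame durations from RTP-style timestamps (90 kHz).

    Each frame's duration is the next frame's timestamp minus its own;
    the final frame has no successor, so its duration is recorded as 0.
    """
    if not timestamps:
        return []
    return [b - a for a, b in zip(timestamps, timestamps[1:])] + [0]

# Hypothetical 10 fps stream: one frame every 9,000 ticks.
print(durations_90k([0, 9_000, 18_000, 27_000]))  # [9000, 9000, 9000, 0]
```

That trailing 0 is legal only as the very last sample of the file, which is exactly why appending a later recording after a trailing-zero recording fails.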

With regard to your options:

  1. The server fills in gaps with blank data. Not sure how difficult this is, especially since the server does no decode/encode.

It's at least possible to alter recording 2/16671's final frame duration to taste. The timestamps/durations are container-level concepts, so changing them doesn't require messing with the encoded video.

An actual black-frame gap is, I think, possible with an edit list in a traditional .mp4 file, but maybe not in a .m4s (media segments don't support edit lists, IIRC).

  2. The UI requests segments from only one recording at a time for each stream, and sets reminders to itself to start the next recording stream at the appropriate time.

I think this is the best long-term option for a scrub bar UI based on the HTML5 media source extensions. As mentioned in the .m4s documentation (see the paragraph starting with "It's recommended that each .m4s retrieval be for at most one Moonfire NVR recording segment"), it's probably best for latency/file format reasons not to request too much data at once.
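Following that recommendation, a client could turn the /recordings response into one .m4s request per recording id (s parameter layout per api.md; the base path here is just the one from the example above):

```python
def m4s_urls(base, recordings):
    """Build one view.m4s request per recording id, following the api.md
    recommendation of at most one recording per .m4s retrieval."""
    urls = []
    for r in recordings:
        for rec_id in range(r["startId"], r["endId"] + 1):
            # %40 is the URL-encoded @ separating recording id from open id.
            urls.append(f"{base}/view.m4s?s={rec_id}%40{r['openId']}")
    return urls

recs = [{"startId": 16672, "endId": 16674, "openId": 61}]
print(m4s_urls("/api/cameras/252dd5ce-3503-400f-a5e1-4d2de27907b2/sub", recs))
```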

In fact, we might even want to request less than a single recording at a time. Certainly when seeking to a particular time that's halfway through a recording, but maybe even in general. I'm open to tweaking the API to allow things like

The server can mux any valid .mp4/.m4s and/or reply to any metadata query more or less instantly; it basically comes down to what the file formats support, how many bytes then have to be sent before the UI gets what it's looking for, etc.

Any other ideas?

We could just drop that last frame in the interest of prioritizing simplicity over preserving all of the data that came from the camera. The UI could request "leave out the trailing zero frame" at request time, or maybe we just never store it in the first place.

clydebarrow commented 2 years ago

Hmm, a lot of that is over my head. What's a GOP?

I chose to set the duration to 0 in that case rather than discard the frame or assume a certain frame rate.

Why not assume the same frame rate as observed immediately prior?

it's probably best for latency/file format reasons not to request too much data at once.

That's a separate issue - needs discussing but I don't want to get sidetracked. In this case the requested recording segment could be as short as a few seconds and the problem would still exist if it happened to span the recording gap.

We could just drop that last frame

So what would happen then - what would be in the video stream for the gap between the recordings? Or would the server still throw an error because it can't deliver a contiguous stream?

see the paragraph starting with "It's recommended that each .m4s retrieval be for at most one Moonfire NVR recording segment")

But I'm not using .m4s retrieval, I'm using the .mp4 endpoint. Which leads to another question - what's the difference between the .mp4 and .m4s endpoints? I did Google it but I'm still confused. And should "recommended" be "required"?

And, by "recording segment" do you mean one element of the recordings array returned from the /recordings API endpoint?

Unless split by the split90k parameter, does the presence of multiple elements in that array for a given request indicate that a contiguous video stream will not be available across those element boundaries? Or is the behaviour I have seen an anomaly?

scottlamb commented 2 years ago

What's a GOP?

It's a video encoding term: Group Of Pictures. I'll make a note to add it to the glossary. Basically, every so often, the video stream will include an "IDR frame" or "key frame" which can be decoded entirely on its own. Then subsequent frames will require previous ones to be decoded properly. A group of pictures is an IDR frame and all of the other frames that depend on it (and possibly each other). Many security cameras have an IDR frame at an exact interval of every 1 or 2 seconds. Some advertise a feature called "H.264+" or "smart encoding" or something that seems to make the GOPs variable-length and a bit longer (up to 5 seconds or something). In practical terms, if you want to seek to some arbitrary time, you end up getting a bit of leading video along with it.
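A toy illustration of the grouping, assuming each frame carries an is_idr flag: every GOP begins at an IDR frame and runs until the next one.

```python
def split_gops(frames):
    """Split frames into GOPs; each GOP begins with an IDR (key) frame.

    `frames` is a list of (name, is_idr) pairs. Frames before the first
    IDR can't be decoded on their own, so they're returned separately.
    """
    undecodable, gops = [], []
    for frame in frames:
        _, is_idr = frame
        if is_idr:
            gops.append([frame])   # start a new GOP at each IDR frame
        elif gops:
            gops[-1].append(frame) # dependent frame joins the current GOP
        else:
            undecodable.append(frame)
    return undecodable, gops

frames = [("I0", True), ("P1", False), ("P2", False), ("I3", True), ("P4", False)]
print(split_gops(frames))
```

This is also why seeking lands you at a GOP boundary: playback has to start from an IDR frame, so you get a bit of leading video.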

Why not assume the same frame rate as observed immediately prior?

Yeah, we could do that, except in the corner case where we received exactly one frame in the recording. (I've seen that happen.) /shruggie I don't really know what's best. One disadvantage would be it'd be a bit more likely for recordings to overlap in wall time if the frame rate varies for some reason.
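The fallback under discussion might look like this sketch: the last frame inherits the previous frame's computed duration, falling back to 0 only in the one-frame corner case, where there's no observed frame rate to borrow.

```python
def durations_with_fallback(timestamps):
    """Like the trailing-zero scheme, but the last frame reuses the
    previous frame's duration. A one-frame recording still gets 0,
    since there's no prior duration to copy."""
    if not timestamps:
        return []
    d = [b - a for a, b in zip(timestamps, timestamps[1:])]
    d.append(d[-1] if d else 0)
    return d

print(durations_with_fallback([0, 9_000, 18_000, 27_000]))  # [9000, 9000, 9000, 9000]
print(durations_with_fallback([42]))                        # [0]
```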

So what would happen then - what would be in the video stream for the gap between the recordings? Or would the server still throw an error because it can't deliver a contiguous stream?

As implemented today, if you request two recording ids that aren't adjacent (and the first doesn't have the trailing zero problem), it will just send one and then the other, with no gap between them, so a sudden jump in time.

But I'm not using .m4s retrieval, I'm using the .mp4 endpoint.

You are using the .m4s stuff for live view, though; it's similar to that.

Which leads to another question - what's the difference between the .mp4 and .m4s endpoints?

In practical terms it comes down to this:

And, by "recording segment" do you mean one element of the recordings array returned from the /recordings API endpoint?

Sorry, that should have said one recording (a single id). I'll update the text. I'm still refining the terminology, and sometimes miss old places, which I know adds to the confusion...

And should "recommended" be "required"?

Somewhere in between, I guess. I don't recommend using /view.m4s with more than one recording id. It will probably sometimes work. I listed some reasons it might not, like exceeding 2^32 media bytes. I forgot to mention the trailing zero case as another reason.

Unless split by the split90k parameter, does the presence of multiple elements in that array for a given request indicate that a contiguous video stream will not be available across those element boundaries? Or is the behaviour I have seen an anomaly?

Today, yes, I think split90k and end of run (thus a trailing zero) are the only reasons there'd be a new array element.

If I ever get around to it, I'll implement switching videoSampleEntryId mid-run, and then there'd be another reason.

It'd be easy for me to add "hasTrailingZero": true to the array elements in question so you don't have to guess.
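Once such a field exists, the UI could split its requests at those boundaries instead of guessing. A sketch (hasTrailingZero is the proposed field from this discussion, not yet guaranteed to exist):

```python
def request_ranges(recordings):
    """Group consecutive recordings into one (startId, endId) request per
    contiguous run, splitting after any entry flagged hasTrailingZero
    (a proposed field, per this discussion)."""
    ranges, start = [], None
    for r in recordings:
        if start is None:
            start = r["startId"]
        if r.get("hasTrailingZero"):
            ranges.append((start, r["endId"]))  # can't append past this one
            start = None
    if start is not None:
        ranges.append((start, recordings[-1]["endId"]))
    return ranges

recs = [
    {"startId": 15825, "endId": 16671, "hasTrailingZero": True},
    {"startId": 16672, "endId": 17080},
]
print(request_ranges(recs))  # [(15825, 16671), (16672, 17080)]
```

Issuing one view.mp4 request per returned range would have avoided the 500 in the original report.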

scottlamb commented 2 years ago

I just opened a PR to add some of these missing bits to the docs. Please comment anywhere you're confused so I know where to improve.

scottlamb commented 2 years ago

I'm closing this after making three changes:

  1. better docs
  2. explicit hasTrailingZero in /recordings response so the UI doesn't have to guess
  3. returning 400 Bad Request instead of 500 Internal Server Error

although I'm open to the idea of making other changes like just not saving the last frame at all, avoiding this extra complication.