rutgersc / m3u8-rs

m3u8 parser for rust
MIT License
99 stars 23 forks source link

Accumulate byte ranges when creating the `MediaPlaylist` #38

Open sdroege opened 2 years ago

sdroege commented 2 years ago

See 4.4.4.2:

   If o is not present, the sub-range begins at the next byte following
   the sub-range of the previous Media Segment.

and also

   If o is not present, a previous Media Segment MUST appear in the
   Playlist file and MUST be a sub-range of the same media resource, or
   the Media Segment is undefined and the client MUST fail to parse the
   Playlist.

Questions for this:

rutgersc commented 2 years ago

Seems reasonable to validate this, especially if it fails to parse a playlist which would otherwise not be usable anyway? I'm unfortunately not a user of of these more in-depth attributes, so I have a hard time judging that myself.

* Do you want to do this in `media_playlist_from_tags` (which would have to be fallible then)?

Yes, so what would it check on then exactly? Just that if o is not present, check if previous segment has the same name?

* Do you want to make the offset in the `ByteRange` non-optional as it would always be set now?

I would expect the parser to not add any extra data that's not already in the original file, even if it's something that can be calculated. But that's just my initial reaction, maybe it's super handy. Also, can we be sure that the calculated offset is always correct? For example this seems to not even start at 0.

* Or do you want to leave the whole thing to the user of the library?

Failing on playlists that can't be used seems useful to me.

sdroege commented 2 years ago

Byte ranges are given by a length and an optional offset. If no offset is given then all segments before must have a byte range and at some point before there must be a byte range with an offset, which is then used as "anchor". The offsets for the segments that don't have an explicit one is calculated by taking this first offset and summing up the lengths up to the current segment.

So for validation we should check if for every segment that has only a length that there is such an anchor and all segments between those two have a byte range too. Otherwise the playlist is invalid as you can't really know what should actually be downloaded.

Also, can we be sure that the calculated offset is always correct? For example this seems to not even start at 0.

If the playlist is correct and following the above rules, yes. Your example playlist always has offset and length, so no actual work has to be performed.

I would expect the parser to not add any extra data that's not already in the original file

That's exactly the question, yes. If we would calculate the offsets and provide them to the user then it's not guaranteed anymore that parsing a playlist and then serializing it again gives identical output (but that's already now not given because of optional whitespace). It will however always give equivalent output.

If we don't calculate the offsets while parsing, then every user of the library would have to do that and to avoid calculating it over and over again would have to cache it somehow. That would require adding a secondary data structure for the playlist, or transforming the playlist into a full custom data structure.

My suggestion would've been to always calculate the offset so that this is not necessary, and when serializing the playlist we could decide whether to always write out the offset or to omit it when it can be left implicit but that requires a bit more logic when writing.

sdroege commented 2 years ago

FWIW for my usage I'm now transforming the playlist into my own data structure anyway as I have to calculate both the offsets and the timestamp (or rather: summed up duration from the start of the playlist) for each segment, and will likely also want to add some more caching for live playlists with program-date-time.

rutgersc commented 2 years ago

Ah right ofcourse, there has to be at least one anchor to use as a starting point and fail otherwise. Having the offset be non-optional is nice too. Agreed that this would be useful to have it calculated by the parser, but I can imagine that there's many other such features that this library also doesn't do, which requires you to have a secondary data structure anyway.

Do you think it's worth the extra logic when serializing the offset? As you mention, the output already isn't the same anymore. To me it sounds fine to make a choice here and just have it be serialized in one way, preferably with the offsets then.

Also, I seriously doubt I'm the right person to ask for these decisions, since as I mentioned I'm not really using it anymore, and am having a hard time keeping up to date with even the simplest additions. If you feel like it and if it's possible, I can give more permissions or something to reduce the friction of making changes available to everyone (completing PRs? uploading new versions?).

sdroege commented 2 years ago

I'm not entirely sure either. Either seems fine to me, it's a design decision and as we both said the user will likely need their own data structures to keep track of the pieces they actually need anyway. But we should at least validate such things during parsing, that's something I can add :)

If you want to give me permissions to review/merge PRs and maybe upload new versions then I can help maintaining this crate.

sdroege commented 2 years ago

So let's only validate this when parsing and not post-process the offsets when parsing. Does that sound good to you?

rutgersc commented 2 years ago

Looking at this again, validating seems like a nobrainer indeed. And since we already know what the offsets should be, why not keep the validated data around and serialize it, keeping it simple. The point that parsing/serializing doesn't produce the same output doesn't stack up to the benefit the user gets if we just add it. So, yes to post-process? 😄

I'll take a look at the permissions.

sdroege commented 2 years ago

It's also not very different from the other post-processing that already exists in media_playlist_from_tags.