iameli closed this issue 3 years ago.
Having the O upload the files might introduce a security/trust issue on the public network, since the B will want to validate before anything airs or gets stored.
To @f1l1b0x's point, does the B validate all segments on the public network before live streaming playback? I don't want recording to inadvertently mess with the verification workflow. If we are bypassing verification, let's be conscious and upfront about that decision.
Yes, but only on the public network: the B will do a validation step for all segments before they air, to check that the transcoding job has been done and the content has not been tampered with. We are in the final steps of finalizing that no-reference validation.
Yeah — I mentioned the existing behavior of -failObjectStore because that's how it works right now, but I think pretty much all these features are designed for the B. Having first-class support for recording on Os would be kind of sketchy.
Responding to my own question regarding how we handle recording across broadcaster switches...
Is there a node-first answer to this question, or must it be handled on the infrastructural side?
We could append a timestamp to the path where we send our streams, e.g. we push to /{manifestId}-{unix timestamp}.m3u8 and /{manifestId}-{unix timestamp}/720p/165.ts. Then even if there are broadcaster shifts, it's a simple matter for someone to parse all the m3u8 files for a given manifest ID and come out with a contiguous stream of segments.
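A rough sketch of that parse-and-stitch step in Go (manifestPath and parseTimestamp are hypothetical helper names for illustration, not existing go-livepeer functions):

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// manifestPath builds the per-session manifest path described above,
// e.g. "/abc123-1600000000.m3u8".
func manifestPath(manifestID string, startUnix int64) string {
	return fmt.Sprintf("/%s-%d.m3u8", manifestID, startUnix)
}

// parseTimestamp recovers the unix timestamp from such a path so a reader
// can order the manifests for one manifest ID into a contiguous stream.
func parseTimestamp(path string) (int64, error) {
	base := strings.TrimSuffix(strings.TrimPrefix(path, "/"), ".m3u8")
	i := strings.LastIndex(base, "-")
	if i < 0 {
		return 0, fmt.Errorf("no timestamp in %q", path)
	}
	return strconv.ParseInt(base[i+1:], 10, 64)
}

func main() {
	paths := []string{
		manifestPath("abc123", 1600003600),
		manifestPath("abc123", 1600000000),
	}
	// Order the session manifests chronologically before stitching
	// their segments together.
	sort.Slice(paths, func(a, b int) bool {
		ta, _ := parseTimestamp(paths[a])
		tb, _ := parseTimestamp(paths[b])
		return ta < tb
	})
	fmt.Println(paths[0]) // earliest session's manifest first
}
```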
We could even have a livepeer -osPlayback {osUrl} mode to enumerate and recombine the manifests, serving out a consistent HLS stream on the 8935 HTTP server. Boom, node-first VoD.
We should also support providing the key as an inline JSON blob
My guess is there might be URL-library issues if plonking the blob directly within the password field. Will probably need to be escaped at the very least; hopefully we won't also encounter length issues.
the O uploading the files might introduce a security/trust issue in the public network since the B will want to validate before something airs
does the B validate all segments on the public network before live streaming playback? I don't want recording to inadvertently mess with the verification workflow.
@wohlner @f1l1b0x Uploading results to OS doesn't necessarily mean it'll get inserted into a playlist right away. If verification is enabled and fails for a given segment, then the segment gets retried until the retry policy cap is hit and/or verification passes. In fact, having segments that failed verification be in persistent storage may be useful for later analysis.
Recording a segment will only occur if transcoding succeeds - which includes the verification step, if enabled - so there isn't a conflict there.
The behavior of streams with -recordObjectStore enabled is as follows:
I would also add the following condition:
This avoids additional processing in the default case; the -recordObjectStore would only be needed if the recording OS (or path) differs from the default. This really just highlights that the recording feature and OS are decoupled; recording needs a spec of its own.
My guess is that we'll need to have a couple other flags around recording anyway (eg, to specify the recording format and / or output file name) and using those will enable recording.
Probably good to have this type of scheme for the fail upload path as well. My guess is the fail upload will mostly be useful if the primary store is non-persistent (eg, filesystem that clears segments after each job).
After all uploads for a segment complete, push new m3u8 manifests for each rendition + a primary m3u8 manifest that references all the renditions. These manifests should contain all the segments transcoded in the stream, not just the most recent ones.
Again, this probably really belongs in a separate spec for the recording feature, rather than OS, but in short:
This is problematic with storage systems that are only eventually consistent for overwrites (eg, S3), where updates may still return stale reads for some time. So we can't really expect the full manifest to be immediately available after each segment. (In the current iteration of the filesystem storage driver, we avoid writing manifests to external OS for this reason.) Not to mention that appending to a non-windowed live HLS playlist still means rewriting the entire playlist from top to bottom, which is asymptotically horrible without serious tweaking.
The easy way around this would be to just write a manifest once after the job terminates, as the recording is being finalized.
Should we "load" existing segments from a manifest if we find it in the OS when we start handling a new stream?
Do you mean, should we append to existing playlists?
That is a separate product question, but a lot of these issues have been raised earlier in issues such as https://github.com/livepeer/go-livepeer/issues/869 - might be good to narrow down the discussion there.
How should we handle non mpeg-ts output? Uploading a bunch of MP4 files somewhere sounds fine, but what about manifests?
This is another recording-specific issue but MP4s and non-segmented formats pose some specific challenges (and we have users that are already using MP4; the existing recorder outputs MP4)
Users expect MP4s to be concatenated and we can't efficiently do that coming from segmented sources. Even with fmp4 and something like the AWS multipart upload API (does an equivalent exist for other cloud OS services?), there is still a "completion" step where the upload is finalized.
There's a few ways this finalization issue is going to present itself:
One way around this is to have a "fixup" step at startup. For MP4s in pull mode, we could inspect the (most likely, filesystem) OS at node startup and attempt to regenerate a recording from leftover segments that weren't correctly cleaned up.
For broadcasters with non-persistent volumes or kube-style workloads that bounce around, an external OS might be necessary. But there are a lot of other gotchas here with respect to reconstructing that state - don't have any firm suggestions right now.
Reading through @j0sh's comments, I should clarify that for all of these use cases I was imagining the use of an external OS.
Again, this probably really belongs in a separate spec for the recording feature, rather than OS
True. I wanted to have a single spec as a starting point to serve as the path forward for all the tickets and PRs I listed, but once we get to a vague consensus as to what's a good idea I can break this down into a variety of smaller specs and tickets.
This is problematic with storage systems that are only eventually consistent for overwrites (eg, S3), where updates may still return stale reads for some time. So we can't really expect the full manifest to be immediately available after each segment. (In the current iteration of the filesystem storage driver, we avoid writing manifests to external OS for this reason.)
That's okay — nobody's expecting to be able to play back from this OS instantly and get good results. (In the workflow I'm imagining, anyway.) VOD playback can come in later after the consistency has had time to settle. For the recording feature, we just need an archive.
Not to mention that appending to a non-windowed live HLS playlist still means rewriting the entire playlist from top to bottom, which is asymptotically horrible without serious tweaking.
Horrible why? So the manifests are this over and over...
#EXTINF:2.000,
/stream/cf082c5f-2c25-4be5-9c33-6719601b567b/480p/3081.ts
That's 72 bytes × 1800 segments per hour × 5 renditions × let's say a 48-hour stream ≈ 31 megabytes of manifests, re-pushed on every two-second segment, or about 16 megabytes per second of manifest pushing to keep updating at that point. Okay, yeah. That's horrible. Let's figure something else out.
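The back-of-the-envelope numbers above can be checked with a quick sketch (the constants are this thread's assumptions: 72-byte playlist entries, 2-second segments, 5 renditions, a 48-hour stream):

```go
package main

import "fmt"

// manifestBytes estimates the size of one full set of non-windowed rendition
// playlists: one ~72-byte entry ("#EXTINF:2.000,\n" plus the segment URL)
// per segment, per rendition.
func manifestBytes(bytesPerEntry, segmentsPerHour, renditions, hours int) int {
	return bytesPerEntry * segmentsPerHour * renditions * hours
}

func main() {
	total := manifestBytes(72, 1800, 5, 48)
	// The whole set is re-pushed after every 2-second segment.
	fmt.Printf("%.1f MB per push, %.1f MB/s\n",
		float64(total)/1e6, float64(total)/2e6)
	// → 31.1 MB per push, 15.6 MB/s
}
```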
The easy way around this would be to just write a manifest once after the job terminates, as the recording is being finalized.
We just lost the VODs on every server where a network connection drops or the kernel panics. 😞
How about we upload a series of timestamped manifests, resetting periodically? Manifests could get pushed to {os}/record/{rendition}/{manifest id}-{timestamp}.m3u8. If I run with -recordManifestSegments 1800, we roll over to a fresh manifest every 30 minutes. Using the same assumptions as above, that'd be 64k per manifest, not too bad to push every segment. Then the "finalization" step consists of enumerating and combining all of the timestamped manifests, simple enough.
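The rollover could work by flooring the wall-clock time to the start of its window, so every writer within the same window lands on the same timestamped manifest name. A minimal sketch, with hypothetical helper names:

```go
package main

import "fmt"

// manifestBucket floors a unix time to the start of its rollover window.
func manifestBucket(unixTime, periodSecs int64) int64 {
	return unixTime - unixTime%periodSecs
}

// rolledManifestPath builds {os}/record/{rendition}/{manifestID}-{bucket}.m3u8,
// the naming scheme proposed above.
func rolledManifestPath(osBase, rendition, manifestID string, unixTime, periodSecs int64) string {
	return fmt.Sprintf("%s/record/%s/%s-%d.m3u8",
		osBase, rendition, manifestID, manifestBucket(unixTime, periodSecs))
}

func main() {
	// Two writes inside the same 30-minute window hit the same manifest;
	// a later write rolls over to a fresh one.
	fmt.Println(rolledManifestPath("s3://bucket", "720p", "foo", 1600000300, 1800))
	fmt.Println(rolledManifestPath("s3://bucket", "720p", "foo", 1600000900, 1800))
	fmt.Println(rolledManifestPath("s3://bucket", "720p", "foo", 1600002100, 1800))
}
```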
If recording is enabled, but no recording OS specified, then use the default OS. Maybe under a default prefix, eg "/recordings/".
Sounds great. Maybe /vods.
Do you mean, should we append to existing playlists? That is a separate product question, but a lot of these issues have been raised earlier in issues such as #869
I hadn't seen https://github.com/livepeer/go-livepeer/issues/869, awesome! I'll read through that discussion and factor out full-playlist related issues over there.
nobody's expecting to be able to play back from this OS instantly and get good results.
This might still be an issue if we need to reconstruct a final playlist during the finalization step, especially because the settlement period is rather indefinite.
We just lost the VODs on every server where a network connection drops or the kernel panics.
Not necessarily - see the later notes about having a "fixup" step. We'll have the segments somewhere, and can reconstruct some semblance of sequentiality from there.
But when multiple broadcasters are involved, I'm also concerned about ordering inconsistencies - timestamps aren't always reliable (whether wall clock timestamps, or taken from within the stream). Sequence numbers might work as a preliminary heuristic, but I don't know if these are always reliable coming from Mist (and was kind of hoping to get rid of client-supplied sequence numbers at some point).
Note that this becomes a lot easier if Livepeer RTMP ingest or node-first is used, because the stream is a lot less likely to be bouncing around nodes. Otherwise it's going to take a while to fully work out the kinks here.
How about we upload a series of timestamped manifests, resetting periodically ... we roll over to a fresh manifest every 30 minutes
Hmm, if we're overwriting an existing manifest within that 30 minute span, then what does rolling over buy us?
One concern I have here and (and in general, with non-segmented formats such as MP4) is that this will leave a lot of intermediate file clutter. That should be cleaned up. For external OS, that also means incurring a bunch of priced requests, egress, delete permissions, etc. All that will need to be carefully managed.
Sequence numbers might work as a preliminary heuristic, but I don't know if these are always reliable coming from Mist (and was kind of hoping to get rid of client-supplied sequence numbers at some point).
They are over the scope of a single retryable RTMP stream. Good enough to get started.
How about we upload a series of timestamped manifests, resetting periodically ... we roll over to a fresh manifest every 30 minutes
Hmm, if we're overwriting an existing manifest within that 30 minute span, then what does rolling over buy us?
Well, these are full manifests (the non-windowed kind discussed in #869), and we refresh them with every segment, so that means after 48 hours we'd be pushing ~32 megs of manifest every two seconds. So it buys us a lot of bandwidth.
The workflow works like this:
Each broadcaster uploads to /{manifest id}-{timestamp}-{nonce per broadcaster}.m3u8. The timestamp resets every half an hour on every broadcaster. Let's say for whatever reason there's a lot of retrying and manifestId foo is bouncing around between three broadcasters or something. After 90 minutes, we might have files that look like this:
foo-0-123.m3u8 # broadcaster A
foo-0-456.m3u8 # broadcaster B
foo-0-789.m3u8 # broadcaster C
foo-1800-123.m3u8 # broadcaster A
foo-1800-456.m3u8 # broadcaster B
foo-1800-789.m3u8 # broadcaster C
foo-3600-123.m3u8 # broadcaster A
foo-3600-456.m3u8 # broadcaster B
foo-3600-789.m3u8 # broadcaster C
Maybe there should be leading zeroes in there for lexical ordering, but that's the idea. Then anyone looking at the OS that wants to play back foo can enumerate these and combine 'em.
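That enumeration can avoid the leading-zeroes problem entirely by sorting numerically on the parsed timestamp (with the nonce as a tie-break) instead of lexically, where e.g. "1800" sorts before "900". A sketch with a hypothetical helper name:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// sessionKey parses "foo-{timestamp}-{nonce}.m3u8" into its numeric parts.
// Taking the last two dash-separated fields also tolerates dashes inside the
// manifest ID itself.
func sessionKey(name string) (ts, nonce int64) {
	parts := strings.Split(strings.TrimSuffix(name, ".m3u8"), "-")
	ts, _ = strconv.ParseInt(parts[len(parts)-2], 10, 64)
	nonce, _ = strconv.ParseInt(parts[len(parts)-1], 10, 64)
	return ts, nonce
}

func main() {
	names := []string{
		"foo-3600-123.m3u8", "foo-0-456.m3u8", "foo-1800-789.m3u8", "foo-0-123.m3u8",
	}
	// Numeric sort: by rollover timestamp, then by broadcaster nonce.
	sort.Slice(names, func(a, b int) bool {
		ta, na := sessionKey(names[a])
		tb, nb := sessionKey(names[b])
		if ta != tb {
			return ta < tb
		}
		return na < nb
	})
	fmt.Println(names)
}
```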
One concern I have here and (and in general, with non-segmented formats such as MP4) is that this will leave a lot of intermediate file clutter. That should be cleaned up. For external OS, that also means incurring a bunch of priced requests, egress, delete permissions, etc. All that will need to be carefully managed.
Not a lot of clutter. In the worst-case example I just gave, that's nine manifest files that correspond to 2700 segment files! Not to mention they're tiny compared to the video — this is an acceptable amount of metadata to facilitate what we're discussing here.
Refactored out the OS syntax proposal to https://github.com/livepeer/go-livepeer/issues/1572, please take discussion for that issue over there ✌️
They are over the scope of a single retryable RTMP stream.
Couple cases I can think of where segment numbers aren't exactly reliable:
For direct RTMP ingest:
For HTTP push or node-first ingest:
Note that for orchestrators using broadcaster-supplied object storage, we have them write into a random prefix in order to prevent front-running or overwriting other transcoders. The scheme is something like this:
/manifestID/nonce/rendition
We might want multiple broadcasters to do something similar, especially with cloud storage. This allows us to distinguish segments on the basis of upload time or whatever, and use that as another heuristic in addition to the sequence number.
Not a lot of clutter.
Anytime the user wants a concatenated file - eg, a proper MP4, which people are already using - or something byte-range addressable (because single blobs are a lot easier to manage), then we'll have a lot of intermediate files to clean up after. It's unavoidable to have intermediate files somewhere, but it feels a bit weird to be using cloud storage as that intermediate layer. But maybe unavoidable for now.
Then anyone looking at the OS that wants to play back foo can enumerate these and combine 'em.
With eventual consistency the settlement period is indefinite. And because it's indefinite, we can't really be sure when the final version of the manifest(s) would be ready. And we still need a finalization step anyway to combine everything.
All this is leading me to think that the "fixup" approach is the way to go here. The first version of an object will be available immediately after writing completes for the first time, but we don't know when the final version of an object settles.
Rather than try to gather manifests and take a guess at when they'll be finalized, just gather a list of the segments that are available, and build a playlist from that. Segments only need to be written once. Again, there are API and egress costs with cloud storage which makes the whole thing a bit non-ideal, but at least it'll work reliably without fundamentally unresolvable edge cases.
Maybe /vods
Slight preference for /recordings still, since in my head a VOD is the thing that is being transcoded, while a recording is the output of a VOD or a live transcode. (But I get how we could be producing a VOD of a live transcode.)
RTMP stream disconnects, and reconnects to Node B.
It will be new 'recording session'
Node A receives a segment from the source. Takes a long time to complete. Source decides to retry with node B. Node A eventually completes.
Node A shouldn't write anything to the OS in that case.
Also, for HTTP push it is not clear when to do finalisation step.
They are over the scope of a single retryable RTMP stream.
Couple cases I can think of where segment numbers aren't exactly reliable:
I was referring to Mist's behavior from a single server, but point taken.
- The same segment from both node A and node B are written. Which one to choose?
In that case, both overwriting and not overwriting is completely acceptable; it oughta be the same content.
I certainly get that there are a lot of cases where sequence number can't be relied on. We've got folks using Livepeer for VOD transcoding where they only ever do one file per stream and whatnot. But we do have a case, with MistProcLivepeer, where we generally can rely on sequence numbers, and I'd like recording to work in that instance.
Note that for orchestrators using broadcaster-supplied object storage, we have them write into a random prefix [...] We might want multiple broadcasters to do something similar, especially with cloud storage. This allows us to distinguish segments on the basis of upload time or whatever, and use that as another heuristic in addition to the sequence number.
I'd be cool with that — could be very similar to the naming scheme I proposed for manifests, something like /{manifest id}-{sequence number}-{timestamp}-{nonce per broadcaster}.m3u8. Or a timestamp with sufficient resolution that collisions are unlikely.
All this is leading me to think that the "fixup" approach is the way to go here. The first version of an object will be available immediately after writing completes for the first time, but we don't know when the final version of an object settles.
Rather than try to gather manifests and take a guess at when they'll be finalized, just gather a list of the segments that are available, and build a playlist from that.
Yeah, this feels like the most robust solution. Actually, if segments themselves contain timestamps, then "fixup" could be reading through the existing segments and building a manifest for "now". If there are more segments later, could use the previous manifest as a cache and add the new segments.
How do we handle segment metadata? Specifically, duration? That's really the only piece of information that I wanted from a manifest.
Slight preference for /recordings still, since in my head a VOD is the thing that is being transcoded, while a recording is the output of a VOD or a live transcode. (But I get how we could be producing a VOD of a live transcode.)
I'm cool with /recordings.
So... manifest/metadata/finalization conversation is ongoing but we've got a couple different workable ideas. As a starting point, everyone cool with doing the following?
No change to OS syntax yet, but we do add a -record=true parameter. Default false.
Segments get saved in a subdirectory, like /{manifest id}/recordings/{rendition name}/{sequence number}-{timestamp}.ts
That's it. Once that exists and works we can revisit how and when to convert that big mess of segments into something more coherent. @j0sh @darkdarkdragon good starting point?
RTMP stream disconnects, and reconnects to Node B. It will be new 'recording session'
@darkdarkdragon And therein lies the problem. It's a recording session, but two nodes think they are responsible for recording.
Also, for HTTP push it is not clear when to do finalisation step.
Naively, finalize when the session expires. But with both RTMP and HTTP push, there's still an issue with finalization if multiple nodes end up handling the stream at some point. Do we take the hit of potentially "finalizing" multiple times?
It's better to have a single source of ground truth with a separate recording service.
Node A shouldn't write anything to the OS in that case.
We can't say that for certain. Point is, these things will happen with distributed systems; we can't expect zero overlap in sequence numbering.
Actually, if segments themselves contain timestamps
@iameli Probably not necessary because the OS metadata itself will have a last modified date or some equivalent information.
How do we handle segment metadata? Specifically, duration?
OS metadata. For filesystem storage, that's probing. For S3, it's the x-amz-meta header (HEAD request). This is part of why fixups are non-ideal with cloud storage.
All that just points towards fixups being a last resort, and I think we should work towards having a single source of ground truth for recording, eg a separate service.
No change to OS syntax yet, but we do add a -record=true parameter
Record... what? MP4? HLS? Different users want different things. MP4 already works and is in use. At the very least, we'll need to indicate the desired format. I've also had requests to exclude certain renditions. All this needs to be spec'd out.
and saved in a subdirectory, like /{manifest id}/recordings/{rendition name}/
This needs to be thought through some more and spec'd out further. Because if we're already using persistent object storage, then the /recordings/ directory is probably only needed for the finalized artifacts. (And I was thinking it'd be a top level directory, rather than a subdir under /{manifest id}, which gets wiped out after each session in the current filesystem OS implementation to remove intermediate files. Otherwise we have to special-case a bunch of things in a way that doesn't really compose well.)
Otherwise we're incurring a double store to basically the same location as compared to the existing OS behavior. Eg, I have an S3 OS for the primary store. Do I need to re-upload each segment to a /recording subdir [1]? Or if I'm recording, could the recording endpoint also be used as the primary OS for the transcoding flow? (Basically, the only time a separate recording OS makes sense is if the destination differs from the one used for the transcoding flow. Even then, there are concerns around intermediate file clutter I think we can avoid.)
[1] TBH, doing some sort of "move" operation is probably still necessary, especially since orchestrators won't always upload to the most straightforward paths, but is it essential to do that in the middle of the transcoding flow? Mid-flow adjustments might make sense for a recording service, but less so for a broadcast node trying to maximize transcoding throughput. Or is that something that could wait until the finalization phase, especially for concatenated formats?
@j0sh
RTMP stream disconnects, and reconnects to Node B. It will be new 'recording session'
@darkdarkdragon And therein lies problem. It's a recording session but two nodes think they are responsible for recording.
What problem? The moment RTMP disconnects is the 'finalisation' moment; no more recording for Node A. Node B starts from scratch, and what Node B records will be a separate piece of video (a VOD .m3u8 playlist, at the moment).
Otherwise we're incurring a double store to basically the same location as compared to the existing OS behaviour.
Yes. The whole 'record' mode was brought up precisely so we don't have to deal with the full flow of using an external OS in the transcoding process.
Record... what? MP4? HLS?
Just record everything passing through the node, without any transformations. Putting transformations/finalisations into the broadcaster node will hurt performance (if, say, the node does 'cleanup' on startup, that means that after a crash the node will not be able to restart quickly).
The moment RTMP disconnects - it is 'finalisation' moment, no more recording for Node A. Node B starts from scratch
What if node A doesn't cleanly terminate? What if there's a transient break in the connection, and the client reconnects, and the load balancer redirects the stream to node B? The user is just expecting one stream from us. Same issue if HTTP push were to use multiple broadcasters.
Yes. Whole 'record' mode was brought up just not to deal with all the flow of usage of external OS [...] Just record everything passing through node, without any transformations
This is where the misunderstanding is, I think because this spec is really incomplete.
Right now, there is a recording mode that will produce MP4 output as part of the pull branch. This works well, and node-first users are using it, including someone in production. (No, I didn't recommend that, but they were in a rush. And yes, it is something that has been specifically requested by several users.)
Here, there's no mention of the actual recording mode(s), for example. Only a subtext that it's just HLS. But that's clearly insufficient.
If the goal was just to "pass everything through the node and into OS" then we already have plain object storage support. But that's clearly not the only goal. Most of the issues here revolve around finalizing that result - into either a playlist or a concatenated file.
Putting transformations/finalisations into broadcaster node will hurt performance
I agree, which is why for the hosted service, we should run the node as a non-transcoding "recording service". Described a number of approaches here https://github.com/livepeer/go-livepeer/issues/869#issuecomment-657797072
There has been a lil bit of confusion and evolution about the requirements for the recording feature for the UGC launch. I know some of this is repetition from discord and video chats, but to be super explicit, these are the requirements:
Post launch, we will need to continue iterating on the recording feature. Some ideas we will want to prioritize after launch are:
Abstract
Users want to record streams. Our existing object stores get us most of the way there; let's make them a bit more generic and then implement the specific features needed for this case.
Prior Art
This spec is an attempt to unify a few different OS-related issues:
API stream recording https://github.com/livepeer/livepeerjs/issues/792
S3-compatible object store support https://github.com/livepeer/go-livepeer/pull/1206
Upload failed transcodes https://github.com/livepeer/go-livepeer/pull/1398
Webhook-configured per-stream object stores https://github.com/livepeer/go-livepeer/issues/1264 https://github.com/livepeer/livepeerjs/pull/518
And it draws inspiration from @j0sh's node-first proposal and filesystem OS work:
https://github.com/livepeer/go-livepeer/commit/5b776565debc43256a1ad235896f89faebd79a86 https://github.com/livepeer/go-livepeer/commit/0b455a18053200c2f6b7e1f5139af5ba4d02328d
Motivation
There are a few different situations where we'd want to interact with object stores.
We currently have 1 and part of 2.
Three-part proposal. Part 1 proposes updating the syntax for OSes so they can be referred to in a more generic way. Part 2 discusses the corresponding webhook syntax that allows customization of OSes on a per-stream basis. Part 3 discusses a new type of OS, the -recordObjectStore, that behaves in a manner suitable for recording livestreams.
Proposed Solution
TLDR: Change OS to have the form s3://ACCESS_KEY_ID:SECRET_ACCESS_KEY@eu-central-1/testbucket. Those get called by -objectStore, -failObjectStore, and -recordObjectStore. Implement per-stream webhook OS configuration.
Part 1: Refactor OS syntax into one unified URL-based option.
EDIT: Refactored out to https://github.com/livepeer/go-livepeer/issues/1572, please take discussion there.
Part 2: Allow for webhooks to customize OS on a per-stream basis
Utilizing the above, the syntax for this becomes very easy:
That's not to say the implementation will be easy, but I think the spec is pretty clear.
(Currently -failObjectStore is only implemented for OTs, but it would be a good feature to have on Bs as well.)
Part 3: Record Mode
The above will allow for customizability of OS but it doesn't address the "recording" use case, as discussed above. So, another CLI parameter and webhook field:
The behavior of streams with -recordObjectStore enabled is as follows:
Implementation Tasks and Considerations
These will become their own tickets and would probably be carried out by different people.
recordObjectStore
Testing Tasks and Considerations
Certainly there should be tests of per-stream OS and whatnot but I don't think there are any unique testing challenges here.
Known Unknowns
How do we handle manifest recording when streams switch broadcasters mid-broadcast? Is there a node-first answer to this question, or must it be handled on the infrastructural side? Should we "load" existing segments from a manifest if we find it in the OS when we start handling a new stream? Wouldn't that break everything if we stream to the same manifestID twice in a row? What about gaps in sequence number and whatnot?
The above means I kinda want the ability to send media and manifests to different endpoints? The media gets stored; the manifests get parsed and combined as appropriate? I dunno.
EDIT: How should we handle non mpeg-ts output? Uploading a bunch of MP4 files somewhere sounds fine, but what about manifests?
Alternatives
EDIT: Moved to https://github.com/livepeer/go-livepeer/issues/1572