kaihendry opened this issue 7 years ago
Having a quick look at the RFC https://tools.ietf.org/html/draft-pantos-http-live-streaming-23#section-4.3.4.3
The EXT-X-I-FRAME-STREAM-INF tag identifies a Media Playlist file
containing the I-frames of a multimedia presentation. It stands
alone, in that it does not apply to a particular URI in the Master
Playlist.
So it doesn't appear that EXT-X-STREAM-INF and EXT-X-I-FRAME-STREAM-INF need to be together.
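To make the RFC's point concrete, here is a minimal Go sketch. It uses a hand-rolled scanner, not the grafov/m3u8 parser, and shows the structural difference between the two tags: an EXT-X-STREAM-INF tag applies to the URI on the line after it, while an EXT-X-I-FRAME-STREAM-INF tag carries its own URI attribute. Nothing in the syntax ties the two together. The `entry` type, `parseVariants` and the sample playlist are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// entry is a simplified, hypothetical view of one variant in a master
// playlist; it is not a type from grafov/m3u8.
type entry struct {
	IFrame bool   // true for #EXT-X-I-FRAME-STREAM-INF
	Attrs  string // raw attribute list after the colon
	URI    string // playlist URI
}

var uriAttr = regexp.MustCompile(`URI="([^"]*)"`)

// parseVariants scans master playlist text. An EXT-X-STREAM-INF tag applies
// to the URI on the next non-comment line, whereas an EXT-X-I-FRAME-STREAM-INF
// tag stands alone and names its playlist in a URI attribute.
func parseVariants(master string) []entry {
	var out []entry
	var pending *entry // EXT-X-STREAM-INF still waiting for its URI line
	sc := bufio.NewScanner(strings.NewReader(master))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "#EXT-X-I-FRAME-STREAM-INF:"):
			attrs := strings.TrimPrefix(line, "#EXT-X-I-FRAME-STREAM-INF:")
			e := entry{IFrame: true, Attrs: attrs}
			if m := uriAttr.FindStringSubmatch(attrs); m != nil {
				e.URI = m[1]
			}
			out = append(out, e)
		case strings.HasPrefix(line, "#EXT-X-STREAM-INF:"):
			pending = &entry{Attrs: strings.TrimPrefix(line, "#EXT-X-STREAM-INF:")}
		case line == "" || strings.HasPrefix(line, "#"):
			// blank lines, other tags and comments are skipped
		case pending != nil:
			pending.URI = line
			out = append(out, *pending)
			pending = nil
		}
	}
	return out
}

func main() {
	master := "#EXTM3U\n" +
		"#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080\n" +
		"1080p/index.m3u8\n" +
		"#EXT-X-I-FRAME-STREAM-INF:BANDWIDTH=500000,RESOLUTION=1920x1080,URI=\"1080p/iframes.m3u8\"\n"
	for _, e := range parseVariants(master) {
		fmt.Printf("iframe=%v uri=%s\n", e.IFrame, e.URI)
	}
}
```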
They do in a practical sense. Use case: provide this user/client with the top two renditions, let's say 1080p & 720p. The response should ideally have both #EXT-X-STREAM-INF and #EXT-X-I-FRAME-STREAM-INF for each rendition, so just slicing the []Variants might not work unless the variants keep that order/grouping. See section 8.5, "Master Playlist with I-Frames", of the RFC (I couldn't work out how to hyperlink it).
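A sketch of that use case, assuming each rendition can be matched across the two tags by its RESOLUTION attribute (the RFC does not guarantee this). Instead of slicing by position, it filters on resolution, so each kept rendition brings its I-frame counterpart along. `Variant` is a local stand-in whose field names loosely follow grafov/m3u8's VariantParams; `topRenditions` and `pixels` are hypothetical helpers, not library functions.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Variant is a hypothetical stand-in for a master playlist variant.
type Variant struct {
	URI        string
	Bandwidth  uint32
	Resolution string // e.g. "1920x1080"
	Iframe     bool   // true for EXT-X-I-FRAME-STREAM-INF
}

// pixels turns "WxH" into W*H; malformed values count as 0.
func pixels(res string) int {
	w, h, ok := strings.Cut(res, "x")
	if !ok {
		return 0
	}
	wi, err1 := strconv.Atoi(w)
	hi, err2 := strconv.Atoi(h)
	if err1 != nil || err2 != nil {
		return 0
	}
	return wi * hi
}

// topRenditions keeps every variant, regular or I-frame, whose resolution is
// among the n largest distinct resolutions, preserving the input order.
func topRenditions(vs []Variant, n int) []Variant {
	seen := map[string]bool{}
	var resList []string
	for _, v := range vs {
		if !seen[v.Resolution] {
			seen[v.Resolution] = true
			resList = append(resList, v.Resolution)
		}
	}
	sort.SliceStable(resList, func(i, j int) bool {
		return pixels(resList[i]) > pixels(resList[j])
	})
	if n < len(resList) {
		resList = resList[:n]
	}
	keep := map[string]bool{}
	for _, r := range resList {
		keep[r] = true
	}
	var out []Variant
	for _, v := range vs {
		if keep[v.Resolution] {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	vs := []Variant{
		{URI: "360p.m3u8", Bandwidth: 800000, Resolution: "640x360"},
		{URI: "1080p.m3u8", Bandwidth: 5000000, Resolution: "1920x1080"},
		{URI: "720p.m3u8", Bandwidth: 2500000, Resolution: "1280x720"},
		{URI: "1080p-if.m3u8", Bandwidth: 500000, Resolution: "1920x1080", Iframe: true},
		{URI: "720p-if.m3u8", Bandwidth: 250000, Resolution: "1280x720", Iframe: true},
		{URI: "360p-if.m3u8", Bandwidth: 80000, Resolution: "640x360", Iframe: true},
	}
	for _, v := range topRenditions(vs, 2) {
		fmt.Println(v.URI)
	}
}
```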
I can understand your use case might want to have them related, but as this library provides low-level access to reading and writing HLS playlists, we won't couple these two together.
I.e. because the HLS RFC treats these separately, this library will continue to do the same. However, we would be open to the idea of providing higher-level abstractions; we just haven't had the use case or pull requests to support it.
If you have thoughts and a clear API for how this could be achieved, I think we'd be open to merging those abstractions, but it's not there because, according to the RFC, they are two separate, unrelated tags.
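One possible shape for such a higher-level abstraction, offered only as a hypothetical sketch (none of these names exist in the library): group variants into `Rendition` values that each hold a regular stream and its I-frame playlist, keyed on RESOLUTION. Choosing RESOLUTION as the key is an assumption; the RFC itself does not link the two tags.

```go
package main

import "fmt"

// Variant is a hypothetical, simplified variant; it is not the grafov/m3u8 type.
type Variant struct {
	URI        string
	Resolution string
	Iframe     bool // true for EXT-X-I-FRAME-STREAM-INF
}

// Rendition is a hypothetical higher-level grouping of a regular stream and
// the I-frame playlist that shares its resolution.
type Rendition struct {
	Resolution string
	Stream     *Variant // EXT-X-STREAM-INF, nil if absent
	IFrame     *Variant // EXT-X-I-FRAME-STREAM-INF, nil if absent
}

// groupByResolution pairs variants on RESOLUTION and returns renditions in
// order of first appearance. If several variants of the same kind share a
// resolution, the last one wins.
func groupByResolution(vs []Variant) []*Rendition {
	index := map[string]*Rendition{}
	var out []*Rendition
	for i := range vs {
		v := &vs[i]
		r, ok := index[v.Resolution]
		if !ok {
			r = &Rendition{Resolution: v.Resolution}
			index[v.Resolution] = r
			out = append(out, r)
		}
		if v.Iframe {
			r.IFrame = v
		} else {
			r.Stream = v
		}
	}
	return out
}

// uriOf returns the variant's URI, or "-" when that half of a rendition is missing.
func uriOf(v *Variant) string {
	if v == nil {
		return "-"
	}
	return v.URI
}

func main() {
	vs := []Variant{
		{URI: "1080p.m3u8", Resolution: "1920x1080"},
		{URI: "720p.m3u8", Resolution: "1280x720"},
		{URI: "1080p-iframes.m3u8", Resolution: "1920x1080", Iframe: true},
	}
	for _, r := range groupByResolution(vs) {
		fmt.Printf("%s stream=%s iframe=%s\n", r.Resolution, uriOf(r.Stream), uriOf(r.IFrame))
	}
}
```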
In the meantime, I am maintaining the order and slicing on even boundaries (2, 4, 6) to ensure the I-frame counterparts stay together. I will think about how to do it properly, though I must concede I am a bit of a Go newbie.
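If you rely on ordering like this, it may be worth checking the assumption before slicing. This hypothetical `checkAlternating` helper (not part of any library, with a local `Variant` stand-in) verifies that the variants really come as (stream, I-frame) pairs with matching resolutions, which is what slicing at even indices silently depends on.

```go
package main

import (
	"errors"
	"fmt"
)

// Variant is a hypothetical, simplified variant.
type Variant struct {
	URI        string
	Resolution string
	Iframe     bool // true for EXT-X-I-FRAME-STREAM-INF
}

// checkAlternating reports whether vs is laid out as (stream, I-frame) pairs
// with matching resolutions, so that vs[:2], vs[:4], ... keep pairs intact.
func checkAlternating(vs []Variant) error {
	if len(vs)%2 != 0 {
		return errors.New("odd number of variants")
	}
	for i := 0; i < len(vs); i += 2 {
		s, f := vs[i], vs[i+1]
		if s.Iframe || !f.Iframe {
			return fmt.Errorf("pair at index %d is not (stream, I-frame)", i)
		}
		if s.Resolution != f.Resolution {
			return fmt.Errorf("pair at index %d mixes %s and %s", i, s.Resolution, f.Resolution)
		}
	}
	return nil
}

func main() {
	vs := []Variant{
		{"1080p.m3u8", "1920x1080", false},
		{"1080p-if.m3u8", "1920x1080", true},
		{"720p.m3u8", "1280x720", false},
		{"720p-if.m3u8", "1280x720", true},
	}
	if err := checkAlternating(vs); err != nil {
		fmt.Println("unsafe to slice:", err)
		return
	}
	fmt.Println("top rendition:", vs[:2])
}
```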
This is kind of an old thread, but it seemed to be the only place I could find through Google searches where somebody seems to know anything about the subject I'm curious about. So I am throwing myself upon your mercy & hoping you can explain this for me. I should clarify that I am just a user, not looking to do any programming or to build any web sites that offer streaming media. My favorite site is the Metropolitan Opera. The m3u8 files in their On Demand area for their free nightly streams always have statements in pairs:
#EXT-X-STREAM-INF
#EXT-X-I-FRAME-STREAM-INF
To my layman's eye, these 2 statements always describe the exact same stream. My question is simply why are there both statements in the manifest? I've been using ffmpeg to download the non-I-FRAME streams & they seem to be perfectly fine once I get them. I've never tried to download the I-FRAME streams, partly because they look like duplicates, partly because I don't know what they give me that the other streams do not. Why do these I-FRAME streams exist? How do they differ from the other streams? What do people who construct web sites use them for? How do they decide when to use one or the other? In the case of the Met, they have 2 different places where you can get their nightly free streams. The manifests in one of those places consistently do not have the I-FRAME streams, while the manifests in the other place (their On Demand area) consistently do have these pairs of streams with one member of each pair being one of these I-FRAME things. I do hope you can enlighten me. If you like, I can post sample m3u8 files here to perhaps clarify what I'm looking at.
Thank you for answering me . . . he said, being a great optimist & hoping somebody will answer me.
Skipping is, from my understanding, the purpose of EXT-X-I-FRAME-STREAM-INF.
Essentially, an I-frame contains all the information for a video frame. Most of the other frames in a video are stored as differences from the frames around them. This works well when playing a video from start to finish: compression works well by playing one frame and then storing and playing back only the difference between each frame and the next, since most scenes vary by very little. I’m completely glossing over so many facts here.
But what if someone wants to skip through certain parts of a video and see what happens at each part? If someone skips from minute 1, to 2, to 3, the player would need all the frames leading up to each point (back to the previous I-frame) in order to recreate a single frame.
So, to let someone skip through a video and show the frame immediately, you need an index of the frames that contain all the information: the I-frames (not P-frames etc.). This tag points to a playlist listing just those I-frames.
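The seeking argument above can be illustrated with a toy model. This sketch assumes only I- and P-frames (no B-frames), with each P-frame depending on the one before it, and counts how many frames a decoder must process to show frame k: everything from the last I-frame up to k. The frame patterns are made up, and real players also fetch whole segments, which this ignores.

```go
package main

import "fmt"

// framesToDecode returns how many frames must be decoded to show frame k,
// given frame types as a string such as "IPPPIPPP". Decoding has to start at
// the last I-frame at or before k. Returns -1 if there is no such I-frame.
func framesToDecode(pattern string, k int) int {
	for i := k; i >= 0; i-- {
		if pattern[i] == 'I' {
			return k - i + 1
		}
	}
	return -1 // frame k cannot be reconstructed
}

func main() {
	gop := "IPPPPPPPPP" // hypothetical: one I-frame every 10 frames
	pattern := gop + gop + gop
	for _, k := range []int{0, 5, 9, 10, 25} {
		fmt.Printf("frame %2d: decode %d frame(s)\n", k, framesToDecode(pattern, k))
	}
}
```

With an I-frame playlist, a player that only wants to show a still picture while seeking can fetch the one I-frame nearest the target instead of the whole run of frames counted here.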
I’m glossing over so many details here, but I hope this answers your question quickly so you can Google more with some context.
See: https://en.m.wikipedia.org/wiki/Video_compression_picture_types https://developer.apple.com/documentation/http_live_streaming/hls_authoring_specification_for_apple_devices
On Fri, 2 Apr 2021 at 9:02 pm, GrampaWildWilly wrote:
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/grafov/m3u8/issues/91#issuecomment-812482990, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN74UPFYSKLIRBJVCG4AJ3TGWP5LANCNFSM4DO2NQGA .
-- Bradley Falzon
Thank you so much for responding, Bradley.
Before I read the references you provided, I tried an experiment, mainly just to satisfy my own curiosity. I downloaded one of the I-FRAME streams for what was at that moment the currently available Metropolitan Opera free performance. It was first on TV in 1988, so it's not in HD, meaning its files are a lot smaller than their more modern offerings. The first thing I noticed in the ffmpeg log is that ffmpeg had to read 24 chunks before it could display its little stream description that includes the duration, resolution, frame rate, you know, all that stuff. Compare this with the usual thing I see, which is that ffmpeg reads 2 chunks of a stream & that's all it needs before it can report those things. This particular stream consisted of 4012 chunks. It took about 12 minutes to download & I got a file of 197M that claimed to be of resolution 720x486, duration 2:13:52, all about as expected, although the file size seemed a bit small to me. Also as expected, it had a video track but no audio track. The Met makes these shows available as separate audio & video files, so this is normal. The curious thing is that it claimed to have a frame rate of 0. Despite that, I went ahead & tried to play it in VLC. I got a black screen. Skimming through the file in VLC, it appears to be a black screen for the whole 2+ hours of the file. It so happens that they offered this same performance as a free stream one night last July & I got it then. That video file is 1.21G, a hair larger than 197M. So whatever this is, it isn't a playable video.
Your idea that the I-FRAME stream is like an index that allows a user to skip around in the stream sounds reasonable. On the other hand, the other place on the Met web site where the same performance was currently available does not have I-FRAME descriptors & despite that, you can skip around in that player. So even though your idea sounds reasonable, I'm a bit doubtful. Is there a format that separates a video stream into a stream of P frames & a stream of I frames? Like, a normal stream has the P frames & the I frames integrated but there's another format in which the 2 types of frames are separate? If that's the case, I'd expect to see another descriptor in the manifest for the stream of P frames but I don't. The regular #EXT-X-STREAM-INF descriptors in these manifests give playable MP4 files, which leads me to believe they have integrated P & I frames. I've never downloaded an I-FRAME stream before just now & I've never had a problem playing back the operas.
Then I read the Wikipedia article . . . after adjusting the URL you gave to be for the desktop site instead of the mobile site. That made me realize that the questions I just asked about a stream of all P frames & all I frames are pretty stupid. Given what I read there, a stream made up of just one kind of frame isn't something that is done. A stream is a combination of I frames, P frames, and something that hasn't come up in this discussion until now, B frames. My understanding is that to encode a video, you start with a given image, but then you describe differences in that image that give the illusion of motion happening. When you look at movements happening, a lot of what your eyes take in isn't changing moment to moment. So a video is encoded by describing differences moment to moment rather than repainting the whole image, maybe 90% of which is the same as the moment before. If I understand the concept correctly, a video starts with an I-frame. An I-frame has nothing to do with an index. An I-frame is a complete image. Actually, I could put that in quotation marks because that's what it says in the Wikipedia article. After the I-frame, there follows a sequence of some number of B-frames & P-frames. Then at a certain point there's another I-frame followed by more of the other types of frame. So a video has I-frames scattered throughout it in some pattern, at some frequency. These are complete images that you can skip to & resume playback from. But a stream of all I-frames is kind of a dumb idea. I suppose it would be a worst-case example of a video that couldn't be compressed.
This brings me to the conclusion that I-frames in a stream & the #EXT-X-I-FRAME-STREAM-INF statement in a manifest are not related. They don't deal with the same concept. It also makes me think the #EXT-X-I-FRAME-STREAM-INF statement actually has nothing to do with being able to skip around from one time index to another in a stream player. I can see where the existence of I-frames within a stream would facilitate that, but the #EXT-X-I-FRAME-STREAM-INF statement must have some other purpose. It's probably an unfortunate coincidence that there's something called I-frames in streams & there's a statement called #EXT-X-I-FRAME-STREAM-INF in manifests. Maybe, because HLS is an Apple invention, they call them iFrames, in the same spirit as iPod, iMac, iPhone, etc. & that's how the #EXT-X-I-FRAME-STREAM-INF statement came to exist in manifests. Is our language really so poor that we have to resort to calling utterly different things by the same name? (Rhetorical question.)
Hoping to gain further insight, I moved on to the Apple document. That seems like a really high-level set of rules for constructing a stream to conform to Apple's HLS standard. Like I said before, I'm not in that business. The document didn't particularly move me closer to an understanding of the #EXT-X-I-FRAME-STREAM-INF statement in manifests. But that document has links to other documents that I will, over the next few days, navigate to & see if I can make heads or tails of whatever I find. Maybe I'll also get a clue why the #EXT-X-I-FRAME-STREAM-INF stream I downloaded isn't really playable in VLC. It plays, but there are no viewable images. So what good is it? That's a question I hope to answer with further reading.
For the actual HLS specification refer to https://tools.ietf.org/html/rfc8216
You can certainly skip around without this playlist; the idea is that a player could show you the frame you’re skipping to a lot quicker, because it doesn’t need to download as much data (an I-frame and P-frames, when just the I-frame would do).
I’m skipping over a lot of details here; again, I’m on mobile and on holiday.
I don’t know exactly how this file would play; I haven’t looked at one myself for a very long time. Perhaps play it in ffplay instead of ffmpeg.
But essentially that directive provides the URL of a file with just the I-frames. What the player does with it is up to the player, but players use it to make skipping forward or backward better.
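A rough sketch of how a player might use such a playlist when seeking, under stated assumptions: the I-frame playlist gives one EXTINF duration per I-frame (the time until the next I-frame), the first I-frame starts at 0, and `iframeAt` is a hypothetical helper that picks the I-frame whose span covers the seek target. Real players may behave quite differently.

```go
package main

import (
	"fmt"
	"sort"
)

// iframeAt returns the index of the I-frame whose time span contains target
// (in seconds), given per-I-frame durations and assuming the first I-frame
// starts at 0. It returns -1 for targets outside the playlist.
func iframeAt(durations []float64, target float64) int {
	starts := make([]float64, len(durations))
	total := 0.0
	for i, d := range durations {
		starts[i] = total
		total += d
	}
	if target < 0 || target >= total {
		return -1
	}
	// The first start strictly after target, minus one, is the covering I-frame.
	return sort.Search(len(starts), func(i int) bool { return starts[i] > target }) - 1
}

func main() {
	durations := []float64{2.0, 2.0, 2.0, 1.5} // hypothetical EXTINF values
	for _, t := range []float64{0, 3.5, 6.9, 7.5} {
		fmt.Printf("seek to %.1fs -> I-frame #%d\n", t, iframeAt(durations, t))
	}
}
```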
On Sat, 3 Apr 2021 at 1:33 pm, GrampaWildWilly wrote:
-- Bradley Falzon
Of course! ffplay. Now why didn't I think of that? Brilliant! Give yourself a raise.
I'd never even thought to execute ffplay before. It comes with ffmpeg & ffprobe, both of which I have used, but I've always just used VLC to play videos. So I guessed that I needed to open a command session & run ffplay giving it the name of the MP4 file as input. To no-one's surprise, this worked. It opens a video player window outside the command session but there's a running time index counter in the command session while the video is playing. The video, such as it is, consists of one frame every approximately 2 seconds. From this I conclude that the full performance, which occupies 1.21G as I said, contains an I-frame every 2 seconds, more or less. I imagine this is a choice the Metropolitan Opera has made. Other sites might use a different frequency. Yes? The I-frames evidently occupy 197M of the 1.21G. The Met uses the Brightcove player. As I've been saying, they have the player embedded on 2 different pages, but only one of them (in the On Demand area) presents a manifest that has these #EXT-X-I-FRAME-STREAM-INF statements in it. That manifest also offers more choices of video resolution than the other player. It often shows a couple of streams at a higher resolution than you can get on the other page. And even when they both show the same highest resolution, the On Demand area offers a stream that has a higher BANDWIDTH parameter, which I have always taken to mean it's a better quality video. I have come to understand that resolution alone does not quantify how good a video is. You also have to look at the bit rates. Bit rates are shown in the file Properties dialog on Windows (and in ffprobe results). There is a direct correlation between the file bit rates & the BANDWIDTH shown in a manifest, but I don't know what the conversion formula is to convert one unit of measure to the other . . . or even if such a calculation can be made.
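On the bit-rate question, such a calculation can be made, at least roughly. Here is a sketch of the arithmetic using the numbers from this thread. Per RFC 8216, BANDWIDTH is in bits per second and represents the peak segment bit rate (AVERAGE-BANDWIDTH, when present, is the average), so dividing by 1000 gives kbit/s, comparable to what ffprobe and the Windows Properties dialog report. A file's average bit rate will usually come out below the peak. Treating "1.21G" as 10^9 bytes is my assumption; if it means GiB, the result is about 7% higher. The BANDWIDTH value in `main` is hypothetical. The same numbers also show that 4012 chunks over 2:13:52 works out to roughly one I-frame every 2 seconds, consistent with what ffplay showed.

```go
package main

import "fmt"

// avgBitsPerSecond estimates a file's average bit rate from its size in bytes
// and its duration in seconds (container overhead included).
func avgBitsPerSecond(sizeBytes, seconds float64) float64 {
	return sizeBytes * 8 / seconds
}

// hms converts hours, minutes and seconds to seconds.
func hms(h, m, s int) float64 {
	return float64(h*3600 + m*60 + s)
}

func main() {
	dur := hms(2, 13, 52) // duration ffmpeg reported
	// Assumes "1.21G" and "197M" mean 10^9 and 10^6 bytes.
	full := avgBitsPerSecond(1.21e9, dur)
	iframes := avgBitsPerSecond(197e6, dur)
	fmt.Printf("full video: %.0f kbit/s, I-frames only: %.0f kbit/s\n", full/1000, iframes/1000)

	// One chunk per I-frame, so duration / chunk count is the I-frame spacing.
	fmt.Printf("I-frame interval: %.2f s\n", dur/4012)

	// BANDWIDTH is bits per second (peak); divide by 1000 for kbit/s.
	bandwidth := 1500000.0 // hypothetical BANDWIDTH attribute value
	fmt.Printf("BANDWIDTH=%.0f is %.0f kbit/s peak\n", bandwidth, bandwidth/1000)
}
```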
OK. You've convinced me. I stand educated. Thank you. Evidently, the On Demand player can skip around within an opera better than the other player can. I'm not sure how noticeable the improvement might be to a human, but I suppose it must be measurable. Since I just about never stream their shows, I wouldn't be a good candidate to evaluate that. I just download the regular streams & play them in VLC. VLC can skip around in the videos & it doesn't need the help of the I-frame stream. But then, we're talking about 2 entirely different situations. One is the dynamic download in chunks & playback of the chunks of an online stream. This process is at the mercy of the speed & quality of your Internet connection. It makes perfect sense to me that skipping around in the online stream could use some help. The other situation is the playback of a file that is already resident on your system. There can't be any buffering delays in this case. Skipping around is constrained only by the speed of your CPU & disk drives, & I believe it doesn't take a whole lot of hardware speed to do this skipping around.
Once again, thank you. I think I get it.
Typically we need to slice a playlist. I'm no HLS expert, but I noticed from other examples that it seems important that the EXT-X-STREAM-INF and EXT-X-I-FRAME-STREAM-INF entries stay together.
How do people typically do this? Sort on Resolution? https://godoc.org/github.com/grafov/m3u8#VariantParams
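One way to do it, sketched under the assumption that there is one regular stream per resolution: sort by resolution (parsed into a pixel count), put the regular stream before its I-frame playlist, then break ties by bandwidth. After sorting, each pair is adjacent, so slicing at even indices keeps them together. `Variant`, `pixels` and `sortPaired` are hypothetical local names; the fields loosely mirror Bandwidth, Resolution and Iframe from the linked VariantParams docs, but this is not the library's type.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Variant is a hypothetical stand-in for a master playlist variant.
type Variant struct {
	URI        string
	Bandwidth  uint32
	Resolution string // e.g. "1920x1080"
	Iframe     bool   // true for EXT-X-I-FRAME-STREAM-INF
}

// pixels turns "WxH" into W*H; malformed values count as 0.
func pixels(res string) int {
	w, h, ok := strings.Cut(res, "x")
	if !ok {
		return 0
	}
	wi, err1 := strconv.Atoi(w)
	hi, err2 := strconv.Atoi(h)
	if err1 != nil || err2 != nil {
		return 0
	}
	return wi * hi
}

// sortPaired orders variants by resolution (largest first), then regular
// streams before I-frame playlists, then by bandwidth (highest first).
func sortPaired(vs []Variant) {
	sort.SliceStable(vs, func(i, j int) bool {
		a, b := vs[i], vs[j]
		if pa, pb := pixels(a.Resolution), pixels(b.Resolution); pa != pb {
			return pa > pb
		}
		if a.Iframe != b.Iframe {
			return !a.Iframe // regular stream first
		}
		return a.Bandwidth > b.Bandwidth
	})
}

func main() {
	vs := []Variant{
		{URI: "720p.m3u8", Bandwidth: 2500000, Resolution: "1280x720"},
		{URI: "1080p-if.m3u8", Bandwidth: 500000, Resolution: "1920x1080", Iframe: true},
		{URI: "1080p.m3u8", Bandwidth: 5000000, Resolution: "1920x1080"},
		{URI: "720p-if.m3u8", Bandwidth: 250000, Resolution: "1280x720", Iframe: true},
	}
	sortPaired(vs)
	for _, v := range vs[:2] { // top rendition: stream plus its I-frame playlist
		fmt.Println(v.URI)
	}
}
```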