ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense
132.47k stars 10.05k forks source link

"--hls-use-mpegts --no-part" should create files with ".ts" extension #11105

Open Vangelis66 opened 8 years ago

Vangelis66 commented 8 years ago

OS: Windows Vista SP2 x86, latest OS updates from Microsoft. Using the latest standalone "youtube-dl.exe" downloaded from Github.

youtube-dl --version => 2016.11.02

Also using FFmpeg.exe 3.0 (x86) downloaded from the Zeranoe repo.

The first channel of the Greek National TV broadcaster uses Youtube to host its live stream; ERT1 LIVE => https://www.youtube.com/watch?v=gnnO4TeJG-4 The actual youtube URL changes frequently... --hls-prefer-native doesn't seem to be able to handle this stream, so ffmpeg MUST be used. When issuing:

youtube-dl -f 94 --no-part "https://www.youtube.com/watch?v=gnnO4TeJG-4"

the stream download is delegated to FFmpeg and (correctly) youtube-dl creates a file with the ".mp4" extension. When I CTRL+C, the end file is correctly identified by MediaInfo to be an MP4 file. On the contrary, when I issue:

youtube-dl -f 94 --hls-use-mpegts --no-part "https://www.youtube.com/watch?v=gnnO4TeJG-4"

youtube-dl (again) generates a file with file extension ".mp4", but the --hls-use-mpegts switch creates an MPEG-TS container which has (by default) a ".ts" file extension (... at least on Windows)! CTRL+Cing results in an end file which is, in essence, a ".ts" file, as reported by MediaInfo, but with an erroneous ".mp4" extension. I have to manually rename the file back to the correct ".ts" extension.

I am of the opinion this is a bugged behaviour on the part of the application and that youtube-dl should be smart enough to attribute the correct file extention (.ts) when --hls-use-mpegts is employed...

Many thanks to all the devs' team for their on-going efforts!

Vangelis66 commented 8 years ago

@jaimeMF

Thanks for reading my report; though, if you ask me, the "bug" label would have been more appropriate...

I am not "requesting" some extra (new) feature of youtube-dl (the case on most issues here), rather pointing out that an existing feature be ironed out! Anyhow, keep up the good work!

ghost commented 7 years ago

@jaimeMF @Vangelis66 I am new to youtube-dl, but I think most or all modules default to "%(title)s-%(id)s.%(ext)s", so instead of just sending --hls-use-mpegts, you should instead send --hls-use-mpegts --output "%(title)s-%(id)s.ts". You will still get good filenames that way.

But I fully agree that this should be done by default... if possible. Because --hls-use-mpegts is an "if" option, which starts downloading m3u8s as mpegts IF the URL contained an m3u8. But if the given URL wasn't detected as m3u8 then I don't want to get a .ts extension. But with my workaround that's what would happen. Something like a .flv file would get a .ts extension instead, if downloaded via my flags.

So for now I have to be sure to only send the .ts/use-mpegts flags when I know the site uses mpegts.

@jaimeMF I think it makes a lot of sense to automatically replace the meaning of "ext" with ".ts" when --hls-use-mpegts is enabled and youtube-dl detects a m3u8 playlist. That way we can use a single command to download any file, and always be sure to get live-playable .ts files when they are livestreams. I hate using .mp4 since its metadata isn't written until the end so if my computer or the app crashes then I can never recover the file since it has no metadata. And only mpegts files can be live-previewed before the download is complete. So I choose to get all livestreams as .ts and then manually convert to a different container later for archival.

diamond12 commented 7 years ago

@SteveJobzniak Or maybe it is easier to replace ext with .ts if ffmpeg is executed with the -f mpegts parameter.

ghost commented 7 years ago

@diamond12 Not sure if that would work. The "external downloader" script that launches ffmpeg says ffmpeg -f mpegts every time you use youtube-dl --hls-use-mpegts, even if the actual stream is not mpegts. See: https://github.com/rg3/youtube-dl/blob/master/youtube_dl/downloader/external.py#L264

And https://github.com/rg3/youtube-dl/blob/master/youtube_dl/downloader/hls.py is another HLS downloader which would need fixing too.

I propose a possible solution: If the youtube-dl extractor detects a .m3u8 file AND --hls-use-mpegts is passed as a parameter (basically what the linked line of code already checks for above), then rewrite %(ext)s to "ts". That means that anytime "ext" is used in the output filename pattern, it would correctly use ".ts" - IF - the stream is HLS and is being saved as mpegts.

Either way, this is a pretty silly problem in youtube-dl. Its "automatic filename" feature definitely shouldn't be applying a totally incorrect file extension! :-O It's not a "feature request". It's a necessary bugfix.

diamond12 commented 7 years ago

@SteveJobzniak I agree it's a bug and I don't know why it is marked as a feature request.

The source that you pointed me to is right after if protocol in ('m3u8', 'm3u8_native') so it should be fine. Besides, -f mpegts will always produce a .ts. But yes, your proposed solution is a good option.

ghost commented 7 years ago

@diamond12 Oh yeah you are right! So they do check that it's a HLS stream before applying --hls-use-mpegts. Nice find! That should mean it's possible to always run with --hls-use-mpegts on the command line even if the URL isn't HLS. And that would support both regular downloads and HLS streams with a single command.

All that's missing now is applying the correct output extension (.ts) whenever it detects a HLS stream.

I think correcting the meaning of ext to ts before the output filename pattern is parsed (as described above) makes the most sense. That way it fixes the ext portion of the filename everywhere in the code that tries to look at / use the extension, and even in any user-provided filename patterns on the command-line (if the user uses %(ext)s.

ghost commented 7 years ago

@jaimeMF I am extremely overworked, but took a minute to look at the source to get things rolling. It seems the function process_video_result is the correct injection point:

https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L1363

It loops through all possible video sources and fills in their 'ext' value if missing. And then it also runs determine_protocol (which sets the value to m3u8 or m3u8_native?) if the 'protocol' value is missing.

The determine_protocol code is in utils:

https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py#L2323

It seems like all of this extension/protocol/URL-determining code is done without connecting to the server to check any mimetype. It looks like any advanced detection (such as a ".m3u8" hiding behind a ".php" URL), is up to that website's extractor plugin to put in the returned protocol value (however, that "protocol" value isn't provided by many extractors since it isn't necessary at all if the URL itself already starts with a clear protocol like rtmp:// or ends in a clear extension like .m3u8). If the protocol isn't provided by the extractor, YouTube-DL guesses it (https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L1366). So it looks like YouTube-DL doesn't connect any further to the URL it's given by the extractor. It just fills in any missing fields (like protocol) and then passes everything to the desired downloader method. If a website hides its .m3u8 file behind a ".php" extension script, then it'd be up to the extractor to set protocol = m3u8.

Since it seems like YouTube-DL hasn't got any responsibility to further verify the protocol, it seems like modifying the extension selector in YoutubeDL.py's process_video_result is the correct location for the bugfix, from what I can see:

That would fix the bug. Please comment to let us know your thoughts. Is there a better location to fix the bug? Did I find the right location? Could you please fix this bug without a pull request? I need to go now.