ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/

Cannot download multiple formats with archive #7022

Open dundua opened 9 years ago

dundua commented 9 years ago

I have an issue where downloading multiple file formats from a YouTube video only downloads the first format group. For example, with the format selection "-f 140+136,171+247" and the download archive enabled via "--download-archive archive.txt", youtube-dl only downloads formats 140+136. It then reports that the video is already in the archive and does not download formats 171+247.
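Presumably the problem is that the archive keys only on the extractor name and video id, so every format group of the same video maps to the same archive entry. A simplified sketch of that behaviour (illustrative only, not the actual youtube-dl source):

# Simplified illustration of the archive check described above.
# The archive key carries no format information, so "140+136" and
# "171+247" of the same video collide on the same entry.

def make_archive_id(info_dict):
    # e.g. "youtube Mk9yk7B-cNA"
    return '%s %s' % (info_dict['extractor'].lower(), info_dict['id'])

def in_download_archive(archive_path, info_dict):
    archive_id = make_archive_id(info_dict)
    try:
        with open(archive_path) as f:
            return any(line.strip() == archive_id for line in f)
    except IOError:
        return False

# Once formats 140+136 are recorded, the check above also matches the
# pending 171+247 download, so it is skipped.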

Example log

[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--verbose', u'--ignore-errors', u'--no-continue', u'--no-overwrites', u'--keep-video', u'--no-post-overwrites', u'--download-archive', u'archive.txt', u'--write-description', u'--write-info-json', u'--write-annotations', u'--write-thumbnail', u'--all-subs', u'--output', u'%(uploader)s (%(uploader_id)s)/%(id)s/%(title)s - %(upload_date)s.%(ext)s', u'-f', u'bestvideo[ext=mp4]+bestaudio[ext=m4a],bestvideo[ext=webm]+bestaudio[ext=webm]', u'https://www.youtube.com/watch?v=Mk9yk7B-cNA']
[debug] Encodings: locale UTF-8, fs UTF-8, out None, pref UTF-8
[debug] youtube-dl version 2015.09.28
[debug] Python version 2.7.3 - Linux-3.2.0-4-686-pae-i686-with-debian-stretch-sid
[debug] exe versions: none
[debug] Proxy map: {}
[youtube] Mk9yk7B-cNA: Downloading webpage
[youtube] Mk9yk7B-cNA: Downloading video info webpage
[youtube] Mk9yk7B-cNA: Extracting video information
WARNING: video doesn't have subtitles
[youtube] Mk9yk7B-cNA: Searching for annotations.
[youtube] Mk9yk7B-cNA: Downloading DASH manifest
[youtube] Mk9yk7B-cNA: Downloading DASH manifest
[info] Mk9yk7B-cNA: downloading video in 2 formats
[info] Writing video description to: Smosh Olivia (UCeaMWfo8kwdbLIl0RyuwVoA)/Mk9yk7B-cNA/First Live Stream with me i am a sandwich - 20150921.description
[info] Writing video annotations to: Smosh Olivia (UCeaMWfo8kwdbLIl0RyuwVoA)/Mk9yk7B-cNA/First Live Stream with me i am a sandwich - 20150921.annotations.xml
[info] Writing video description metadata as JSON to: Smosh Olivia (UCeaMWfo8kwdbLIl0RyuwVoA)/Mk9yk7B-cNA/First Live Stream with me i am a sandwich - 20150921.info.json
[youtube] Mk9yk7B-cNA: Downloading thumbnail ...
[youtube] Mk9yk7B-cNA: Writing thumbnail to: Smosh Olivia (UCeaMWfo8kwdbLIl0RyuwVoA)/Mk9yk7B-cNA/First Live Stream with me i am a sandwich - 20150921.jpg
WARNING: You have requested multiple formats but ffmpeg or avconv are not installed. The formats won't be merged.
[debug] Invoking downloader on 'https://r1---sn-hp57knls.googlevideo.com/videoplayback?id=324f7293b07e70d0&itag=136&source=youtube&requiressl=yes&mn=sn-hp57knls&pl=47&mv=m&ms=au&mm=31&gcr=us&nh=IgpwcjA0Lm1pYTA0KgkxMjcuMC4wLjE&ratebypass=yes&mime=video/mp4&gir=yes&clen=286166677&lmt=1443080099230244&dur=1892.833&mt=1443597092&upn=35XrcrsV40Y&key=dg_yt0&sver=3&fexp=9405975,9408710,9409069,9410705,9412927,9413140,9415365,9415435,9415485,9416023,9416126,9416729,9417098,9417707,9418094,9418153,9418400,9418411,9418438,9418448,9418802,9419444,9419488,9420348,9421013,9421196,9421501,9421890&signature=3B7FE0E3FB97C3EDF14CDC6B141A02319068560C.7B303E7F49D21C7C1F05634A78379EBA1316006B&ip=2602:306:bdc6:98e0:e0a3:581e:1e92:b31d&ipbits=0&expire=1443618762&sparams=ip,ipbits,expire,id,itag,source,requiressl,mn,pl,mv,ms,mm,gcr,nh,ratebypass,mime,gir,clen,lmt,dur'
[download] Destination: Smosh Olivia (UCeaMWfo8kwdbLIl0RyuwVoA)/Mk9yk7B-cNA/First Live Stream with me i am a sandwich - 20150921.f136.mp4
[download] 100% of 272.91MiB in 04:19
[debug] Invoking downloader on 'https://r1---sn-hp57knls.googlevideo.com/videoplayback?id=324f7293b07e70d0&itag=140&source=youtube&requiressl=yes&mn=sn-hp57knls&pl=47&mv=m&ms=au&mm=31&gcr=us&nh=IgpwcjA0Lm1pYTA0KgkxMjcuMC4wLjE&ratebypass=yes&mime=audio/mp4&gir=yes&clen=30063595&lmt=1443078187650282&dur=1892.890&mt=1443597092&upn=35XrcrsV40Y&key=dg_yt0&sver=3&fexp=9405975,9408710,9409069,9410705,9412927,9413140,9415365,9415435,9415485,9416023,9416126,9416729,9417098,9417707,9418094,9418153,9418400,9418411,9418438,9418448,9418802,9419444,9419488,9420348,9421013,9421196,9421501,9421890&signature=3D0FB255F932920EE3C2A7604C992E2ADC0B2938.04B0BE30B6BC871184F0496A96E1296DB4E4807A&ip=2602:306:bdc6:98e0:e0a3:581e:1e92:b31d&ipbits=0&expire=1443618762&sparams=ip,ipbits,expire,id,itag,source,requiressl,mn,pl,mv,ms,mm,gcr,nh,ratebypass,mime,gir,clen,lmt,dur'
[download] Destination: Smosh Olivia (UCeaMWfo8kwdbLIl0RyuwVoA)/Mk9yk7B-cNA/First Live Stream with me i am a sandwich - 20150921.f140.m4a
[download] 100% of 28.67MiB in 00:25
[download] First Live Stream with me i am a sandwich has already been recorded in archive
Gorrrg commented 4 years ago

Seems like this issue has re-emerged. I'm downloading one video format and two audio formats from YouTube as separate files, and once the video file finishes, the two audio downloads are skipped.

D:\Video\Mindfield>youtube-dl.exe --output "%(uploader)s - %(upload_date)s - %(title)s (%(id)s).%(format_id)s" --format "313,258,251" --add-metadata --write-description --write-thumbnail --limit-rate 3M --ignore-errors --download-archive downloaded.txt -i PLZRRxQcaEjA4qyEuYfAMCazlL0vQDkIj2
[youtube:playlist] PLZRRxQcaEjA4qyEuYfAMCazlL0vQDkIj2: Downloading webpage
[download] Downloading playlist: Mind Field : Season 1
[youtube:playlist] playlist Mind Field : Season 1: Downloading 8 videos
[download] Downloading video 1 of 8
[download] Isolation - Mind Field (Ep 1) has already been recorded in archive
[download] Downloading video 2 of 8
[download] Conformity - Mind Field (Ep 2) has already been recorded in archive
[download] Downloading video 3 of 8
[youtube] zD68reVP0Ek: Downloading webpage
[youtube] zD68reVP0Ek: Downloading video info webpage
[info] zD68reVP0Ek: downloading video in 3 formats
[info] Writing video description to: Vsauce - 20170125 - Destruction - Mind Field (Ep 3) (zD68reVP0Ek).313.description
[youtube] zD68reVP0Ek: Downloading thumbnail ...
[youtube] zD68reVP0Ek: Writing thumbnail to: Vsauce - 20170125 - Destruction - Mind Field (Ep 3) (zD68reVP0Ek).jpg
[download] Resuming download at byte 138622096
[download] Destination: Vsauce - 20170125 - Destruction - Mind Field (Ep 3) (zD68reVP0Ek).313
[download] 100% of 2.90GiB in 20:59
[ffmpeg] Adding metadata to 'Vsauce - 20170125 - Destruction - Mind Field (Ep 3) (zD68reVP0Ek).313'
[download] Destruction - Mind Field (Ep 3) has already been recorded in archive
[download] Destruction - Mind Field (Ep 3) has already been recorded in archive
[download] Downloading video 4 of 8
[youtube] qZXpgf8N6hs: Downloading webpage
[youtube] qZXpgf8N6hs: Downloading video info webpage
[info] qZXpgf8N6hs: downloading video in 3 formats
SebiderSushi commented 4 years ago

In light of the discussions in #7480, I think the following might be a harmonious solution:

SebiderSushi commented 4 years ago

Regarding unreliable formats: storing the format selection string that was used to select the format at download time could be useful as well, i.e. an archive entry might be formatted as follows: [service] [video_id] [format_id] [effective format selection string]

This would support use cases where the format id is unstable or irrelevant, while also opening up the possibility of automatically re-downloading better formats once they become available.
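A quick sketch of how such four-field entries could be written and parsed (the helper names here are made up purely for illustration):

# Hypothetical archive entries of the proposed form:
#   [service] [video_id] [format_id] [effective format selection string]

def format_entry(service, video_id, format_id, format_selector):
    return '%s %s %s %s' % (service, video_id, format_id, format_selector)

def parse_entry(line):
    # The selector may itself contain spaces, so only split off the
    # first three fields.
    service, video_id, format_id, format_selector = line.strip().split(' ', 3)
    return {
        'service': service,
        'video_id': video_id,
        'format_id': format_id,
        'format_selector': format_selector,
    }

# e.g. format_entry('youtube', 'Mk9yk7B-cNA', '140+136',
#                   'bestvideo[ext=mp4]+bestaudio[ext=m4a]')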

Example 1

The effective format selection string might be used in the following use case to ensure that the archive entry stored after downloading the video won't block the download of the audio file:

youtube-dl -f 'bestvideo,bestaudio' "$url"

Example 2

The user occasionally runs the following command to check their favourite YouTube channel for new videos:

youtube-dl --download-archive archive [youtube channel url]

At some point, YouTube re-encodes some videos on this channel and starts serving them in a better format, or youtube-dl improves its "best format" detection. The user could then run their usual command with a new option that enables checking the format id. That way, every video for which a better format is now available can be re-downloaded automatically.
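A sketch of the corresponding check, building on the hypothetical parse_entry() above; check_format_id stands in for the proposed opt-in option:

def should_redownload(archived, current_format_id, check_format_id=False):
    # 'archived' is a parse_entry() dict, or None if the video is not
    # in the archive yet.
    if archived is None:
        return True
    if not check_format_id:
        # default behaviour: archived videos are never touched again
        return False
    # re-download only if the format selection now resolves differently
    return archived['format_id'] != current_format_id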

SebiderSushi commented 4 years ago

An alternative solution to this problem that does not require a change to the download archive format:

youtube-dl could add support for output-template-like parsing in the download archive's filename. As long as it allows the user to tell youtube-dl to use a unique archive file for every format_id or format selection string passed via the -f option, the whole issue can be worked around. As a plus, the use of unstable format_ids would happen at the user's discretion.

This could also open up the possibility of nice things like the following, if wanted by the user: youtube-dl --download-archive "archive_%(channel_id)s"
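A rough sketch of how expanding such a template could work (this is only an illustration of the idea, not an existing youtube-dl option):

import collections

def expand_archive_filename(template, info_dict):
    # Missing fields fall back to 'NA', similar to how output templates behave.
    fields = collections.defaultdict(lambda: 'NA', info_dict)
    return template % fields

print(expand_archive_filename('archive_%(channel_id)s',
                              {'channel_id': 'UCeaMWfo8kwdbLIl0RyuwVoA'}))
# -> archive_UCeaMWfo8kwdbLIl0RyuwVoA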

dirkf commented 7 months ago

Pending implementation of https://github.com/yt-dlp/yt-dlp/commit/a13e684813dccc21f3d71711bf79dafbe943bccb, if needed, see #11580.

dirkf commented 7 months ago

The codebases are too different to apply anything like the yt-dlp fix. In the current code, a simple change that sets a flag for a non-archived multi-format download, disabling the in_download_archive() check in _match_entry(), solves the problem:

--- old/youtube_dl/YoutubeDL.py
+++ new/youtube_dl/YoutubeDL.py
@@ -797,7 +797,7 @@
                 return 'Skipping %s, because it has exceeded the maximum view count (%d/%d)' % (video_title, view_count, max_views)
         if age_restricted(info_dict.get('age_limit'), self.params.get('age_limit')):
             return 'Skipping "%s" because it is age restricted' % video_title
-        if self.in_download_archive(info_dict):
+        if self.in_download_archive(info_dict) and info_dict.get('__was_in_download_archive') is not False:
             return '%s has already been recorded in archive' % video_title

         if not incomplete:
@@ -1829,9 +1829,14 @@
         if download:
             if len(formats_to_download) > 1:
                 self.to_screen('[info] %s: downloading video in %s formats' % (info_dict['id'], len(formats_to_download)))
+                was_in_download_archive = self.in_download_archive(info_dict)
+            else:
+                was_in_download_archive = None
             for format in formats_to_download:
                 new_info = dict(info_dict)
                 new_info.update(format)
+                if was_in_download_archive is False:
+                    new_info['__was_in_download_archive'] = False
                 self.process_info(new_info)
         # We update the info dict with the best quality format (backwards compatibility)
         info_dict.update(formats_to_download[-1])

The output template option (-o ...) must be such that different formats of the same id get different filenames:

$ python -m youtube_dl -f '18,22' 'https://youtu.be/twcbKbLQUuA?si=n_BeNP14zYxMJOY7' -o '%(title)s-%(id)s-%(format_id)s.%(ext)s' --test --download-archive archive
[youtube] twcbKbLQUuA: Downloading webpage
[info] twcbKbLQUuA: downloading video in 2 formats
[download] Destination: Holy Motors Movie CLIP - Merde (2012) - Denis Lavant, Eva Mendes Movie HD-twcbKbLQUuA-18.mp4
[download] 100% of 10.00KiB in 00:00
[download] Destination: Holy Motors Movie CLIP - Merde (2012) - Denis Lavant, Eva Mendes Movie HD-twcbKbLQUuA-22.mp4
[download] 100% of 10.00KiB in 00:00
$