yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader
The Unlicense

Uncaught exception and no error printed when using --load-info-json and extractor fails #9388

Closed: wader closed this issue 4 months ago

wader commented 4 months ago

Provide a description that is worded well enough to be understood

When using --load-info-json and processing fails with yt_dlp.utils.ExtractorError, the exception does not seem to be caught, so no "ERROR: " line is printed and a stack trace is shown instead. With the same format selector and the same URL passed directly, an "ERROR: " line is printed as expected.

Reproduction:

$ python3 -m yt_dlp -J https://www.reddit.com/r/newsbabes/s/92rflI0EB0 > info.json

# throws an exception and shows a stack trace
$ python3 -m yt_dlp -f best --load-info-json info.json
WARNING: "-f best" selects the best pre-merged format which is often not the best option.
         To let yt-dlp download and merge the best available formats, simply do not pass any format selection.
         If you know what you are doing and want only the best pre-merged format, use "-f b" instead to suppress this warning
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/wader/src/yt-dlp/yt_dlp/__main__.py", line 17, in <module>
    yt_dlp.main()
  File "/Users/wader/src/yt-dlp/yt_dlp/__init__.py", line 1030, in main
    _exit(*variadic(_real_main(argv)))
                    ^^^^^^^^^^^^^^^^
  File "/Users/wader/src/yt-dlp/yt_dlp/__init__.py", line 1018, in _real_main
    return ydl.download_with_info_file(expand_path(opts.load_info_filename))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wader/src/yt-dlp/yt_dlp/YoutubeDL.py", line 3570, in download_with_info_file
    self.__download_wrapper(self.process_ie_result)(info, download=True)
  File "/Users/wader/src/yt-dlp/yt_dlp/YoutubeDL.py", line 3531, in wrapper
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wader/src/yt-dlp/yt_dlp/YoutubeDL.py", line 1808, in process_ie_result
    ie_result = self.process_video_result(ie_result, download=download)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wader/src/yt-dlp/yt_dlp/YoutubeDL.py", line 2928, in process_video_result
    raise ExtractorError(
yt_dlp.utils.ExtractorError: [Reddit] x0br599s8ehc1: Requested format is not available. Use --list-formats for a list of available formats

# this prints an error and no stack trace
$ python3 -m yt_dlp -f best https://www.reddit.com/r/newsbabes/s/92rflI0EB0
WARNING: "-f best" selects the best pre-merged format which is often not the best option.
         To let yt-dlp download and merge the best available formats, simply do not pass any format selection.
         If you know what you are doing and want only the best pre-merged format, use "-f b" instead to suppress this warning
[generic] Extracting URL: https://www.reddit.com/r/newsbabes/s/92rflI0EB0
[generic] 92rflI0EB0: Downloading webpage
[redirect] Following redirect to https://www.reddit.com/r/newsbabes/comments/1am0l3z/ana_mafud_telemundo_arizona/?share_id=qwy0V17xM17tVYObLaChX&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=14&rdt=59051
[Reddit] Extracting URL: https://www.reddit.com/r/newsbabes/comments/1am0l3z/ana_mafud_telemundo_arizona/?share_id=qwy0V17...tm_term=14&rdt=59051
[Reddit] 1am0l3z: Downloading JSON metadata
[Reddit] 1am0l3z: Downloading m3u8 information
[Reddit] 1am0l3z: Downloading MPD manifest
ERROR: [Reddit] x0br599s8ehc1: Requested format is not available. Use --list-formats for a list of available formats

Provide verbose output that clearly demonstrates the problem

Complete Verbose Output

[debug] Command-line config: ['-vU', '-f', 'best', '--load-info-json', 'info.json']
[debug] Encodings: locale UTF-8, fs utf-8, pref UTF-8, out utf-8, error utf-8, screen utf-8
[debug] yt-dlp version stable@2023.12.30 from yt-dlp/yt-dlp [f10589e34] (source)
[debug] Lazy loading extractors is disabled
[debug] Git HEAD: 96f3924ba
[debug] Python 3.11.6 (CPython x86_64 64bit) - macOS-13.6.4-x86_64-i386-64bit (OpenSSL 3.2.0 23 Nov 2023)
[debug] exe versions: ffmpeg 6.1.1 (setts), ffprobe 6.1.1, rtmpdump 2.4
[debug] Optional libraries: brotli-1.1.0, certifi-2023.07.22, mutagen-1.47.0, requests-2.31.0, sqlite3-3.44.2, urllib3-2.0.7
[debug] Proxy map: {}
[debug] Request Handlers: urllib, requests
WARNING: "-f best" selects the best pre-merged format which is often not the best option.
         To let yt-dlp download and merge the best available formats, simply do not pass any format selection.
         If you know what you are doing and want only the best pre-merged format, use "-f b" instead to suppress this warning
[debug] Loaded 1836 extractors
[debug] Fetching release info: https://api.github.com/repos/yt-dlp/yt-dlp/releases/latest
Latest version: stable@2023.12.30 from yt-dlp/yt-dlp
yt-dlp is up to date (stable@2023.12.30 from yt-dlp/yt-dlp)
[debug] Formats sorted by: hasvid, ie_pref, lang, quality, res, fps, hdr:12(7), vcodec:vp9.2(10), channels, acodec, size, br, asr, proto, vext, aext, hasaud, source, id
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/wader/src/yt-dlp/yt_dlp/__main__.py", line 17, in <module>
    yt_dlp.main()
  File "/Users/wader/src/yt-dlp/yt_dlp/__init__.py", line 1030, in main
    _exit(*variadic(_real_main(argv)))
                    ^^^^^^^^^^^^^^^^
  File "/Users/wader/src/yt-dlp/yt_dlp/__init__.py", line 1018, in _real_main
    return ydl.download_with_info_file(expand_path(opts.load_info_filename))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wader/src/yt-dlp/yt_dlp/YoutubeDL.py", line 3570, in download_with_info_file
    self.__download_wrapper(self.process_ie_result)(info, download=True)
  File "/Users/wader/src/yt-dlp/yt_dlp/YoutubeDL.py", line 3531, in wrapper
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wader/src/yt-dlp/yt_dlp/YoutubeDL.py", line 1808, in process_ie_result
    ie_result = self.process_video_result(ie_result, download=download)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/wader/src/yt-dlp/yt_dlp/YoutubeDL.py", line 2928, in process_video_result
    raise ExtractorError(
yt_dlp.utils.ExtractorError: [Reddit] x0br599s8ehc1: Requested format is not available. Use --list-formats for a list of available formats
dirkf commented 4 months ago

So this is what's happening:

1. the extracted formats don't include any combined formats
2. the command using --load-info-json ... asks for a combined format
3. failing to find a match causes ExtractorError to be raised
4. the top-level method download_with_info_file() that processes the info_json file doesn't handle that exception
5. it calls process_ie_result() which, unlike the method extract_info() called in the normal case, does not wrap its core processing with _handle_extraction_exceptions.
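The difference between the two code paths described above can be sketched in isolation. This is a simplified illustration, not yt-dlp's actual code: `ExtractorError`, `handle_extraction_exceptions` and `select_format` below are stand-ins written for this example, and the `messages` list stands in for the error output.

```python
import functools


class ExtractorError(Exception):
    """Stand-in for yt_dlp.utils.ExtractorError (simplified for this sketch)."""


def handle_extraction_exceptions(func):
    """Hypothetical, simplified analogue of a wrapper that reports errors."""
    @functools.wraps(func)
    def wrapper(messages, *args, **kwargs):
        try:
            return func(messages, *args, **kwargs)
        except ExtractorError as e:
            # Report the failure as an "ERROR: " line instead of letting it escape.
            messages.append(f'ERROR: {e}')
            return None
    return wrapper


def select_format(messages, info, wanted):
    """Raise when the requested format is missing (step 3 above)."""
    if wanted not in info['formats']:
        raise ExtractorError('Requested format is not available')
    return wanted


# Wrapped path: analogous to the normal URL case.
wrapped_select = handle_extraction_exceptions(select_format)

info = {'formats': ['video-only', 'audio-only']}  # no combined format
messages = []

wrapped_result = wrapped_select(messages, info, 'best')

# Unwrapped path: analogous to the --load-info-json case; the exception escapes.
try:
    select_format(messages, info, 'best')
    escaped = False
except ExtractorError:
    escaped = True
```

In the wrapped call the failure ends up in `messages` as an "ERROR: " line; in the unwrapped call it propagates to the caller, which is what produces the stack trace in the report.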

If this is the expected behaviour, a solution is just to ask for merged formats as a fallback.

If not, what should happen? In this case there's no point raising ReExtractInfo because the same error will happen after extracting again. However:

wader commented 4 months ago

Good question, not sure. My main concern is that it ends up with an uncaught exception and a stack trace, and no clear "ERROR: ..." line gets printed for something that is not an internal error. The "ERROR: " line is what goutubedl uses to know whether something went wrong and what it was.
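To illustrate why the missing line matters to a wrapper program, here is a minimal sketch of prefix-based error detection. It is not goutubedl's actual code (goutubedl is a separate project); `first_error_line` is a hypothetical helper, and the sample strings are shortened from the output in this issue.

```python
def first_error_line(stderr_text):
    """Return the message of the first 'ERROR: ' line, or None if there is none."""
    prefix = 'ERROR: '
    for line in stderr_text.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


# Shortened stderr from the --load-info-json case: only a traceback.
traceback_only = (
    'Traceback (most recent call last):\n'
    '  File "<frozen runpy>", line 198, in _run_module_as_main\n'
    'yt_dlp.utils.ExtractorError: [Reddit] x0br599s8ehc1: '
    'Requested format is not available.\n'
)

# Shortened stderr from the direct URL case: a proper error line.
with_error_line = 'ERROR: [Reddit] x0br599s8ehc1: Requested format is not available.\n'

missing = first_error_line(traceback_only)
found = first_error_line(with_error_line)
```

With traceback-only output the helper finds nothing, so a caller relying on the prefix cannot tell the user what failed.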

bashonly commented 4 months ago

I think this should work:

diff --git a/yt_dlp/YoutubeDL.py b/yt_dlp/YoutubeDL.py
index ef66306b1..b6578ca01 100644
--- a/yt_dlp/YoutubeDL.py
+++ b/yt_dlp/YoutubeDL.py
@@ -3576,6 +3576,8 @@ def download_with_info_file(self, info_filename):
                     raise
                 self.report_warning(f'The info failed to download: {e}; trying with URL {webpage_url}')
                 self.download([webpage_url])
+            except ExtractorError as e:
+                self.report_error(e)
         return self._download_retcode

     @staticmethod
bashonly commented 4 months ago

The bug is worse when you consider that yt-dlp supports JSON array input with multiple info dicts.

Write the JSON array:

yt-dlp -j "https://www.youtube.com/playlist?list=OLAK5uy_ltT-RYqjC7ogqlxJs8SmagEbWLEf5TALo" | jq -cs > list.json
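For readers without jq, the slurp step can be approximated in Python. This is a rough sketch that assumes the input has exactly one JSON object per line; `slurp_json_lines` is a hypothetical helper name, and the sample input is made up for illustration.

```python
import json


def slurp_json_lines(text):
    """Collect one-JSON-object-per-line text into a compact JSON array string."""
    items = [json.loads(line) for line in text.splitlines() if line.strip()]
    # Compact separators, similar in spirit to jq's -c output.
    return json.dumps(items, separators=(',', ':'))


sample = '{"id": "a"}\n{"id": "b"}\n'
array_text = slurp_json_lines(sample)
```

Unlike jq's slurp mode, this sketch does not handle JSON values that span multiple lines.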

Load it back and select a non-existent format without the above patch:

yt-dlp --load-info-json list.json -f nonexistent
ERROR: Do not return 'artist' when 'artists' is present; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
ERROR: Do not return 'creator' when 'creators' is present; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
Traceback (most recent call last):
  File "/home/bashonly/bin/yt-dlp", line 14, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/bashonly/git/yt-dlp/yt_dlp/__init__.py", line 1030, in main
    _exit(*variadic(_real_main(argv)))
                    ^^^^^^^^^^^^^^^^
  File "/home/bashonly/git/yt-dlp/yt_dlp/__init__.py", line 1018, in _real_main
    return ydl.download_with_info_file(expand_path(opts.load_info_filename))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bashonly/git/yt-dlp/yt_dlp/YoutubeDL.py", line 3570, in download_with_info_file
    self.__download_wrapper(self.process_ie_result)(info, download=True)
  File "/home/bashonly/git/yt-dlp/yt_dlp/YoutubeDL.py", line 3531, in wrapper
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/bashonly/git/yt-dlp/yt_dlp/YoutubeDL.py", line 1808, in process_ie_result
    ie_result = self.process_video_result(ie_result, download=download)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bashonly/git/yt-dlp/yt_dlp/YoutubeDL.py", line 2928, in process_video_result
    raise ExtractorError(
yt_dlp.utils.ExtractorError: [youtube] gKgB0flHIkA: Requested format is not available. Use --list-formats for a list of available formats

This results in an uncaught exception, and yt-dlp exits while processing the first info dict.

With the patch, the exception is caught and all info dicts are processed:

yt-dlp --load-info-json list.json -f nonexistent
ERROR: Do not return 'artist' when 'artists' is present; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
ERROR: Do not return 'creator' when 'creators' is present; please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U
ERROR: [youtube] gKgB0flHIkA: Requested format is not available. Use --list-formats for a list of available formats
ERROR: [youtube] y6nJVBMlZUk: Requested format is not available. Use --list-formats for a list of available formats

(EDIT: the deprecation error output is a side effect of 104a7b5a46dc1805157fb4cc11c05876934d37c1 and will be fixed by #9394)
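The effect of catching the exception per info dict can be sketched as a plain loop. This is a simplified model rather than the code in YoutubeDL.py: `ExtractorError`, `process` and `process_all` are stand-ins, the format ids in `infos` are invented for the example, and setting the return code to 1 on failure is an assumption of the sketch.

```python
class ExtractorError(Exception):
    """Stand-in for yt_dlp.utils.ExtractorError (simplified for this sketch)."""


def process(info, wanted_format):
    """Hypothetical per-item processing that fails on an unknown format."""
    if wanted_format not in info['formats']:
        raise ExtractorError(
            f"[{info['extractor']}] {info['id']}: Requested format is not available")
    return info['id']


def process_all(infos, wanted_format):
    """Process every info dict, reporting failures instead of aborting."""
    errors = []
    retcode = 0
    for info in infos:
        try:
            process(info, wanted_format)
        except ExtractorError as e:
            errors.append(f'ERROR: {e}')  # report and continue with the next item
            retcode = 1  # assumption: any failure yields a non-zero return code
    return retcode, errors


infos = [
    {'extractor': 'youtube', 'id': 'gKgB0flHIkA', 'formats': ['251', '137']},
    {'extractor': 'youtube', 'id': 'y6nJVBMlZUk', 'formats': ['251', '137']},
]
retcode, errors = process_all(infos, 'nonexistent')
```

Without the `try`/`except` inside the loop, the first failing item would end the run and the second item would never be reached, which matches the unpatched behaviour shown earlier.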

wader commented 4 months ago

Works great, thanks!