yt-dlp / yt-dlp

A feature-rich command-line audio/video downloader

video + subtitles + sponsorblock #2753

Closed phyzical closed 2 years ago

phyzical commented 2 years ago

Description

Hey,

i think i'm running into an issue with the conversion + rewriting of the subtitles file when there are sponsorblock segments to be removed

url in question: https://www.youtube.com/watch?v=64cEmjtwRgw

the container i run the scrapes from (if it helps): https://hub.docker.com/r/phyzical/yt-dlp

        url=${urls[$channelName]}
        format="bv*[ext=mp4]+ba[ext=m4a]"
        outputFormat="$channelName/processing/%(upload_date)s.%(title)s.%(ext)s"
        oneMonthAgo="$(date -d "-1 months" '+%Y%m%d')"
        showPath="$youtubePath/$channelName"
        processingPath="$showPath/processing"

 -vU -f "$format" --download-archive "$channelName.txt" --write-thumbnail --add-metadata \
        --no-write-playlist-metafiles --compat-options no-youtube-unavailable-videos --sponsorblock-remove "default" \
        --write-auto-sub --cookies cookies.txt --write-info-json --convert-subs=srt --sub-lang "en" \
        --match-filter "availability = 'public'" --datebefore $oneMonthAgo --merge-output-format mp4 -o "$outputFormat" "$url"

(i think i'm running the latest version; i last updated everything 7 days ago)

Verbose log

Sun Feb 13 17:27:33 AWST 2022 [download] Downloading video 1 of 741
[debug] [youtube] Extracting URL: https://www.youtube.com/watch?v=64cEmjtwRgw
Sun Feb 13 17:27:33 AWST 2022 [youtube] 64cEmjtwRgw: Downloading webpage
[debug] [youtube] Extracted SAPISID cookie
Sun Feb 13 17:27:34 AWST 2022 [youtube] 64cEmjtwRgw: Downloading android player API JSON
[debug] Sort order given by extractor: quality, res, fps, hdr:12, source, codec:vp9.2, lang, proto
[debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), acodec, lang, proto, filesize, fs_approx, tbr, vbr, abr, asr, vext, aext, hasaud, id
[debug] Downloading subtitles: en
[debug] SponsorBlock query: https://sponsor.ajay.app/api/skipSegments/6f4c?service=YouTube&categories=%5B%22music_offtopic%22%2C+%22interaction%22%2C+%22intro%22%2C+%22outro%22%2C+%22sponsor%22%2C+%22preview%22%2C+%22selfpromo%22%2C+%22filler%22%5D
Sun Feb 13 17:27:34 AWST 2022 [SponsorBlock] Fetching SponsorBlock segments
Sun Feb 13 17:27:35 AWST 2022 [SponsorBlock] Found 1 segments in the SponsorBlock database
Sun Feb 13 17:27:35 AWST 2022 [info] 64cEmjtwRgw: Downloading 1 format(s): 137+140
[debug] Invoking downloader on "https://www.youtube.com/api/timedtext?v=64cEmjtwRgw&asr_langs=de%2Cen%2Ces%2Cfr%2Cid%2Cit%2Cja%2Cko%2Cnl%2Cpt%2Cru%2Ctr%2Cvi&caps=asr&exp=xctw&xoaf=5&hl=en&ip=0.0.0.0&ipbits=0&expire=1644769653&sparams=ip%2Cipbits%2Cexpire%2Cv%2Casr_langs%2Ccaps%2Cexp%2Cxoaf&signature=6083B6B829412958FDBF526DD70FD2F278E53492.365E004BF226EFFF7121DF3B86DAB95E35D801B0&key=yt8&kind=asr&lang=en&tlang=en&fmt=vtt"
Sun Feb 13 17:27:35 AWST 2022 [info] Writing video subtitles to: CodysLab/processing/20220109.Cody's Algae Panel.en.vtt
Sun Feb 13 17:27:35 AWST 2022 [download] Destination: CodysLab/processing/20220109.Cody's Algae Panel.en.vtt
Sun Feb 13 17:27:36 AWST 2022 [download] 1.00KiB at Unknown speed (00:00)[download] 3.00KiB at Unknown speed (00:00)[download] 7.00KiB at  2.17MiB/s (00:00)   [download] 15.00KiB at  1.06MiB/s (00:00)[download] 31.00KiB at 507.39KiB/s (00:00)[download] 63.00KiB at 622.40KiB/s (00:00)[download] 127.00KiB at 843.11KiB/s (00:00)[download] 140.83KiB at 898.11KiB/s (00:00)[download] 100% of 140.83KiB in 00:00      
Sun Feb 13 17:27:36 AWST 2022 Deleting existing file CodysLab/processing/20220109.Cody's Algae Panel.webp
Sun Feb 13 17:27:36 AWST 2022 [info] Downloading video thumbnail 41 ...
WARNING: Unable to download video thumbnail 41: HTTP Error 404: Not Found
Sun Feb 13 17:27:36 AWST 2022 [info] Downloading video thumbnail 40 ...
WARNING: Unable to download video thumbnail 40: HTTP Error 404: Not Found
Sun Feb 13 17:27:37 AWST 2022 [info] Downloading video thumbnail 39 ...
WARNING: Unable to download video thumbnail 39: HTTP Error 404: Not Found
Sun Feb 13 17:27:37 AWST 2022 [info] Downloading video thumbnail 38 ...
WARNING: Unable to download video thumbnail 38: HTTP Error 404: Not Found
Sun Feb 13 17:27:37 AWST 2022 [info] Downloading video thumbnail 37 ...
Sun Feb 13 17:27:38 AWST 2022 [info] Writing video thumbnail 37 to: CodysLab/processing/20220109.Cody's Algae Panel.webp
Sun Feb 13 17:27:38 AWST 2022 [info] Writing video metadata as JSON to: CodysLab/processing/20220109.Cody's Algae Panel.info.json
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:CodysLab/processing/20220109.Cody'"'"'s Algae Panel.en.vtt' -f srt -movflags +faststart 'file:CodysLab/processing/20220109.Cody'"'"'s Algae Panel.en.srt'
Sun Feb 13 17:27:38 AWST 2022 [SubtitlesConvertor] Converting subtitles
Sun Feb 13 17:27:38 AWST 2022 Deleting original file CodysLab/processing/20220109.Cody's Algae Panel.en.vtt (pass -k to keep)
Sun Feb 13 17:27:38 AWST 2022 [download] CodysLab/processing/20220109.Cody's Algae Panel.mp4 has already been downloaded
[debug] ffprobe command line: ffprobe -hide_banner -show_format -show_streams -print_format json 'file:CodysLab/processing/20220109.Cody'"'"'s Algae Panel.mp4'
[debug] Concat spec = 14.251000-inf
[debug] Writing concat spec to CodysLab/processing/20220109.Cody's Algae Panel.temp.mp4.concat
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -hide_banner -nostdin -f concat -safe 0 -i 'file:CodysLab/processing/20220109.Cody'"'"'s Algae Panel.temp.mp4.concat' -map 0 -dn -ignore_unknown -c copy -c:s mov_text -movflags +faststart 'file:CodysLab/processing/20220109.Cody'"'"'s Algae Panel.temp.mp4'
Sun Feb 13 17:27:38 AWST 2022 [ModifyChapters] Removing chapters from CodysLab/processing/20220109.Cody's Algae Panel.mp4
[debug] Writing concat spec to CodysLab/processing/20220109.Cody's Algae Panel.en.temp.srt.concat
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -hide_banner -nostdin -f concat -safe 0 -i 'file:CodysLab/processing/20220109.Cody'"'"'s Algae Panel.en.temp.srt.concat' -map 0 -dn -ignore_unknown -c copy -movflags +faststart 'file:CodysLab/processing/20220109.Cody'"'"'s Algae Panel.en.temp.srt'
Sun Feb 13 17:27:41 AWST 2022 [ModifyChapters] Removing chapters from CodysLab/processing/20220109.Cody's Algae Panel.en.srt
[debug] file:CodysLab/processing/20220109.Cody's Algae Panel.en.temp.srt.concat: Result not representable
ERROR: Postprocessing: file:CodysLab/processing/20220109.Cody's Algae Panel.en.temp.srt.concat: Result not representable
Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 3159, in process_info
    replace_info_dict(self.post_process(dl_filename, info_dict, files_to_move))
  File "/usr/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 3336, in post_process
    info = self.run_all_pps('post_process', info, additional_pps=info.get('__postprocessors'))
  File "/usr/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 3323, in run_all_pps
    info = self.run_pp(pp, info)
  File "/usr/lib/python3.9/site-packages/yt_dlp/YoutubeDL.py", line 3296, in run_pp
    files_to_delete, infodict = pp.run(infodict)
  File "/usr/lib/python3.9/site-packages/yt_dlp/postprocessor/common.py", line 22, in run
    ret = func(self, info, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/yt_dlp/postprocessor/common.py", line 117, in wrapper
    return func(self, info)
  File "/usr/lib/python3.9/site-packages/yt_dlp/postprocessor/modify_chapters.py", line 66, in run
    in_out_files.extend(remove_chapters(in_file, True) for in_file in self._get_supported_subs(info))
  File "/usr/lib/python3.9/site-packages/yt_dlp/postprocessor/modify_chapters.py", line 66, in <genexpr>
    in_out_files.extend(remove_chapters(in_file, True) for in_file in self._get_supported_subs(info))
  File "/usr/lib/python3.9/site-packages/yt_dlp/postprocessor/modify_chapters.py", line 63, in remove_chapters
    return file, self.remove_chapters(file, cuts, concat_opts, self._force_keyframes and not is_sub)
  File "/usr/lib/python3.9/site-packages/yt_dlp/postprocessor/modify_chapters.py", line 321, in remove_chapters
    self.concat_files([in_file] * len(concat_opts), out_file, concat_opts)
  File "/usr/lib/python3.9/site-packages/yt_dlp/postprocessor/ffmpeg.py", line 388, in concat_files
    self.real_run_ffmpeg(
  File "/usr/lib/python3.9/site-packages/yt_dlp/postprocessor/ffmpeg.py", line 333, in real_run_ffmpeg
    raise FFmpegPostProcessorError(stderr.split('\n')[-1])
yt_dlp.postprocessor.ffmpeg.FFmpegPostProcessorError: file:CodysLab/processing/20220109.Cody's Algae Panel.en.temp.srt.concat: Result not representable

phyzical commented 2 years ago

@nihil-admirari (not sure if you're still contributing, but i know you were instrumental in getting the sponsorblock reworks out the door)

nihil-admirari commented 2 years ago

This is the SRT:

1
00:00:16,720 --> 00:00:19,109

all right everyone cody here welcome

...

Note that it starts at ~16 seconds.

This is the concat spec:

ffconcat version 1.0
file 'file:processing/20220109.Cody'\''s Algae Panel.en.srt'
inpoint 14.251000

It cuts everything before ~14 seconds, but there is nothing in SRT before 16 seconds, so FFmpeg throws an error. If I manually change 14 -> 17 in concat spec, FFmpeg succeeds.

Expected behaviour in this case is to simply subtract 14.251 from all timestamps in SRT file. Would you please file a bug with FFmpeg?
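
For illustration only, the expected behaviour described above (subtracting the cut length from every timestamp) could be sketched in Python roughly as follows. This is not what FFmpeg or yt-dlp does internally; `shift_srt` is a hypothetical helper, and it assumes a single cut at the very start with no cue overlapping the removed part.

```python
import re

# One SRT timestamp, e.g. 00:00:16,720
_TIMESTAMP = re.compile(r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(text: str, offset_seconds: float) -> str:
    """Subtract offset_seconds from every cue timestamp in SRT text.

    Hypothetical helper: only covers a single cut at the very start,
    with no cue overlapping the removed part.
    """
    offset_ms = round(offset_seconds * 1000)

    def shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        total = (h * 3600 + m * 60 + s) * 1000 + ms - offset_ms
        total = max(total, 0)  # never emit a negative timestamp
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    lines = []
    for line in text.splitlines(keepends=True):
        # Only timing lines contain "-->"; cue text is left untouched.
        lines.append(_TIMESTAMP.sub(shift, line) if "-->" in line else line)
    return "".join(lines)
```

With the first cue from this thread and an offset of 14.251, the timing line becomes 00:00:02,469 --> 00:00:04,858.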

phyzical commented 2 years ago

thanks for that speedy reply @nihil-admirari

happy to, will open one later today.

in the meantime i think last time i remember this occurred with the fact i use an alpine based dockerfile, but we got around it by just using a different format, (though that was related to preprocessing the video not subtitles)

is there anything you can think of that i can do as a workaround until they reply/patch. (maybe i can try using another subtitle format? though from memory the settings i have was the best chance case of getting subtitles)

or can you replicate this without the alpine distro?

all good if not :)

nihil-admirari commented 2 years ago

> last time i remember this occurred with the fact i use an alpine based dockerfile, but we got around it by just using a different format

Not really. You've encountered an FFmpeg issue, a patch for which was only recently accepted. You were unable to run a patched version, since it required glibc, which Alpine lacks.

> or can you replicate this without the alpine distro?

Yes, it can be replicated without Alpine, and even without yt-dlp.

> maybe i can try using another subtitle format?

I tested all supported formats: VTT, ASS, SRT, LRC. None work. On the other hand, using -ss 14.251 instead of concat format generates a warning:

Cody's Algae Panel [64cEmjtwRgw].en.srt: could not seek to position 14.251

but does what it's supposed to do, i.e. just subtracts 14.251 from all timestamps.

> is there anything you can think of that i can do as a workaround until they reply/patch

You can download and cut the video without subtitles, then cut the subtitles manually:

ffmpeg -ss 14.251 -i Cody\'s\ Algae\ Panel\ \[64cEmjtwRgw\].en.srt -c copy _.srt

and then manually merge subtitles with the video:

ffmpeg -i "Cody's Algae Panel [64cEmjtwRgw].mp4" -i _.srt -c copy _.mp4

pukkandan commented 2 years ago

When you open an issue on ffmpeg, please post link here for reference

phyzical commented 2 years ago

@nihil-admirari thanks for taking the time to test everything and all those suggestions/workarounds

just filling out the issue now 👍

phyzical commented 2 years ago

https://trac.ffmpeg.org/ticket/9646#ticket, let me know if i didn't include a good enough description and i will update accordingly

nihil-admirari commented 2 years ago

I'm afraid they're not gonna be satisfied. They have a list of requirements for bug reports: https://ffmpeg.org/bugreports.html.

  1. Must be reproducible on master, not on release branches. Latest builds by BtbN are OK, Alpine's FFmpeg is not.
  2. Very verbose logging is required: ffmpeg -v 9 -loglevel 99, i.e. FFmpeg must be invoked with

    ffmpeg -v 9 -loglevel 99 -f concat -safe 0 -i "20220109.Cody's Algae Panel.en.temp.srt.concat" -c copy temp.srt

    (other options are superfluous).

  3. Both SRT file and concat spec must be attached.
  4. You can copy my analysis of the problem there.
  5. Mentioning that the issue is reproducible with other subtitle types is a plus.
  6. Mentioning that -ss issues a warning but does The Right Thing is a plus:

    ffmpeg -v 9 -loglevel 99 -ss 14.251 -i "20220109.Cody's Algae Panel.en.srt" -c copy temp.srt

phyzical commented 2 years ago

👍 will do this arvo

phyzical commented 2 years ago

@nihil-admirari hmm any idea how i can get the "20220109.Cody's Algae Panel.en.temp.srt.concat" file? it seems all i have left is the original srt the updated video and the original video? (after running yt-dlp on the video in question)

updated the ticket i think given what you gave is hopefully enough to show what it should be doing.

phyzical commented 2 years ago

just had a thought.. while i agree it should be the responsibility of ffmpeg to fix the issue we could add logic to just enforce like 0.000001 seconds i.e never allow it to strip right up to the start? i assume the same towards the end of the video?

maybe as a flag (in case others are stuck on ffmpeg builds that could never get the fix)?

(oh and can confirm the workaround works thanks again!)

nihil-admirari commented 2 years ago

> any idea how i can get the "20220109.Cody's Algae Panel.en.temp.srt.concat" file?

Just save

ffconcat version 1.0
file 'file:20220109.Cody'\''s Algae Panel.en.srt'
inpoint 14.251000

in a file. In the future, https://github.com/yt-dlp/yt-dlp/pull/2793/commits/e068871e2c0c0c6efceecb3b85868fee87d61103 is going to preserve concat spec on --keep-video.
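
If typing the spec by hand is error-prone because of the apostrophe in the file name, a small Python sketch like the one below could generate it. The function names are hypothetical, and the quoting rule (wrap in single quotes, write an embedded quote as `'\''`) is simply taken from the spec shown above.

```python
from pathlib import Path


def ffconcat_quote(path: str) -> str:
    """Single-quote a path, writing each embedded ' as '\\'' (as in the spec above)."""
    return "'" + path.replace("'", "'\\''") + "'"


def build_concat_spec(srt_name: str, inpoint: float) -> str:
    """Return the text of a one-entry concat spec (hypothetical helper)."""
    return (
        "ffconcat version 1.0\n"
        f"file {ffconcat_quote('file:' + srt_name)}\n"
        f"inpoint {inpoint:.6f}\n"
    )


def write_concat_spec(spec_path: str, srt_name: str, inpoint: float) -> None:
    """Write the spec next to the SRT so it can be attached to a bug report."""
    Path(spec_path).write_text(build_concat_spec(srt_name, inpoint), encoding="utf-8")
```

For example, `write_concat_spec("20220109.Cody's Algae Panel.en.temp.srt.concat", "20220109.Cody's Algae Panel.en.srt", 14.251)` should reproduce the three lines above.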

> updated the ticket i think given what you gave is hopefully enough to show what it should be doing.

You've added the logs for -ss, which is good, but the primary offender is still there without a clean command line and with no logs at all.

ffmpeg -y -loglevel repeat+info -hide_banner -nostdin -f concat -safe 0 -i 'file:CodysLab/processing/20220109.Cody'"'"'s Algae Panel.en.temp.srt.concat' -map 0 -dn -ignore_unknown -c copy -movflags +faststart 'file:CodysLab/processing/20220109.Cody'"'"'s Algae Panel.en.temp.srt'

should be replaced with

ffmpeg -v 9 -loglevel 99 -f concat -safe 0 -i "20220109.Cody's Algae Panel.en.temp.srt.concat" -c copy temp.srt

and the logs of the above command should be added.

> then we can manually do it by running

FFmpeg team does not need a workaround.

> i can upload the files in question but they total over 1gb

You only need to attach the SRT and the concat spec. Together they won't exceed 1 MB.

> we could add logic to just enforce like 0.000001 seconds i.e never allow it to strip right up to the start?

We cannot. The problem is not about cutting starting from zero, it's about cutting ending before the first subtitle appears. In this case, the difference between the end of the cut and the first subtitle is ~2 seconds, not 0.000001 seconds. If you cut too much, subtitles will get out of sync with the video.
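
To make the arithmetic concrete, here is a rough Python sketch that measures the gap between the end of a leading cut and the first subtitle. The helper names are made up for illustration, and this is not a check that yt-dlp performs.

```python
import re
from typing import Optional

# Start time of a cue line such as "00:00:16,720 --> 00:00:19,109"
_CUE_START = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->")


def first_cue_start(srt_text: str) -> Optional[float]:
    """Return the start of the first cue in seconds, or None if there are no cues."""
    match = _CUE_START.search(srt_text)
    if match is None:
        return None
    h, m, s, ms = (int(g) for g in match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def gap_after_cut(srt_text: str, inpoint: float) -> Optional[float]:
    """Seconds between the end of a leading cut and the first cue.

    A positive value means the cut ends before any subtitle appears,
    which is the situation described in this thread.
    """
    start = first_cue_start(srt_text)
    return None if start is None else start - inpoint
```

For the SRT in this issue the gap is 16.720 - 14.251, i.e. about 2.47 seconds, so nudging the cut by a tiny epsilon would not change anything.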

phyzical commented 2 years ago

okay i think it should be up to spec now thanks for the hand holding.

but i noticed i got 20220109.Cody's Algae Panel.en.temp.srt.concat: Result too large or is this just due to increased verbosity?

> We cannot. The problem is not about cutting starting from zero, it's about cutting ending before the first subtitle appears. In this case, the difference between the end of the cut and the first subtitle is ~2 seconds, not 0.000001 seconds. If you cut too much, subtitles will get out of sync with the video.

no worries makes sense 👍

nihil-admirari commented 2 years ago

> okay i think it should be up to spec now thanks for the hand holding.

Almost done.

> Please note: once the download is complete, the file will be deleted from our servers.

This is not OK. If you look at a neighbouring issue https://trac.ffmpeg.org/ticket/9167, you'll see that FFmpeg issue tracker has the ability to attach files. Please use it.

> Result too large

I'm getting “Numerical result out of range.” Most likely it's an ERANGE errno, which has different textual representations on different systems: https://www.gnu.org/software/libc/manual/html_node/Error-Codes.html.
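
A quick way to see what the local C library calls ERANGE is a Python snippet like this one (a minimal sketch; `describe_erange` is just an illustrative name). The message it prints depends on the platform, which would be consistent with the different wordings reported in this thread.

```python
import errno
import os


def describe_erange() -> tuple:
    """Return the symbolic name and the local message text for ERANGE."""
    return errno.errorcode[errno.ERANGE], os.strerror(errno.ERANGE)


if __name__ == "__main__":
    name, message = describe_erange()
    # The message text varies between C libraries and operating systems.
    print(f"{name} ({errno.ERANGE}): {message}")
```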

phyzical commented 2 years ago

> This is not OK. If you look at a neighbouring issue https://trac.ffmpeg.org/ticket/9167, you'll see that FFmpeg issue tracker has the ability to attach files. Please use it.

ugh sorry completely overlooked the add attachment

thanks again :)

pukkandan commented 2 years ago

Closing this since there doesn't seem to be any workaround we can implement for this.

If anyone has a patch for the ffmpeg issue (and upstream doesn't accept it), please let us know so that we can apply it to https://github.com/yt-dlp/Ffmpeg-Builds

lamyergeier commented 8 months ago

@pukkandan I get Numerical result out of range, when there is a sponsorblock segment present and when I am trying to merge video, subtitle, thumbnail into one. What should I do?

Can we automatically skip merging the subtitle when this happens?