ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

When importing and using youtube-dl in Python code, if I pass EmbedThumbnail to postprocessors before FFmpegMetadata, it fails to set the thumbnail. #30101

Closed ngmtine closed 3 years ago

ngmtine commented 3 years ago

Description

When both FFmpegMetadata and EmbedThumbnail are passed to postprocessors, embedding the thumbnail succeeds or fails depending on their order.

I stepped through the script in a debugger, and even the failing code appears to set the thumbnail at least once. (Is it then overwritten by the FFmpegMetadata postprocessor?)

If this is intended behavior and not a bug, please let me know.

Thank you.

↓Correct code (thumbnail will be set)

from youtube_dl import YoutubeDL


def download_m4a(url: str) -> bool:
    opts = {
        "quiet": False,
        "format": "bestaudio[ext=m4a]",
        "writethumbnail": True,
        # FFmpegMetadata runs first and EmbedThumbnail last,
        # so the embedded cover art is not touched afterwards.
        "postprocessors": [
            {"key": "FFmpegMetadata"},
            {"key": "EmbedThumbnail"},
        ],
        "verbose": True,
    }
    try:
        with YoutubeDL(opts) as ydl:
            ydl.download([url])
    except Exception as e:
        print(e)
        return False
    return True


if __name__ == "__main__":
    download_m4a("https://www.youtube.com/watch?v=BXB26PzV31k")

↓Failing code (thumbnail not set)

from youtube_dl import YoutubeDL


def download_m4a(url: str) -> bool:
    opts = {
        "quiet": False,
        "format": "bestaudio[ext=m4a]",
        "writethumbnail": True,
        # EmbedThumbnail runs first and FFmpegMetadata last:
        # this is the order that loses the thumbnail.
        "postprocessors": [
            {"key": "EmbedThumbnail"},
            {"key": "FFmpegMetadata"},
        ],
        "verbose": True,
    }
    try:
        with YoutubeDL(opts) as ydl:
            ydl.download([url])
    except Exception as e:
        print(e)
        return False
    return True


if __name__ == "__main__":
    download_m4a("https://www.youtube.com/watch?v=BXB26PzV31k")

Verbose log

Output from the failing code.

[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Git HEAD: ad461ee
[debug] Python version 3.8.2 (CPython) - Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.29
[debug] exe versions: ffmpeg 4.2.4, ffprobe 4.2.4
[debug] Proxy map: {}
[youtube] BXB26PzV31k: Downloading webpage
[youtube] BXB26PzV31k: Downloading thumbnail ...
[youtube] BXB26PzV31k: Writing thumbnail to: [MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.webp
[debug] Invoking downloader on 'https://r1---sn-oguelnlz.googlevideo.com/videoplayback?expire=1634222236&ei=POxnYb7nJpiClQSgwZuwBA&ip=124.18.74.222&id=o-AEU1e-dKgLuoSzggz6VGEu6kmCRzapXAL9kZIsC_PHm_&itag=140&source=youtube&requiressl=yes&mh=XK&mm=31%2C29&mn=sn-oguelnlz%2Csn-oguesnz6&ms=au%2Crdu&mv=m&mvi=1&pl=18&initcwndbps=1401250&vprv=1&mime=audio%2Fmp4&ns=Bg-DgBjYYBAL7H89q2QO9XAG&gir=yes&clen=4671357&dur=288.600&lmt=1633286161622181&mt=1634200131&fvip=1&keepalive=yes&fexp=24001373%2C24007246&c=WEB&txp=5531432&n=pRLxy8uibzB9Lx&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cvprv%2Cmime%2Cns%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRQIhAM-fwl8_xXdVvI4tmprXmi0z9sdeMnAUaHquNLw2ZWtqAiAujEyO0GFxwE-H1ienavJb3QOaLWY1Ae0Yi740SoIpAA%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRQIgD_uZ2s4qyxxSK5KX4no3FRnUB60remE0HTm1_40EyNkCIQDEUz_UWPSwLR3LDC2zBGh-gT5e_gEYg7mbwuA1_hvSOQ%3D%3D'
[download] Destination: [MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.m4a
[download] 100% of 4.45MiB in 01:09
[ffmpeg] Correcting container in "[MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.m4a"
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:[MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.m4a' -c copy -f mp4 'file:[MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.temp.m4a'
[ffmpeg] Converting thumbnail "[MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.webp" to JPEG
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:[MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.webp' -bsf:v mjpeg2jpeg 'file:[MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.jpg'
[atomicparsley] Adding thumbnail to "[MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.m4a"
[debug] AtomicParsley command line: AtomicParsley '[MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.m4a' --artwork '[MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.jpg' -o '[MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.temp.m4a'
[ffmpeg] Adding metadata to '[MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.m4a'
[debug] ffmpeg command line: ffmpeg -y -loglevel repeat+info -i 'file:[MV] end of a life - Calliope Mori (Original Song)-BXB26PzV31k.m4a' -vn -acodec copy -metadata 'title=[MV] end of a life - Calliope Mori (Original Song)' -metadata date=20210930 -metadata 'description=Streaming: https://cover.lnk.to/endofalife

dirkf commented 3 years ago

Presumably this doesn't happen with files that have both audio and video?

yt-dl thinks that ffmpeg should skip copying video streams when the extension is m4a (postprocessor/ffmpeg.py l.484 ff.), but the cover art is stored as a video stream:

    Stream #0:1: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 1280x720, 90k tbr, 90k tbn, 90k tbc (attached pic)
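One way to confirm whether a file still carries its cover art is to look for a video stream whose `attached_pic` disposition is set in ffprobe's JSON output. The following is a minimal sketch, assuming `ffprobe` is on the PATH; the helper names `cover_art_streams` and `probe_file` and the sample dictionary are illustrative and not part of youtube-dl.

```python
import json
import subprocess


def cover_art_streams(probe: dict) -> list:
    """Return the indices of video streams flagged as attached pictures."""
    return [
        stream.get("index")
        for stream in probe.get("streams", [])
        if stream.get("codec_type") == "video"
        and stream.get("disposition", {}).get("attached_pic") == 1
    ]


def probe_file(path: str) -> dict:
    """Run ffprobe on `path` and return its parsed JSON stream description."""
    out = subprocess.check_output(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path]
    )
    return json.loads(out)


# Hand-written sample shaped like ffprobe output for an m4a with cover art
# (hypothetical data, trimmed to the fields the helper reads).
sample = {
    "streams": [
        {"index": 0, "codec_type": "audio", "disposition": {"attached_pic": 0}},
        {"index": 1, "codec_type": "video", "disposition": {"attached_pic": 1}},
    ]
}

print(cover_art_streams(sample))
```

Calling `cover_art_streams(probe_file("some.m4a"))` before and after the FFmpegMetadata step would show whether the attached picture was dropped.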

I guess there may also be an issue with extracting audio where a previously embedded thumbnail is discarded.

A fix could involve extending the FFmpegPostProcessor.get_audio_codec() method so that (perhaps with an additional parameter set to a non-default value) it returns a list of audio_codec, metadata codec, ...

Meanwhile, the workaround is to make sure that the thumbnail is set after the other metadata, as in your first test program. The same may apply to the order of command-line options for the yt-dl program itself.
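A script that builds its options dynamically could guard against the bad order before handing them to YoutubeDL. This is only a sketch: `thumbnail_after_metadata` is a hypothetical helper, not a youtube-dl function, and it checks nothing beyond the relative position of the two postprocessors discussed here.

```python
def thumbnail_after_metadata(postprocessors: list) -> bool:
    """Return True unless EmbedThumbnail is listed before FFmpegMetadata."""
    keys = [pp.get("key") for pp in postprocessors]
    if "EmbedThumbnail" not in keys or "FFmpegMetadata" not in keys:
        # With only one of the two present there is no ordering conflict.
        return True
    return keys.index("EmbedThumbnail") > keys.index("FFmpegMetadata")


working = [{"key": "FFmpegMetadata"}, {"key": "EmbedThumbnail"}]
failing = [{"key": "EmbedThumbnail"}, {"key": "FFmpegMetadata"}]

print(thumbnail_after_metadata(working))
print(thumbnail_after_metadata(failing))
```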

pukkandan commented 3 years ago

The ffmpeg postprocessors have specific order requirements. I recommend always putting them in the same order as used by the CLI (see __init__.py). While it may be possible to detect thumbnails and ensure they are preserved, I don't think such an improvement is warranted, since (1) it is non-trivial to fix the interactions between all the PPs and (2) the problem is easily avoided by calling the PPs in the right order.
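Following that recommendation, a caller could sort its postprocessor list against a reference order. The sketch below rests on an assumption: `CLI_ORDER` is a hand-copied and possibly incomplete reading of the order in youtube-dl's `__init__.py`, so it should be checked against the installed version before being relied on. `in_cli_order` is a hypothetical helper.

```python
# Assumed reference order; verify against youtube_dl/__init__.py.
CLI_ORDER = [
    "MetadataFromTitle",
    "FFmpegExtractAudio",
    "FFmpegVideoConvertor",
    "FFmpegMetadata",
    "FFmpegSubtitlesConvertor",
    "FFmpegEmbedSubtitle",
    "EmbedThumbnail",
    "XAttrMetadata",
    "ExecAfterDownload",
]


def in_cli_order(postprocessors: list) -> list:
    """Return a copy of the list sorted by CLI_ORDER; unknown keys go last."""

    def rank(pp: dict) -> int:
        key = pp.get("key")
        return CLI_ORDER.index(key) if key in CLI_ORDER else len(CLI_ORDER)

    # sorted() is stable, so entries with equal rank keep their relative order.
    return sorted(postprocessors, key=rank)


requested = [{"key": "EmbedThumbnail"}, {"key": "FFmpegMetadata"}]
print([pp["key"] for pp in in_cli_order(requested)])
```

Sorting a copy leaves the caller's list untouched, and any extra options stored in each entry travel with it.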

dirkf commented 3 years ago

That seems entirely reasonable, and of course the processing in __init__.py ensures that such an issue doesn't affect the CLI.

ngmtine commented 3 years ago

Thank you for all your comments. I now understand that this behavior is by design. I will be careful about the order of the postprocessors, as in my first example.