ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

--write-description fails on Bandcamp #25056

Open emphoeller opened 4 years ago

emphoeller commented 4 years ago

Verbose log

$ youtube-dl -v --write-description 'https://virt.bandcamp.com/track/hyper-camelot-guest-director-boss-battle'
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '--write-description', 'https://virt.bandcamp.com/track/hyper-camelot-guest-director-boss-battle']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.03.24
[debug] Python version 3.6.5 (CPython) - Linux-5.5.19-pclos1-x86_64-with-mandrake-2020-PCLinuxOS
[debug] exe versions: ffmpeg 4.2.2, ffprobe 4.2.2
[debug] Proxy map: {}
[Bandcamp] hyper-camelot-guest-director-boss-battle: Downloading webpage
[debug] Default format spec: bestvideo+bestaudio/best
WARNING: There's no description to write.
[debug] Invoking downloader on 'https://t4.bcbits.com/stream/e493e1ebbbedc392966bb1bff071780e/mp3-128/804085936?p=0&ts=1588216165&t=2284db76a3a67ef719da6743c43d1a07a2c793a0&token=1588216165_cd699ad85036eea2d3e6f9921d390425419a380e'
[download] Destination: Jake Kaufman - Hyper Camelot (Guest Director Boss Battle)-804085936.mp3
[download] 100% of 2.15MiB in 00:00

Description

Attempting to write a track description from Bandcamp with --write-description produces WARNING: There's no description to write. and no description file is written, even though the track used above does have a description on its Bandcamp page. Is this perhaps not supported for Bandcamp at all yet, which would make this more of a feature request? (In that case, the current warning is misleading.)

oxguy3 commented 4 years ago

Was able to replicate this on my machine. It looks like BandcampIE (the extractor for individual tracks on Bandcamp) doesn't include any code for retrieving the description, so this is a missing feature rather than a bug.

Currently, BandcampIE pulls all its metadata out of a JSON object in the page: the trackinfo property of the TralbumData variable. The song's description lives in a different property of the same variable, current, and is split into two fields: current.about and current.lyrics. Since Bandcamp's website displays these as one continuous description, it seems easiest for youtube-dl to combine them as well.

Here's some untested code for BandcampIE's _real_extract() method that would pull those two fields out of TralbumData.current, then combine them into one variable:

current = self._parse_json(
    self._search_regex(
        r'current\s*:\s*\[\s*({.+?})\s*\]\s*,\s*?\n',
        webpage, 'current', default='{}'), title)
if current:
    about = str_or_none(current.get('about'))
    lyrics = str_or_none(current.get('lyrics'))
    description = "\n\n".join(filter(None, (about, lyrics)))

If only one of the two values is set, then description will just be set to that value. If they're both set, description will be set to a concatenation of both, with two line breaks added between them.
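The joining behaviour described above can be checked in isolation. This is a minimal sketch in plain Python; combine_description is a hypothetical helper name, not part of youtube-dl, and the sample strings are made up. Unlike the snippet above, it returns None rather than an empty string when both parts are missing, which is a choice made here so that a caller can tell "no description" apart from an empty one.

```python
def combine_description(about, lyrics, sep="\n\n"):
    """Join the non-empty parts with a blank line between them.

    Hypothetical helper for illustration only; not part of youtube-dl.
    """
    parts = [p for p in (about, lyrics) if p]
    # An empty join result ('') is turned into None here.
    return sep.join(parts) or None


print(repr(combine_description("About text", None)))
print(repr(combine_description(None, "Lyrics text")))
print(repr(combine_description("About text", "Lyrics text")))
print(repr(combine_description(None, None)))
```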

This code is based on the existing track_info code in BandcampIE, and can be dropped in immediately after that code on line 117. You'd also need to add the description to the function's return value on line 204.

You could also add the same functionality to BandcampAlbumIE with similar code. However, albums don't have lyrics, so you would only need the about field.

emphoeller commented 4 years ago

Here’s what I came up with, based on your code:

tralbumdata_current = self._parse_json(
    self._search_regex(
        r'TralbumData\s*=\s*\{.*?current\s*:\s*(\{.*?\})',
        webpage, 'track description', default='{}'), title)
description = None
if tralbumdata_current:
    description = '\r\n\r\n'.join(filter(None, (
        str_or_none(tralbumdata_current.get('about')),
        str_or_none(tralbumdata_current.get('lyrics')))))

My modifications explained:

I have also added 'description': description, to the return dictionary. However, I can’t figure out how to actually execute my modified code: whenever I call python3 ./youtube-dl/youtube_dl/ --write-description 'https://virt.bandcamp.com/track/hyper-camelot-guest-director-boss-battle', the program behaves as if nothing had changed, even when I print something, raise an exception, or introduce a syntax error. How do I run my code properly?
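One way to narrow this down might be to check which file Python resolves a package name to, since a copy installed elsewhere on the import path would not pick up edits made in a checkout. The following is a minimal sketch using only the standard library. module_origin is a hypothetical helper name, youtube_dl is the package name used in this repository, and the result depends entirely on the local environment, so this does not by itself establish the cause.

```python
import importlib.util


def module_origin(name):
    """Return the file path Python would load for `name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Prints the path of the youtube_dl package that would be imported,
# or None if no such package is on the import path.
print(module_origin("youtube_dl"))
```

If the printed path is not inside the edited checkout, the edits are not the code being run.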

Also a side question: what happens when about or lyrics contains a }? Wouldn’t the regex stop matching there, so that an incomplete JSON string with an unterminated string literal gets parsed, resulting in an error?
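The concern can be reproduced outside youtube-dl. The sketch below uses only the standard library and a made-up page snippet (the real Bandcamp markup may differ, and the data there may not be strict JSON). It shows a non-greedy \{.*?\} match stopping at a } inside a string value, and one possible alternative: json.JSONDecoder.raw_decode, which parses one complete JSON value from a given start index and ignores whatever follows it.

```python
import json
import re

# Made-up stand-in for the page source; the real markup may differ.
webpage = 'var TralbumData = {"current": {"about": "braces } inside", "lyrics": null}, "id": 1};'

# The non-greedy match stops at the first '}', which here sits inside a string.
match = re.search(r'"current"\s*:\s*(\{.*?\})', webpage)
truncated = match.group(1)
try:
    json.loads(truncated)
    regex_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    regex_ok = False

# Alternative: find where the object starts and let the JSON decoder find its end.
start = re.search(r'"current"\s*:\s*', webpage).end()
current, _end = json.JSONDecoder().raw_decode(webpage, start)

print(truncated)
print(regex_ok)
print(current["about"])
```

Letting the decoder find the end of the object avoids having to describe balanced braces and quoted strings in the regex, at the cost of requiring the embedded data to be valid JSON from the start index onward.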