Ckankonmange opened this issue 2 years ago
--write-description just prints the description from the JSON.
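In other words, the .description file contains only the `description` field of the info JSON. The following is a minimal Python sketch of that equivalence. It assumes a file already written by --write-info-json. The helper name `write_description` and the `<id>.description` file name are assumptions made for illustration; the real file name follows the output template.

```python
import json
from pathlib import Path


def write_description(info_json_path: Path) -> Path:
    """Hypothetical helper: copy the 'description' field of an info JSON
    file into a sibling '<id>.description' file."""
    info = json.loads(info_json_path.read_text(encoding="utf-8"))
    out = info_json_path.with_name(info["id"] + ".description")
    # Some extractors leave the description missing or null; write "" then.
    out.write_text(info.get("description") or "", encoding="utf-8")
    return out
```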
Let's look at the metadata of the items using this jq command:
youtube-dl -J $url | jq '{id, title, track, alt_title, creator, artist, uploader, channel, description, album}'
https://www.youtube.com/watch?v=ex_iu63m9wc
{
"id": "ex_iu63m9wc",
"title": "Pyxis",
"track": "Pyxis",
"alt_title": "Pyxis",
"creator": "Home",
"artist": "Home",
"uploader": "Home - Topic",
"channel": "arktheseries",
"description": "Provided to YouTube by DistroKid\n\nPyxis · Home\n\nBefore the Night\n\n℗ Home\n\nReleased on: 2014-12-28\n\nAuto-generated by YouTube.",
"album": "Before the Night"
}
https://www.youtube.com/watch?v=zCflVvMhoAU
{
"id": "zCflVvMhoAU",
"title": "ELUVEITIE - Aidus (OFFICIAL LYRIC VIDEO)",
"track": null,
"alt_title": null,
"creator": null,
"artist": null,
"uploader": "eluveitieofficial",
"channel": "eluveitieofficial",
"description": "Official lyric video for the new single 'Aidus', out now on all platforms: https://eluveitie.bfan.link/aidus.yde \n\nWatch the official music video for 'Aidus': https://youtu.be/7nFc-oS7dv0\n\nWith 'Aidus', ELUVEITIE follow the huge success of their last album 'Ategnatos', thereby showing a never-before-seen side of the band. Listen to the single on your favourite platform: https://eluveitie.bfan.link/aidus.yde \n\nFollow ELUVEITIE:\nInstagram: https://www.instagram.com/eluveitie_official/ \nFacebook: https://www.facebook.com/eluveitie\nTwitter: https://twitter.com/eluveitie \nWebsite: http://www.eluveitie.ch/\nAnd make sure to hit that subscribe button.\n\n\n#eluveitie #aidus #metal #lyrics",
"album": null
}
https://www.youtube.com/watch?v=gEGpPrXUgmM
{
"id": "gEGpPrXUgmM",
"title": "The Rasmus - In the Shadows [HD]",
"track": null,
"alt_title": null,
"creator": null,
"artist": null,
"uploader": "Pablo Marx",
"channel": "Pablo Marx",
"description": "Grupo: The Rasmus\nÁlbum: Dead Letters \nAño: 2004\nGenero: Rock Alternativo",
"album": null
}
What this shows is that the metadata depends on what the uploader decided to include.
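You can do the same projection in Python to list which music fields a video actually carries. This minimal sketch assumes the output of `youtube-dl -J` has already been parsed with `json.loads`. The helper names are hypothetical, and the field list mirrors the jq filter above.

```python
import json

# The fields selected by the jq filter {id, title, track, ...}.
FIELDS = ("id", "title", "track", "alt_title", "creator", "artist",
          "uploader", "channel", "description", "album")
MUSIC_FIELDS = ("track", "artist", "album")


def project(info: dict) -> dict:
    """Like jq's {id, title, ...}: keys absent from info come out as None (null)."""
    return {key: info.get(key) for key in FIELDS}


def missing_music_fields(info: dict) -> list:
    """Music fields that are absent, null or empty for this video."""
    return [key for key in MUSIC_FIELDS if not info.get(key)]


if __name__ == "__main__":
    raw = '{"id": "gEGpPrXUgmM", "track": null, "artist": null, "album": null}'
    print(missing_music_fields(json.loads(raw)))
```

Run on the third example above, every music field comes back missing. The first example, from the "Home - Topic" channel, has all three.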
The --metadata-from-title option is available:
- the <field> is missing from the regular expression example, so where it says (?P.+?) it means (?P<artist>.+?), etc.; the Markdown version as rendered on the website is OK (but other <expressions> aren't affected);
- the option is implemented in the post-processing phase, so it's not reflected in the -j/-J output, nor (currently) in the --write-info-json output, but only when a video is actually downloaded;
- if you use a regular expression like (?:(?P<artist>.+?)\s*-\s*)?(?P<title>.+) hoping to cover titles with and without embedded artist names, the effect is (currently) to overwrite an existing valid artist with NA when the title doesn't have artist information.
yt-dlp has a more extensive metadata parser. However, you might do better with a separate music library manager to massage your collection's metadata, perhaps one that matches your content against online databases.
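You can reproduce the regex caveat (an existing artist overwritten with NA) with Python's `re` module alone. This is not youtube-dl's code. It is a minimal sketch that assumes the post-processor copies every named group from `match.groupdict()` into the metadata, which is what the described overwrite implies. The function name is hypothetical.

```python
import re

# The optional-artist pattern discussed above, with its group names intact.
TITLE_RE = r"(?:(?P<artist>.+?)\s*-\s*)?(?P<title>.+)"


def apply_all_groups(info: dict, regex: str) -> dict:
    """Hypothetical stand-in for the described behavior: every named group is
    copied into the metadata, including groups that did not take part in the
    match (re reports those as None, which is printed as NA)."""
    match = re.match(regex, info["title"])
    if match is None:
        return info  # no match at all: nothing is changed
    return {**info, **match.groupdict()}


if __name__ == "__main__":
    # Valid artist, title without an artist prefix: artist becomes None.
    print(apply_all_groups({"title": "Pyxis", "artist": "Home"}, TITLE_RE))
```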
@pukkandan, what about the "currently" items above wrt yt-dlp compat?
It's easy to skip the metadata replacement of an optional named group that's missing from the match without affecting any simpler cases (including all %(field)s cases), and arguably that would have been right in the first place.
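Here is a minimal sketch of that change, written for a plain `re`-based parser; the function name is hypothetical. Groups that `re` reports as None are skipped. A pattern in which every group is mandatory never produces None on a successful match, so it behaves exactly as before.

```python
import re

TITLE_RE = r"(?:(?P<artist>.+?)\s*-\s*)?(?P<title>.+)"


def apply_matched_groups(info: dict, regex: str) -> dict:
    """Copy only the named groups that took part in the match; a group that
    re reports as None (did not participate) leaves the existing value alone."""
    match = re.match(regex, info["title"])
    if match is None:
        return info
    found = {k: v for k, v in match.groupdict().items() if v is not None}
    return {**info, **found}
```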
It's also simple to run the MetadataFromTitlePP for --write-info-json as long as we don't care about running it twice if a download is being run, but obviously the JSON output would be different. If we do care, it's trickier and might mean moving the function from the PP chain (where it's always first in yt-dl) into the core.
OP's issue (example 3) is caused by our music metadata extraction currently being (partially) broken due to YouTube UI changes. See https://github.com/yt-dlp/yt-dlp/issues/4217
PS: Nothing for us to do in example 2.
As for your questions:
- the option is implemented in the post-processing phase, so it's not reflected in the -j/-J output, nor (currently) in the --write-info-json output, but only when a video is actually downloaded;
It's also simple to run the MetadataFromTitlePP for --write-info-json as long as we don't care about running it twice if a download is being run, but obviously the JSON output would be different. If we do care, it's trickier and might mean moving the function from the PP chain (where it's always first in yt-dl) into the core.
This is an issue with many other options as well. See https://github.com/ytdl-org/youtube-dl/issues/9073 and related issues. yt-dlp solves this by splitting the post-processing into multiple stages (https://github.com/yt-dlp/yt-dlp/commit/56d868dbb7c72e4fbe9d28d4837cc59261d8fe55).
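The idea of stages can be sketched as a small pipeline. Each post-processor is registered under a stage and runs only when processing reaches that stage. This is a hypothetical illustration, not yt-dlp's implementation. The class, the `process` function and the stage names `pre_process`/`post_process` are assumptions. The snapshot stands in for what -J or --write-info-json would emit before any download.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

PostProcessor = Callable[[dict], dict]


class StagedPipeline:
    """Hypothetical: post-processors are grouped by stage name and only run
    when processing reaches that stage."""

    def __init__(self) -> None:
        self._by_stage: Dict[str, List[PostProcessor]] = defaultdict(list)

    def add(self, stage: str, pp: PostProcessor) -> None:
        self._by_stage[stage].append(pp)

    def run(self, stage: str, info: dict) -> dict:
        for pp in self._by_stage[stage]:
            info = pp(info)
        return info


def process(info: dict, pipeline: StagedPipeline, download: bool) -> Tuple[dict, dict]:
    """Return (metadata as it would be dumped before download, final metadata)."""
    info = pipeline.run("pre_process", info)  # early stage: visible in the dump
    dumped = dict(info)                       # snapshot taken before any download
    if download:
        info = pipeline.run("post_process", info)  # only after a real download
    return dumped, info
```

A title parser registered in the early stage would show up in the JSON dump even without a download. One registered after download would not.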
if you use a regular expression like (?:(?P<artist>.+?)\s*-\s*)?(?P<title>.+) hoping to cover titles with and without embedded artist names, the effect is (currently) to overwrite an existing valid artist with NA when the title doesn't have artist information.
In this case, why not just %(artist)s - %(title)s? If the regex doesn't match, both fields are left untouched. But I see how this is an issue for more complex situations. yt-dlp currently behaves the same way.
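A plain %(field)s template is safe because it becomes a regex in which every group is mandatory. The following is a minimal sketch of such a conversion. It assumes each field turns into a greedy `.+` group and the text between fields is matched literally. `template_to_regex` and `parse_title` are hypothetical names, not youtube-dl's API.

```python
import re


def template_to_regex(template: str) -> str:
    """Turn e.g. '%(artist)s - %(title)s' into a regex with named groups."""
    regex, last = "", 0
    for m in re.finditer(r"%\((\w+)\)s", template):
        regex += re.escape(template[last:m.start()])  # literal text between fields
        regex += "(?P<%s>.+)" % m.group(1)            # one mandatory group per field
        last = m.end()
    return regex + re.escape(template[last:])


def parse_title(info: dict, template: str) -> dict:
    match = re.match(template_to_regex(template), info["title"])
    if match is None:
        return info  # no match: artist and title stay as they were
    return {**info, **match.groupdict()}  # all groups matched, none is None
```

Because the groups are greedy, a title with two " - " separators splits at the last one: "A - B - C" gives artist "A - B" and title "C".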
It's easy to skip the metadata replacement of an optional named group that's missing from the match without affecting any simpler cases (including all %(field)s cases), and arguably that would have been right in the first place.
I agree this should be the "correct" behavior and wouldn't mind changing this in yt-dlp as well. Backward compat is not a concern in this case imo. Note that "empty match" and "no match" are different, and empty matches should be honored.
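Python's `re` already keeps the two cases apart. A group that did not participate has span (-1, -1) and value None, while an empty match has an empty string. The following minimal sketch uses a hypothetical variant of the pattern whose artist group can match nothing. `classify_groups` is an illustrative name.

```python
import re

# Hypothetical variant: the artist group may match an empty string ("- Title").
TITLE_RE = re.compile(r"(?:(?P<artist>[^-]*?)\s*-\s*)?(?P<title>.+)")


def classify_groups(title: str) -> dict:
    """Label each named group as 'no match', 'empty match' or its value."""
    match = TITLE_RE.match(title)
    if match is None:
        return {}
    result = {}
    for name in TITLE_RE.groupindex:
        if match.span(name) == (-1, -1):
            result[name] = "no match"     # did not participate: keep old value
        elif match.group(name) == "":
            result[name] = "empty match"  # participated with "": honor it
        else:
            result[name] = match.group(name)
    return result
```

An update rule that skips "no match" but writes "empty match" then follows the behavior proposed above.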
Description
I'm trying to get Track, Artist and Album data from YouTube using --write-info-json. So far I've seen 3 outputs:
--write-description also returns nothing. It seems to be auto-generated by YouTube.
gEGpPrXUgmM.description.txt gEGpPrXUgmM.info.json.txt