ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[Youtube] Add Music Autogenerated data in --write-info-json #31118

Open Ckankonmange opened 2 years ago

Ckankonmange commented 2 years ago

Description

I'm trying to get Track, Artist and Album data from YouTube using --write-info-json. So far I've seen 3 outputs:

--write-description also returns nothing. The data seems to be auto-generated by YouTube.

gEGpPrXUgmM.description.txt gEGpPrXUgmM.info.json.txt

dirkf commented 2 years ago

--write-description just prints the description from the JSON.

Let's look at the metadata of the items using this jq command:

youtube-dl -J "$url" | jq '{id, title, track, alt_title, creator, artist, uploader, channel, description, album}'

https://www.youtube.com/watch?v=ex_iu63m9wc

{
  "id": "ex_iu63m9wc",
  "title": "Pyxis",
  "track": "Pyxis",
  "alt_title": "Pyxis",
  "creator": "Home",
  "artist": "Home",
  "uploader": "Home - Topic",
  "channel": "arktheseries",
  "description": "Provided to YouTube by DistroKid\n\nPyxis · Home\n\nBefore the Night\n\n℗ Home\n\nReleased on: 2014-12-28\n\nAuto-generated by YouTube.",
  "album": "Before the Night"
}

https://www.youtube.com/watch?v=zCflVvMhoAU

{
  "id": "zCflVvMhoAU",
  "title": "ELUVEITIE - Aidus (OFFICIAL LYRIC VIDEO)",
  "track": null,
  "alt_title": null,
  "creator": null,
  "artist": null,
  "uploader": "eluveitieofficial",
  "channel": "eluveitieofficial",
  "description": "Official lyric video for the new single 'Aidus', out now on all platforms: https://eluveitie.bfan.link/aidus.yde \n\nWatch the official music video for 'Aidus': https://youtu.be/7nFc-oS7dv0\n\nWith 'Aidus', ELUVEITIE follow the huge success of their last album 'Ategnatos', thereby showing a never-before-seen side of the band. Listen to the single on your favourite platform: https://eluveitie.bfan.link/aidus.yde \n\nFollow ELUVEITIE:\nInstagram: https://www.instagram.com/eluveitie_official/ \nFacebook: https://www.facebook.com/eluveitie\nTwitter: https://twitter.com/eluveitie \nWebsite: http://www.eluveitie.ch/\nAnd make sure to hit that subscribe button.\n\n\n#eluveitie #aidus #metal #lyrics",
  "album": null
}

https://www.youtube.com/watch?v=gEGpPrXUgmM

{
  "id": "gEGpPrXUgmM",
  "title": "The Rasmus - In the Shadows [HD]",
  "track": null,
  "alt_title": null,
  "creator": null,
  "artist": null,
  "uploader": "Pablo Marx",
  "channel": "Pablo Marx",
  "description": "Grupo: The Rasmus\nÁlbum: Dead Letters \nAño: 2004\nGenero: Rock Alternativo",
  "album": null
}

What this shows is that the metadata depends on what the uploader decided to include.
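For the auto-generated case (the first example), the description follows a fixed layout: a "Provided to YouTube by" line, then track and artist separated by a middle dot, then the album. As a rough sketch only, assuming that layout holds, the fields could be recovered from the description text like this. The function name and the regex are made up for illustration and are not youtube-dl's extractor code:

```python
import re

# Hypothetical pattern for the layout seen in the first example's description.
# \u00b7 is the middle dot that separates track and artist.
AUTO_GENERATED_RE = re.compile(
    r"Provided to YouTube by [^\n]+\n\n"
    r"(?P<track>[^\n]+?) \u00b7 (?P<artist>[^\n]+)\n\n"
    r"(?P<album>[^\n]+)"
)


def parse_auto_generated(description):
    """Return a dict with track, artist and album, or None if the
    description does not look like an auto-generated one."""
    if "Auto-generated by YouTube." not in description:
        return None
    m = AUTO_GENERATED_RE.match(description)
    return m.groupdict() if m else None
```

Uploader-written descriptions such as the second and third examples do not carry the "Auto-generated by YouTube." marker, so this sketch returns None for them.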

The --metadata-from-title option is available:

yt-dlp has a more extensive metadata parser. However, you might do better with a separate music library manager to massage your collection's metadata, perhaps one that matches your content against online databases.

dirkf commented 2 years ago

@pukkandan, what about the "currently" items above wrt yt-dlp compat?

It's easy to skip the metadata replacement of an optional named group that's missing from the match without affecting any simpler cases (including all %(field)s cases) and arguably that would have been right in the first place.
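A minimal sketch of that skipping behaviour, using plain Python `re` rather than the actual MetadataFromTitlePP code. The pattern (an optional artist group followed by a title group) and the helper name are assumptions for the example:

```python
import re

# Assumed title pattern: optional "artist - " prefix, then the title.
TITLE_RE = re.compile(r"(?:(?P<artist>.+?)\s-\s)?(?P<title>.+)")


def apply_title_metadata(info, pattern=TITLE_RE):
    """Return a copy of info updated from its title.

    Named groups that did not take part in the match are skipped,
    so existing values for those fields are kept.
    """
    updated = dict(info)
    m = pattern.match(info.get("title") or "")
    if m is None:
        return updated  # no match at all: leave every field untouched
    for field, value in m.groupdict().items():
        if value is None:
            continue  # optional group missing from the match: keep old value
        updated[field] = value  # an empty-string match is still applied
    return updated
```

With this, a title like "Pyxis" keeps an existing artist of "Home", while "The Rasmus - In the Shadows [HD]" fills in the artist from the title.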

It's also simple to run the MetadataFromTitlePP for --write-info-json as long as we don't care about running it twice if a download is being run, but obviously the JSON output would be different. If we do care, it's trickier and might mean moving the function from the PP chain (where it's always first in yt-dl) into the core.

pukkandan commented 2 years ago

OP's issue (example 3) occurs because our music metadata extraction is currently (partially) broken due to UI changes from YT. See https://github.com/yt-dlp/yt-dlp/issues/4217

PS: Nothing for us to do in example 2


As for your questions:

  • the option is implemented in the post-processing phase, so it's not reflected in the -j/-J output, nor (currently) in the --write-info-json output, but only when a video is actually downloaded;

It's also simple to run the MetadataFromTitlePP for --write-info-json as long as we don't care about running it twice if a download is being run, but obviously the JSON output would be different. If we do care, it's trickier and might mean moving the function from the PP chain (where it's always first in yt-dl) into the core.

This is an issue with many other options as well. See https://github.com/ytdl-org/youtube-dl/issues/9073 and related issues. The way yt-dlp solves this is by splitting the postprocessing into multiple stages (https://github.com/yt-dlp/yt-dlp/commit/56d868dbb7c72e4fbe9d28d4837cc59261d8fe55)
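To illustrate the staged idea in general terms, here is a toy model. It is not yt-dlp's code: the class, the `process` function and the stage names "pre_process" and "post_process" are assumptions made for the example. The point is only that a metadata step registered in an early stage runs before the info JSON is written, whether or not a download follows:

```python
from collections import defaultdict


class StagedPipeline:
    """Toy model: post-processors registered under a named stage."""

    def __init__(self):
        self._stages = defaultdict(list)

    def add(self, stage, func):
        self._stages[stage].append(func)

    def run(self, stage, info):
        # Each processor takes an info dict and returns the next one.
        for func in self._stages[stage]:
            info = func(info)
        return info


def process(info, pipeline, download=True):
    """Run the early stage first, snapshot the info, then optionally
    run the stage that depends on a downloaded file."""
    info = pipeline.run("pre_process", dict(info))
    written_json = dict(info)  # stands in for the --write-info-json output
    if download:
        info = pipeline.run("post_process", info)
    return written_json, info
```

In this model each processor runs exactly once, so the question of running the title parser twice does not arise.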

  • if you use a regular expression like (?:(?P<artist>.+?)\s-\s)?(?P<title>.+) hoping to cover titles with and without embedded artist names, the effect is (currently) to overwrite an existing valid artist with NA when the title doesn't have artist information.

In this case, why not just %(artist)s - %(title)s? If the regex doesn't match, both fields are left untouched. But I see how this is an issue for more complex situations. yt-dlp behaves the same way currently.

It's easy to skip the metadata replacement of an optional named group that's missing from the match without affecting any simpler cases (including all %(field)s cases) and arguably that would have been right in the first place.

I agree this should be the "correct" behavior and wouldn't mind changing this in yt-dlp as well. Backward compat is not a concern in this case imo. Note that "empty match" and "no match" are different and empty matches should be honored.