ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[generic extractor] Support Open Graph meta tags #18448

Closed koppa closed 5 years ago

koppa commented 5 years ago

Version: master

I just stumbled on some pages that are annotated with the Open Graph protocol (http://ogp.me/).

Example URL pointing to a podcast: https://logbuch-netzpolitik.de/lnp278-nach-art-des-scheiches

and it contains the following meta tags in the head:

<html lang="de-DE" prefix="og: http://ogp.me/ns#">
<meta property="og:type" content="website"/>
<meta property="og:site_name" content="Logbuch:Netzpolitik"/>
<meta property="og:title" content="LNP278 Nach Art des Scheiches"/>
<meta property="og:url" content="https://logbuch-netzpolitik.de/lnp278-nach-art-des-scheiches"/>
<meta property="og:description" content="Brexit &#x2014; CDU &#x2014; 5G-Vergabe &#x2014; Digitalpakt &#x2014; Robert Tibbo &#x2014; NSO und Kashoggi &#x2014; Facebook-Krise &#x2014; China &#x2014; Artikel 13 &#x2014; Vosshoff &#x2014; Marriot-Hack &#x2014; StreamOn &#x2014; Termine&#10;Nach einer terminbedingten kurzen Pause nehmen wir wieder das Heft auf und tingeln durch die sich aufstauenden netzpolitischen Themen der letzten zwei Wochen, finden endlich eine sinnvolle Anwendung f&#xFC;r die Blockchain und erkl&#xE4;ren, was der Brexit mit Per Anhalter durch die Galaxis zu tun hat."/>
<meta property="og:image" content="https://meta.metaebene.me/media/lnp/lnp-logo-600x600.jpg"/>
<meta property="og:audio" content="https://logbuch-netzpolitik.de/podlove/file/6455/s/opengraph/c/episode/lnp278-nach-art-des-scheiches.m4a"/>
<meta property="og:audio:type" content="audio/mp4"/>
<meta property="og:audio" content="https://logbuch-netzpolitik.de/podlove/file/6453/s/opengraph/c/episode/lnp278-nach-art-des-scheiches.mp3"/>
<meta property="og:audio:type" content="audio/mpeg"/>
<meta property="og:audio" content="https://logbuch-netzpolitik.de/podlove/file/6454/s/opengraph/c/episode/lnp278-nach-art-des-scheiches.opus"/>
<meta property="og:audio:type" content="audio/opus"/>
<meta property="og:audio" content="https://logbuch-netzpolitik.de/podlove/file/6456/s/opengraph/c/episode/lnp278-nach-art-des-scheiches.oga"/>
<meta property="og:audio:type" content="audio/ogg"/>

The extraction of the formats and the media files should be trivial. Are you interested in a pull request implementing this feature?
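
A minimal sketch of what that extraction might look like, assuming each `og:audio:type` tag describes the `og:audio` tag immediately before it (as in the example above). It uses only Python's standard library rather than youtube-dl's own helpers, and the names `OpenGraphAudioParser` and `extract_og_audio` are hypothetical, not part of youtube-dl:

```python
from html.parser import HTMLParser


class OpenGraphAudioParser(HTMLParser):
    """Collect og:audio URLs and pair each with the og:audio:type that follows it."""

    def __init__(self):
        super().__init__()
        self.formats = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != 'meta':
            return
        attrs = dict(attrs)
        prop = attrs.get('property')
        content = attrs.get('content')
        if not content:
            return
        if prop == 'og:audio':
            # Start a new format entry; the MIME type may follow in the next tag.
            self.formats.append({'url': content, 'mime_type': None})
        elif prop == 'og:audio:type' and self.formats:
            # Structured property: applies to the most recent og:audio entry.
            self.formats[-1]['mime_type'] = content


def extract_og_audio(html):
    """Return a list of {'url', 'mime_type'} dicts found in the given HTML."""
    parser = OpenGraphAudioParser()
    parser.feed(html)
    return parser.formats
```

Feeding the page head shown above through `extract_og_audio` should give one entry per `og:audio` tag, each with its MIME type, which could then be mapped onto format entries.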

A short search in the source code shows that these meta tags are already used in various places, but the current support is incomplete and incorrect (at least for general cases).

koppa commented 5 years ago

I found that support for Open Graph is already there; only og:audio is missing. I am closing this issue for now and will open a pull request in the future.