ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

[Tumblr] Some broken Instagram embeds #9213

Closed Hrxn closed 8 years ago

Hrxn commented 8 years ago

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2016.04.13. If it's not, read this FAQ entry and update. Issues with outdated versions will be rejected.



This is related to #8817

The example URL from #8817

http://vitasidorkina.tumblr.com/post/134652425014/joskriver-victoriassecret-invisibility-or

C:\Users\Hrxn>youtube-dl --ignore-config --verbose "http://vitasidorkina.tumblr.com/post/134652425014/joskriver-victoriassecret-invisibility-or"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--ignore-config', u'--verbose', u'http://vitasidorkina.tumblr.com/post/134652425014/joskriver-victoriassecret-invisibility-or']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2016.04.13
[debug] Python version 2.7.10 - Windows-8-6.2.9200
[debug] exe versions: ffmpeg N-79253-g8005b6d
[debug] Proxy map: {}
[Tumblr] 134652425014: Downloading webpage
[generic] joskriver-victoriassecret-invisibility-or#_=_: Requesting header
WARNING: Falling back on generic information extractor.
[generic] joskriver-victoriassecret-invisibility-or#_=_: Downloading webpage
[generic] joskriver-victoriassecret-invisibility-or#_=_: Extracting information
[Instagram] -7LnUPGlSo: Downloading webpage
[debug] Invoking downloader on u'http://scontent-fra3-1.cdninstagram.com/t50.2886-16/12327516_1661602684127643_1808426200_n.mp4'
[download] Destination: Video by victoriassecret--7LnUPGlSo.mp4
[download] 100% of 1.46MiB in 00:00

C:\Users\Hrxn>

This works as before.

Another URL from that site (also an Instagram embed, post type video, etc.):

http://vitasidorkina.tumblr.com/post/132533543004/vitasidorkinalove-counting-down-the-days-till

C:\Users\Hrxn>youtube-dl --ignore-config --verbose "http://vitasidorkina.tumblr.com/post/132533543004/vitasidorkinalove-counting-down-the-days-till"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'--ignore-config', u'--verbose', u'http://vitasidorkina.tumblr.com/post/132533543004/vitasidorkinalove-counting-down-the-days-till']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2016.04.13
[debug] Python version 2.7.10 - Windows-8-6.2.9200
[debug] exe versions: ffmpeg N-79253-g8005b6d
[debug] Proxy map: {}
[Tumblr] 132533543004: Downloading webpage
[generic] vitasidorkinalove-counting-down-the-days-till#_=_: Requesting header
WARNING: Falling back on generic information extractor.
[generic] vitasidorkinalove-counting-down-the-days-till#_=_: Downloading webpage
[generic] vitasidorkinalove-counting-down-the-days-till#_=_: Extracting information
ERROR: Unsupported URL: http://vitasidorkina.tumblr.com/post/132533543004/vitasidorkinalove-counting-down-the-days-till#_=_
Traceback (most recent call last):
  File "youtube_dl\extractor\generic.pyo", line 1360, in _real_extract
  File "youtube_dl\compat.pyo", line 279, in compat_etree_fromstring
  File "youtube_dl\compat.pyo", line 268, in _XML
  File "xml\etree\ElementTree.pyo", line 1642, in feed
  File "xml\etree\ElementTree.pyo", line 1506, in _raiseerror
ParseError: junk after document element: line 2, column 35
Traceback (most recent call last):
  File "youtube_dl\YoutubeDL.pyo", line 671, in extract_info
  File "youtube_dl\extractor\common.pyo", line 341, in extract
  File "youtube_dl\extractor\generic.pyo", line 2024, in _real_extract
UnsupportedError: Unsupported URL: http://vitasidorkina.tumblr.com/post/132533543004/vitasidorkinalove-counting-down-the-days-till#_=_

C:\Users\Hrxn>

So one post works and the other does not, although they should behave the same: a single Tumblr post, correct category, standard Instagram embed.

Just a guess out of the blue: the difference might be in the Tumblr post description, i.e. the part that appears directly under the posted entry, regardless of whether the post is text, image, link, video, etc.

One description is:

vitasidorkinalove: Counting down the days till the show- getting my ponytail pumped 💪🏻😜 @victoriassecretsport #TrainLikeAnAngel #VSFashionShow #VitaSidorkina

The other description is:

joskriver: (AT)victoriassecret: Invisibility or flight…which superpower would YOU choose? #VSFashionShow #ThisOrThat

I am not sure, but this gem here 💪🏻😜 might be the culprit. Apparently something is not Unicode bulletproof :smile:
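One way to probe this guess in isolation is to feed the XML parser named in the traceback two small documents: one containing the same emoji, and one with extra content after the root element. This is only a sketch under assumptions: it targets Python 3's standard `xml.etree.ElementTree` (the log above comes from Python 2.7, which may behave differently), and the helper `parse_error_message` and both sample snippets are made up for illustration, not taken from the Tumblr page.

```python
import xml.etree.ElementTree as ET


def parse_error_message(data):
    """Return None if `data` parses as XML, otherwise the ParseError text."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical snippet: well-formed XML whose text contains the emoji
# from the failing post's description, encoded as UTF-8 bytes.
with_emoji = "<p>ponytail pumped \U0001F4AA\U0001F3FB\U0001F61C</p>".encode("utf-8")

# Hypothetical snippet: a second element follows the closed root element.
two_roots = b"<p>first</p>\n<p>second</p>"

print("emoji document:   ", parse_error_message(with_emoji))
print("two-root document:", parse_error_message(two_roots))
```

If the emoji document parses and only the two-root document reports an error, that would suggest the parser message is about document structure rather than about the emoji. It would not show why the second Tumblr post fails.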

Hrxn commented 8 years ago

Holy mackerel!

You guys are fast..

Thanks, lol..

By the way, one short question about this warning:

WARNING: Falling back on generic information extractor.

Is this intentional? Does it always behave like this when dealing with embeds of supported site X in supported site Y?
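For reference, the order of extractors seen in the working log (Tumblr, then generic, then Instagram) can be modelled with a toy dispatcher. This is a hypothetical sketch, not youtube-dl's code: the function `resolve`, both regular expressions, and the sample iframe markup are assumptions, and the idea that the site-specific step hands over to a generic scanner when it finds no native video is inferred from the log output only.

```python
import re

# Hypothetical patterns for illustration; not youtube-dl's real ones.
TUMBLR_POST = re.compile(r"https?://[^/]+\.tumblr\.com/post/(\d+)")
INSTAGRAM_EMBED = re.compile(r"https?://(?:www\.)?instagram\.com/p/([\w-]+)")


def resolve(url, page_html):
    """Return (names of the steps tried in order, embedded video id or None)."""
    chain = []
    if TUMBLR_POST.match(url):
        # Site-specific step; assumed to find no native Tumblr video here.
        chain.append("Tumblr")
    # Generic step: scan the page for embeds from other known sites.
    chain.append("generic")
    match = INSTAGRAM_EMBED.search(page_html)
    if match:
        chain.append("Instagram")
        return chain, match.group(1)
    return chain, None


# Made-up page markup containing an Instagram embed.
sample_page = '<iframe src="https://www.instagram.com/p/-7LnUPGlSo/embed/"></iframe>'
post_url = "http://vitasidorkina.tumblr.com/post/134652425014/example"

print(resolve(post_url, sample_page))
print(resolve(post_url, "<p>no embed here</p>"))
```

In this model the generic step always runs between the two site-specific ones, which would match a warning appearing on every such embed. Whether the real program is structured this way is for the maintainers to confirm.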