ytdl-org / youtube-dl

Command-line program to download videos from YouTube.com and other video sites
http://ytdl-org.github.io/youtube-dl/
The Unlicense

Unable to download from Instagram #31322

Open zensubz opened 1 year ago

zensubz commented 1 year ago

Checklist

Verbose log

$ youtube-dl https://www.instagram.com/p/CSZ64OTlsVK/
[Instagram] CSZ64OTlsVK: Downloading webpage
WARNING: unable to extract uploader id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/bin/youtube-dl", line 11, in <module>
    load_entry_point('youtube-dl==2021.12.17', 'console_scripts', 'youtube-dl')()
  File "/usr/lib/python3.6/site-packages/youtube_dl/__init__.py", line 474, in main
    _real_main(argv)
  File "/usr/lib/python3.6/site-packages/youtube_dl/__init__.py", line 464, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/lib/python3.6/site-packages/youtube_dl/YoutubeDL.py", line 2080, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/usr/lib/python3.6/site-packages/youtube_dl/YoutubeDL.py", line 808, in extract_info
    return self.__extract_info(url, ie, download, extra_info, process)
  File "/usr/lib/python3.6/site-packages/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python3.6/site-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3.6/site-packages/youtube_dl/extractor/instagram.py", line 266, in _real_extract
    'title': title or 'Video by %s' % uploader_id,
UnboundLocalError: local variable 'title' referenced before assignment
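The last line of the traceback is Python's `UnboundLocalError`. In `_real_extract`, `title` is evidently assigned only on some code paths, so when the page lacks the expected data, the name is still unbound when line 266 builds the `'Video by %s'` fallback. The sketch below reproduces that pattern in isolation. `build_info_buggy`, `build_info_fixed` and the `caption`/`owner` keys are hypothetical; this is not the extractor's actual code.

```python
def build_info_buggy(data):
    # 'title' is bound only when the (hypothetical) 'caption' key exists
    if 'caption' in data:
        title = data['caption']
    uploader_id = data.get('owner')  # may be None, cf. the uploader-id WARNING
    # If the branch above did not run, 'title' is an unbound local here
    return {'title': title or 'Video by %s' % uploader_id}


def build_info_fixed(data):
    # Bind 'title' on every code path so the fallback expression is always safe
    title = None
    if 'caption' in data:
        title = data['caption']
    uploader_id = data.get('owner')
    return {'title': title or 'Video by %s' % uploader_id}


try:
    build_info_buggy({'owner': 'someone'})
except UnboundLocalError as exc:
    print('buggy:', type(exc).__name__)
print('fixed:', build_info_fixed({'owner': 'someone'})['title'])
```

Initializing the variable before the branch only avoids the crash; the missing metadata still has to be extracted some other way, which is what the back-port discussed below addresses.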

Description

I am using the git commit 502cefa of youtube-dl.

Please fix this issue, thanks in advance.

dirkf commented 1 year ago

I can reproduce this.

The yt-dlp extractor was recently modified, and that gives a better result. A quick back-port to yt-dl, without checking the tests, works the same way:

$ python -m youtube_dl -v -F 'https://www.instagram.com/p/CSZ64OTlsVK/'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://www.instagram.com/p/CSZ64OTlsVK/']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: fdb16c0d6
[debug] Python version 2.7.17 (CPython) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[Instagram] CSZ64OTlsVK: Setting up session
WARNING: [Instagram] CSZ64OTlsVK: No csrf token set by Instagram API
[Instagram] CSZ64OTlsVK: Downloading JSON metadata
[info] Available formats for CSZ64OTlsVK:
format code  extension  resolution note
0            mp4        640x640    
$

This needs some utility functions to be back-ported too, which is work in progress. The extractor matches the Insta web experience in that if you aren't logged in you're liable to be blocked, at least after the first retrieval in a day.

dirkf commented 1 year ago

This may also be of interest:

$ python -m youtube_dl -g 'https://www.instagram.com/p/CSZ64OTlsVK/'
WARNING: [Instagram] CSZ64OTlsVK: No csrf token set by Instagram API
https://instagram.flhr10-2.fna.fbcdn.net/v/t50.2886-16/236535957_1039163236909735_4308037074433982370_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InZ0c192b2RfdXJsZ2VuLjY0MC5mZWVkLmRlZmF1bHQiLCJxZV9ncm91cHMiOiJbXCJpZ193ZWJfZGVsaXZlcnlfdnRzX290ZlwiXSJ9&_nc_ht=instagram.flhr10-2.fna.fbcdn.net&_nc_cat=107&_nc_ohc=xlSRk7t68AwAX9TWVc3&edm=AP_V10EBAAAA&vs=17921425339801214_3645058679&_nc_vs=HBksFQAYJEdKVkFHUTZuZ3BrTEhiRURBS0stMmV6a09Nazdia1lMQUFBRhUAAsgBABUAGCRHQklEQ1E3eE9BeEpHblFEQUpLdjlua1ByT29zYmtZTEFBQUYVAgLIAQAoABgAGwAVAAAm8JqT79%2BAxEAVAigCQzMsF0BBVT987ZFoGBJkYXNoX2Jhc2VsaW5lXzFfdjERAHXqBwA%3D&_nc_rid=4ab33ccb04&ccb=7-5&oh=00_AfD_cDiYSo_IFbg00yEZfghEahqcrvKwNtU-XNeqBHaevA&oe=636380A6&_nc_sid=4f375e
$

zensubz commented 1 year ago

Thanks @dirkf for the great work on youtube-dl. The R&D part is beyond my reach, so I'll wait for the fix from your side.

llevrel commented 1 year ago

Hello,

This report seems to be the most recent about Instagram. Hoping the update will land soon. Will it be able to use a cookies file?

@zensubz in case it helps, I've successfully used the workaround here https://github.com/ytdl-org/youtube-dl/issues/25354#issuecomment-634919136 (example):

dirkf commented 1 year ago

If you have a cookies file from a logged-in browser session with Insta, I expect that specifying it with --cookies ... will cause yt-dl to be treated by Insta as if logged in. Some sites are fussier, though, requiring (e.g.) the same UA header to be sent. We'll find out.

llevrel commented 1 year ago

Thanks for your answer and time. I had tried it with the "cookies.txt" Firefox add-on, but it didn't work. (I had exported "this site"; might cross-site cookies be needed?)

dirkf commented 1 year ago

Do you mean with yt-dlp? No public version of the yt-dl Instagram extractor is up to date, and possibly none currently works.

It's possible that Insta uses Facebook domains like fbcdn.net. I would set up a new profile, or maybe a private window, to log in to Insta, and export all the cookies from that session.
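Before passing an exported file to `--cookies`, it may help to check which domains it actually covers. youtube-dl reads that option as a Netscape/Mozilla `cookies.txt` file, which Python's standard `http.cookiejar.MozillaCookieJar` can also parse. The helpers below are a hypothetical sketch; the default `wanted` domains reflect the guess above that Insta may also rely on `fbcdn.net`, not a confirmed requirement.

```python
from http.cookiejar import MozillaCookieJar


def cookie_domains(path):
    """Return the set of domains (leading dot stripped) in a Netscape cookies.txt."""
    jar = MozillaCookieJar()
    # Keep session and expired cookies too: we only want to see what was exported
    jar.load(path, ignore_discard=True, ignore_expires=True)
    return {cookie.domain.lstrip('.') for cookie in jar}


def missing_domains(path, wanted=('instagram.com', 'fbcdn.net')):
    """Return the wanted domains (assumed list) with no cookie, including subdomains."""
    have = cookie_domains(path)
    return [d for d in wanted
            if not any(h == d or h.endswith('.' + d) for h in have)]
```

For example, `missing_domains('cookies.txt')` returning `['fbcdn.net']` would suggest that only the "this site" cookies for instagram.com were exported.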

llevrel commented 1 year ago

No, I don't even know what the -dlp version is or where to get it XD Thank you for the clarification. I just wanted to mention my trial and wondered whether I did it wrong :-)

ErraticFox commented 1 year ago

You can download from Instagram using yt-dlp -vU https://www.instagram.com/p/CTomrHIFc36/embed/

Things to note:

A) If it's a reels link, you must replace reels with p (e.g. https://www.instagram.com/reels/CTomrHIFc36 > https://www.instagram.com/p/CTomrHIFc36)

B) Adding embed/ to the end of the URL seems to bypass the login screen.
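The two URL tweaks in A) and B) can be sketched as a small helper. `to_embed_url` is hypothetical and not part of yt-dl or yt-dlp; it assumes post IDs consist only of letters, digits, `_` and `-`, and it accepts both `/reel/` and `/reels/` paths as well as URLs that already end in `/embed/`.

```python
import re

# Assumed URL shape: instagram.com/{p,reel,reels}/<id>, optional www., any suffix
_POST_RE = re.compile(
    r'^https?://(?:www\.)?instagram\.com/(?:p|reels?)/([A-Za-z0-9_-]+)')


def to_embed_url(url):
    """Rewrite a post or reel URL to the /p/<id>/embed/ form described above."""
    m = _POST_RE.match(url)
    if not m:
        raise ValueError('not an Instagram post/reel URL: %s' % url)
    # A) reels -> p; B) append embed/ (query strings and other suffixes are dropped)
    return 'https://www.instagram.com/p/%s/embed/' % m.group(1)
```

With this, `to_embed_url('https://www.instagram.com/reels/CTomrHIFc36')` gives the same embed URL as the example command.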

I downloaded the example video 10+ times back-to-back and never hit a "shadow ban".

@dirkf could you confirm this on your end as well? Thanks!

P.S. I found this information in this pull request.

dirkf commented 1 year ago

I expect that the yt-dlp PR should be back-ported.

ErraticFox commented 1 year ago

> I expect that the yt-dlp PR should be back-ported.

I am so sorry about that post. I was bouncing back and forth between the two repos, and when I found a solution I forgot that this issue was for yt-dl, not yt-dlp. It just dawned on me that I might have done that, and I came back to find that's exactly what I did.

I'll still have to use yt-dl, since yt-dlp's Facebook extractor is broken when using cookies.

ghost commented 1 year ago

> The extractor matches the Insta web experience in that if you aren't logged in you're liable to be blocked, at least after the first retrieval in a day.

FYI, this might not be the case anymore: I just hit https://www.instagram.com/p/SOMETHING 16 times with a one-second delay between requests and didn't get any blocks. Previously I would usually get blocked after about 4 anonymous requests. So maybe they have finally relaxed the previous ridiculous rate limits. I might restore my old code if that's the case.
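The informal test above (16 anonymous requests, one second apart) could be scripted roughly as follows. `probe_rate_limit` is a hypothetical helper: the caller supplies `fetch` (how a request is made) and `is_blocked` (what counts as a block, e.g. a redirect to a login page), since neither is specified in the thread. `sleep` is injectable so the loop can be exercised without real delays.

```python
import time


def probe_rate_limit(fetch, is_blocked, attempts=16, delay=1.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, `delay` seconds apart.

    Returns the 1-based index of the first response that is_blocked()
    flags, or None if no request in the run was blocked.
    """
    for i in range(1, attempts + 1):
        response = fetch()
        if is_blocked(response):
            return i
        if i < attempts:  # no need to wait after the final request
            sleep(delay)
    return None
```

A result of `None` after 16 attempts would correspond to the "no blocks" outcome reported above; a small number such as 4 would match the earlier behaviour.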