mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Instagram extractor error #391

Closed debagos closed 4 years ago

debagos commented 5 years ago

It looks like Facebook has changed the Instagram profile page. I get a KeyError for 'graphql' every time...

[gallery-dl][debug] Version 1.10.1
[gallery-dl][debug] Python 3.6.8 - Linux-4.18.0-25-generic-x86_64-with-Ubuntu-18.10-cosmic
[gallery-dl][debug] requests 2.22.0 - urllib3 1.22
[1/3] https://www.instagram.com/REDACTED/
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/REDACTED/'
[gallery-dl][debug] Updating urllib3 ciphers
[instagram][debug] Using InstagramUserExtractor for 'https://www.instagram.com/REDACTED/'
[instagram][info] Logging in as REDACTED
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /accounts/login/ HTTP/1.1" 200 9511
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /web/__mid/ HTTP/1.1" 200 28
[urllib3.connectionpool][debug] https://www.instagram.com:443 "POST /accounts/login/ajax/ HTTP/1.1" 200 296
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /REDACTED/ HTTP/1.1" 200 None
[instagram][error] An unexpected error occurred: KeyError - 'graphql'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[instagram][debug] 
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/gallery_dl/job.py", line 47, in run
    for msg in self.extractor:
  File "/usr/local/lib/python3.6/dist-packages/gallery_dl/extractor/instagram.py", line 36, in items
    for data in self.instagrams():
  File "/usr/local/lib/python3.6/dist-packages/gallery_dl/extractor/instagram.py", line 205, in _extract_profilepage
    yield from self._extract_page(url, 'ProfilePage')
  File "/usr/local/lib/python3.6/dist-packages/gallery_dl/extractor/instagram.py", line 169, in _extract_page
    base_shared_data = shared_data['entry_data'][page_type][0]['graphql']
KeyError: 'graphql'
[2/3] [...]

Thank you for fixing, wish you a great day, yours sincerely.

iamleot commented 5 years ago

Hello debagos,

JFTR, at least public profiles seem to work (if a public profile is also problematic, please share a non-redacted URL to reproduce this issue).

If no one beats me to it, I'll try to investigate further later this evening (UTC), if I can find a private profile.

Thanks!

iamleot commented 5 years ago

I couldn't reproduce it with a private profile either (I have tried both gallery-dl 1.10.1 and the latest Git HEAD, on NetBSD/evbarm and Python 3.7, but that's probably not important). Can you please share more information?

Looking at the verbose output again:

[gallery-dl][debug] Version 1.10.1
[gallery-dl][debug] Python 3.6.8 - Linux-4.18.0-25-generic-x86_64-with-Ubuntu-18.10-cosmic
[gallery-dl][debug] requests 2.22.0 - urllib3 1.22
[1/3] https://www.instagram.com/REDACTED/
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/REDACTED/'
[gallery-dl][debug] Updating urllib3 ciphers
[instagram][debug] Using InstagramUserExtractor for 'https://www.instagram.com/REDACTED/'
[instagram][info] Logging in as REDACTED
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /accounts/login/ HTTP/1.1" 200 9511
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /web/__mid/ HTTP/1.1" 200 28
[urllib3.connectionpool][debug] https://www.instagram.com:443 "POST /accounts/login/ajax/ HTTP/1.1" 200 296
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /REDACTED/ HTTP/1.1" 200 None

The `None' is unexpected, i.e. fetching the profile page should return a response containing data.

For [.../3] I would expect something like:

[1/3] user: [...]
[2/3] email:
[3/3] https://www.instagram.com/
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/'
[instagram][debug] Using InstagramUserExtractor for 'https://www.instagram.com/'
[instagram][info] Logging in as
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com:443
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /accounts/login/ HTTP/1.1" 200 9503
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /web/__mid/ HTTP/1.1" 200 28
[urllib3.connectionpool][debug] https://www.instagram.com:443 "POST /accounts/login/ajax/ HTTP/1.1" 200 296
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET // HTTP/1.1" 200 18517

(That's when invoking gallery-dl as `gallery-dl -u -p https://www.instagram.com/')

debagos commented 5 years ago

Actually it doesn't matter whether the profile is public or private... I started the same downloads again without authenticating to Instagram and it worked. So maybe the extractor isn't causing the problem here. I reckon that the problem is caused by my password. It contains an apostrophe, and it was easier to use a config file containing the username and password than to escape the apostrophe successfully. That's why I don't use the -u <your_username> -p <your_password> method. I use --config <path> instead. My method worked fine for weeks, but now it seems like I'm no longer logged in through gallery-dl...

Edit: Is there a way to save a copy of the fetched document? Maybe that can tell us more about what's going on here...
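
The config-file approach mentioned above sidesteps shell quoting because JSON strings are delimited by double quotes, so an apostrophe needs no escaping. The following is a minimal sketch, assuming the credentials live under extractor → instagram → username/password in the config file; the helper name and the credentials are hypothetical.

```python
import json


def build_config(username, password):
    """Return a gallery-dl style config dict holding Instagram credentials.

    The nesting (extractor -> instagram) is an assumption for illustration.
    """
    return {
        "extractor": {
            "instagram": {
                "username": username,
                "password": password,
            }
        }
    }


# Hypothetical credentials; note the apostrophe in the password.
config = build_config("example_user", "pa'ss word")
text = json.dumps(config, indent=4)
print(text)

# Round trip: the apostrophe comes back unchanged, with no escaping involved.
restored = json.loads(text)
print(restored["extractor"]["instagram"]["password"])
```

The printed text could be saved to a file and passed with --config <path>, as described in the comment above.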

iamleot commented 5 years ago

Can you please try logging in again via the web browser and then retry downloading a profile with gallery-dl as an authenticated user?

At least after a couple of logins it seems that, when logging in via the web browser, Instagram asks for a verification code that is sent via email and must then be entered in the login form.

I have never hit that via gallery-dl but this could explain the problem you are seeing (that's just a wild guess though without inspecting the responses).

debagos commented 5 years ago

I created a local copy of this repo and now I'm fiddling around, trying to find the cause... I'm definitely logged in, but the extractor fails at

if 'entry_data' in shared_data:
    base_shared_data = shared_data['entry_data'][psdf['page']][0]['graphql']

in extractor/instagram.py. I will report back if I can fix it.

mikf commented 5 years ago

@debagos Did you manage to find anything? Does this error still exist?

If it does, could you add

from .. import util
util.dump_json(shared_data)
exit()

after https://github.com/mikf/gallery-dl/blob/23251356cbc06d8d2477ea34e3e2fe4ed2f99c9e/gallery_dl/extractor/instagram.py#L93 and post the output here? (Maybe use pastebin or similar if it's too long.) The contents of page might also be interesting.
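
For anyone who wants to inspect a saved copy of the page outside of gallery-dl, the same kind of dump can be sketched in standalone form. This is only an illustration: it assumes the page embeds its data as a `window._sharedData = {...};` assignment inside a script tag (that name does not appear in this thread), and `extract_shared_data` and `sample_page` are hypothetical.

```python
import json
import re


def extract_shared_data(page):
    """Return the JSON object assigned to window._sharedData, or None.

    Assumes the assignment ends with '};</script>' as in the sample below.
    """
    match = re.search(
        r"window\._sharedData\s*=\s*(\{.*?\});</script>", page, re.DOTALL)
    if match is None:
        return None
    return json.loads(match.group(1))


# Hypothetical stand-in for a saved page that lacks the 'graphql' field.
sample_page = (
    '<html><body><script type="text/javascript">'
    'window._sharedData = '
    '{"entry_data": {"ProfilePage": [{"logging_page_id": "x"}]}};'
    '</script></body></html>'
)

shared_data = extract_shared_data(sample_page)
print(json.dumps(shared_data, indent=4, sort_keys=True))

# Check whether the field the extractor expects is present.
entry = shared_data["entry_data"]["ProfilePage"][0]
print("has graphql:", "graphql" in entry)
```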

debagos commented 5 years ago

Sorry, I'm pretty busy at the moment... The problem still persists (v.1.10.3) and I did what you suggested @mikf. Thank you.

My assumption is that I am part of a canary/experimental group which gets a newer Instagram layout. My knowledge of Python (or programming in general) is very limited, so I am not able to resolve this problem myself. Even if I post my page content here, what about the other people with that old Instagram layout? I think the extractor will get pretty complex... What do you guys think, do you want to investigate further into this very specific problem or should we just wait and drink tea?

mikf commented 5 years ago

Thank you for the detailed response!

what about the other people with that old Instagram layout?

This would be handled by first checking if it's the "old" layout, i.e. if there is a graphql field in the initial shared_data, and otherwise it would switch to grabbing the data from window.__additionalDataLoaded or something like that. Shouldn't be very complicated.
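
The check described above could look roughly like the following. It is a sketch under stated assumptions, not the code that ended up in gallery-dl: it assumes the newer layout passes a single-quoted path and a JSON object to `window.__additionalDataLoaded(...)`, and the function name `find_graphql` and the sample data are hypothetical.

```python
import json
import re


def find_graphql(page, shared_data, page_type):
    """Return the 'graphql' object from either page layout.

    Old layout: the object sits directly in shared_data.
    New layout (assumed shape): it is the second argument of a
    window.__additionalDataLoaded('<path>', {...}); call in the page.
    """
    entry = shared_data["entry_data"][page_type][0]
    if "graphql" in entry:
        return entry["graphql"]

    match = re.search(
        r"window\.__additionalDataLoaded\(\s*'[^']*'\s*,\s*(\{.*?\})\);</script>",
        page, re.DOTALL)
    if match is None:
        raise KeyError("graphql")
    return json.loads(match.group(1))["graphql"]


# Hypothetical sample data for both layouts.
old_shared = {"entry_data": {"PostPage": [
    {"graphql": {"shortcode_media": {"id": "1"}}}]}}
new_shared = {"entry_data": {"PostPage": [{}]}}
new_page = (
    "<script>window.__additionalDataLoaded('/p/abc/',"
    '{"graphql": {"shortcode_media": {"id": "1"}}});</script>'
)

old_result = find_graphql("", old_shared, "PostPage")
new_result = find_graphql(new_page, new_shared, "PostPage")
print(old_result == new_result)
```

Checking the old location first means accounts that still receive the old layout never reach the fallback at all.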

What do you guys think, do you want to investigate further into this very specific problem

Yes, I would really like to see how your Instagram (data) layout differs from a "normal" one, so this can hopefully be fixed. You also don't have to post the contents of page with your personal data out in the open. Sending an email or a PM on Gitter is a possibility as well.

github-userx commented 5 years ago

Maybe related: https://github.com/instaloader/instaloader/issues/394

ghost commented 4 years ago

In the last few days I have been getting this graphql error. Normally I can download public and private profiles while logged in, but it seems Instagram has changed something on their end. Perhaps the rollout of the new dark theme in their app? I'm not the best with coding, so I'm not sure what went wrong, but I have preconfigured the .conf file with the correct details in my /etc directory as per the defaults.

Command typed into the terminal:

gallery-dl --sleep 02 https://www.instagram.com/REDACTED/ 

Below is the output.

[gallery-dl][debug] Version 1.10.6
[gallery-dl][debug] Python 3.5.2 - Linux-5.0.0-32-generic-x86_64-with-Ubuntu-18.04-bionic
[gallery-dl][debug] requests 2.22.0 - urllib3 1.25.6
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/REDACTED/'
[gallery-dl][debug] Updating urllib3 ciphers
[instagram][debug] Using InstagramUserExtractor for 'https://www.instagram.com/REDACTED/'
[instagram][info] Logging in as REDACTED
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com:443
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /accounts/login/ HTTP/1.1" 200 9969
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /web/__mid/ HTTP/1.1" 200 28
[urllib3.connectionpool][debug] https://www.instagram.com:443 "POST /accounts/login/ajax/ HTTP/1.1" 200 412
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /REDACTED/ HTTP/1.1" 200 18597
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /p/REDACTED/ HTTP/1.1" 200 None
[instagram][error] An unexpected error occurred: KeyError - 'graphql'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[instagram][debug] 
Traceback (most recent call last):
  File "/snap/gallery-dl/865/lib/python3.5/site-packages/gallery_dl/job.py", line 47, in run
    for msg in self.extractor:
  File "/snap/gallery-dl/865/lib/python3.5/site-packages/gallery_dl/extractor/instagram.py", line 35, in items
    for data in self.instagrams():
  File "/snap/gallery-dl/865/lib/python3.5/site-packages/gallery_dl/extractor/instagram.py", line 427, in instagrams
    'query_hash': 'f2405b236d85e8296cf30347c9f08c2a',
  File "/snap/gallery-dl/865/lib/python3.5/site-packages/gallery_dl/extractor/instagram.py", line 269, in _extract_page
    yield from self._extract_postpage(url)
  File "/snap/gallery-dl/865/lib/python3.5/site-packages/gallery_dl/extractor/instagram.py", line 109, in _extract_postpage
    media = shared_data['entry_data']['PostPage'][0]['graphql']['shortcode_media']
KeyError: 'graphql'

mikf commented 4 years ago

My own account now also has the new "layout" for Post pages it seems, and I've managed to implement a fix (https://github.com/mikf/gallery-dl/commit/5fa6ff04ddf1ef9145233237c635cce93b3a8687). But, as the commit message says, video downloads when logged in no longer work. Disabling downloader.ytdl.forward-cookies works around that for public videos, but private videos aren't downloadable any more.
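
As a rough illustration of the workaround, the snippet below produces a config fragment with that option disabled. It assumes the dotted name downloader.ytdl.forward-cookies maps onto nested JSON objects in the config file; the helper name is hypothetical.

```python
import json


def disable_ytdl_cookie_forwarding(config):
    """Return a copy of config with downloader.ytdl.forward-cookies set to false."""
    # Cheap deep copy that works for plain JSON-like data.
    updated = json.loads(json.dumps(config))
    ytdl = updated.setdefault("downloader", {}).setdefault("ytdl", {})
    ytdl["forward-cookies"] = False
    return updated


original = {}
config = disable_ytdl_cookie_forwarding(original)
print(json.dumps(config, indent=4))
```

As noted above, this only helps with public videos.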