mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

[instagram] Private account user timelines not supported #195

Closed turbomund closed 5 years ago

turbomund commented 5 years ago

gallery-dl -v -d "D:\Div M\Pictures" https://www.instagram.com/adult.x.entertainment/
[gallery-dl][debug] Version 1.8.0
[gallery-dl][debug] Python 3.6.5 - Windows-10-10.0.17134-SP0
[gallery-dl][debug] requests 2.21.0 - urllib3 1.24.1
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/adult.x.entertainment/'
[instagram][debug] Using InstagramUserExtractor for 'https://www.instagram.com/adult.x.entertainment/'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com:443
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /adult.x.entertainment/ HTTP/1.1" 200 9916
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /graphql/query/?query_hash=66eb9403e44cc12e5b5ecda48b667d41&variables=%7B%22id%22:%227837666884%22,%22first%22:12,%22after%22:%22QVFCblBzeGM3Qm1YbFg1LUMtWC1BZko1MUNqcHJqTm83QUhtSEJ1OXFBSk1Uc01DaEgtalE1ZzluTHlUMTVSOVNRMlFObXZBMTFMLU1nMmc5cU5CY3JibQ==%22%7D HTTP/1.1" 200 239
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /graphql/query/?query_hash=66eb9403e44cc12e5b5ecda48b667d41&variables=%7B%22id%22:%227837666884%22,%22first%22:12,%22after%22:%22QVFDN2RoUnNGQ3VwalZyUWZlWTRGc3VWcDZ3QWpsTXBrWXE1aGxUNTZ2SWRlbk04aWwzMXEtQmJYTEVBMjdoRXRNRzU4cEE1aG5NbjVqaHRqVEQ4SkhGVQ==%22%7D HTTP/1.1" 200 143

iamleot commented 5 years ago

Hello Sejer,

It seems that the account is private and no public media are accessible.

(At the moment, the Instagram extractor does not support authenticated requests.)

iamleot commented 5 years ago

Sejer, if you can, please change the title to "[instagram] Private account user timelines not supported" to better reflect the problem! :) (Single post pages probably have the same problem. It should be quick to check: open a private account in a browser, go to one of its media, and then pass gallery-dl the URL with the shortcode_id, e.g. https://www.instagram.com/p/<shortcode_id>)

Thank you!

KaMyKaSii commented 5 years ago

I also want to download private accounts from Instagram. @mikf will you add support for this?

iamleot commented 5 years ago

Hello KaMyKaSii,

It would be nice, and it will probably take only 30-60 seconds to try @mikf's suggestion from #214:

Log into your twitter account in a web browser, export cookies to a cookies.txt file, and load them with --cookies.

I'll see if I can get "native" Twitter login support with -u and -p up and running, but using exported cookies has the same effect as being logged in and also works.

Can you please try it and let us know if it does the trick?

KaMyKaSii commented 5 years ago

Thanks for letting me know about cookies. I exported them using the cookies.txt extension but an error occurs when trying to use them in gallery-dl:

$ gallery-dl --verbose --cookies /sdcard/cookies.txt https://instagram.com/censored
[gallery-dl][debug] Version 1.8.2-dev
[gallery-dl][debug] Python 3.7.3 - Linux-4.4.78-perf+-aarch64-with-libc
[gallery-dl][debug] requests 2.20.1 - urllib3 1.24.1
[gallery-dl][debug] Starting DownloadJob for 'https://instagram.com/censored'
[instagram][debug] Using InstagramUserExtractor for 'https://instagram.com/censored'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com:443
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /censored/ HTTP/1.1" 200 16691
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /p/Bv2DboRg4bJ-Zq0pwUDkXMXrLnCyR-_EZOdThY0/ HTTP/1.1" 200 15583
[instagram][error] An unexpected error occurred: KeyError - 'edge_media_to_comment'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[instagram][debug] Traceback (most recent call last):
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/gallery_dl/job.py", line 54, in run
    for msg in self.extractor:
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/gallery_dl/extractor/instagram.py", line 32, in items
    for data in self.instagrams():
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/gallery_dl/extractor/instagram.py", line 159, in _extract_profilepage
    yield from self._extract_page(url, 'ProfilePage')
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/gallery_dl/extractor/instagram.py", line 135, in _extract_page
    yield from self._extract_postpage(url)
  File "/data/data/com.termux/files/usr/lib/python3.7/site-packages/gallery_dl/extractor/instagram.py", line 53, in _extract_postpage
    'comments': text.parse_int(media['edge_media_to_comment']['count']),
KeyError: 'edge_media_to_comment'
$

iamleot commented 5 years ago

I think this may be due to an unsupported media type and/or a sharedData structure that is only present for private accounts... but that's just a wild guess! (Also, the /p/ URL in the log looks unusual compared to the ones for non-private accounts: it seems like three shortcodes separated by a `-'.)

I would be curious about the shared_data used in that case and I think that _extract_postpage() needs to be adjusted.

If you have access to such private media type - that can be reshared

The following should be enough to collect such information, e.g.:

% env PYTHONPATH=. python3.7 -m pdb -m gallery_dl --verbose --cookies /sdcard/cookies.txt https://instagram.com/censored
[...]
(Pdb) b gallery_dl/extractor/instagram.py:50
Breakpoint 1 at .../gallery_dl/extractor/instagram.py:50
(Pdb) c
[...]
(Pdb) n
[...]
(Pdb) print(media)

(JFTR, line 50 of the instagram.py extractor is the initialization of `media' in _extract_postpage(), after fetching and populating `shared_data'.)

WARNING: media may contain sensitive information, so please omit it, e.g. by replacing the values with `***' or similar! I am mainly curious about which of the keywords accessed in _extract_postpage() are present and which are not.
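
As a rough illustration of the redaction and the present/missing check asked for above, here is a minimal sketch that could be run on a copy of the printed `media' dictionary. Only 'edge_media_to_comment' is taken from the traceback in this thread; every other key name, and both helper functions, are hypothetical placeholders and not part of gallery-dl.

```python
# Keys that _extract_postpage() might look up on `media`.
# Only 'edge_media_to_comment' comes from the traceback in this thread;
# the other names are hypothetical placeholders.
EXPECTED_KEYS = ["edge_media_to_comment", "shortcode", "owner"]

# Hypothetical names of fields that may hold personal data.
SENSITIVE_KEYS = {"username", "full_name", "id", "profile_pic_url"}


def redact(value, sensitive=SENSITIVE_KEYS):
    """Return a copy of `value` with sensitive dict entries replaced by '***'."""
    if isinstance(value, dict):
        return {
            key: "***" if key in sensitive else redact(item, sensitive)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive) for item in value]
    return value


def key_report(media, expected=EXPECTED_KEYS):
    """Split the expected top-level keys into (present, missing) lists."""
    present = [key for key in expected if key in media]
    missing = [key for key in expected if key not in media]
    return present, missing


if __name__ == "__main__":
    sample = {"shortcode": "abc", "owner": {"id": "1", "username": "someone"}}
    print(redact(sample))
    print(key_report(sample))
```

Posting the output of redact() together with the two lists from key_report() should be enough to see which fields are absent without leaking account details.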

Thanks!

mikf commented 5 years ago

a9c89085fb9f79bb62748408761af45f272bac5e adds login support with -u/--username and -p/--password for Instagram. It doesn't support any "special" cases like 2FA, but I was able to log into my own account and download stuff from https://www.instagram.com/adult.x.entertainment/. The cookie method suggested by iamleot also worked. The only necessary cookie appears to be sessionid, which has a lifetime of at least 360 days.

The error from https://github.com/mikf/gallery-dl/issues/195#issuecomment-480523688 is most likely related to missing edge_media_to_comment fields, which was fixed some time ago in https://github.com/mikf/gallery-dl/pull/250.
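
For readers wondering what tolerating a missing edge_media_to_comment field can look like, here is a minimal sketch. It is not necessarily how #250 fixed it: comment_count() is a hypothetical helper, and parse_int() below is only a stand-in whose behavior is assumed, not gallery-dl's actual text.parse_int.

```python
def parse_int(value, default=0):
    """Stand-in for an integer parser: int(value), or `default` on failure."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def comment_count(media):
    """Return the comment count, or 0 when the field is absent or empty."""
    # .get() avoids the KeyError seen in the traceback; `or {}` also
    # covers the case where the key exists but holds None.
    edge = media.get("edge_media_to_comment") or {}
    return parse_int(edge.get("count"))


if __name__ == "__main__":
    print(comment_count({"edge_media_to_comment": {"count": 7}}))
    print(comment_count({}))
```

The design choice here is to fall back to a neutral default instead of aborting the whole download when one optional metadata field is missing.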

KaMyKaSii commented 5 years ago

Yes, after this commit I was able to download using the cookies file. But now with authentication support, I wonder if it is possible for accounts with Facebook login? I tried both my Instagram username and my Facebook email, but could not authenticate.

mikf commented 5 years ago

But now with authentication support, I wonder if it is possible for accounts with Facebook login?

I used the "Log in with Facebook" button on Instagram's main page to initially create an Instagram account without having to input my phone number, and then used that to re-create what happens when the "Log in" button on https://www.instagram.com/accounts/login/ is clicked. So whatever username & password can be used to log directly into Instagram should also work with gallery-dl.

Implementing a round-trip over to Facebook is a bit much, especially since providing cookies also works. Speaking of: you don't need to use a separate cookies file. It is also possible to put them directly in your config file:

...
        "instagram": {
            "cookies": {
                "sessionid": "...",
                "mid"      : "...",
                "csrftoken": "..."
            }
        },
...
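
If the cookies were already exported to a cookies.txt file, the three values for this config entry can be pulled out of it with Python's standard library. The following is a minimal sketch, assuming a Netscape-format cookies.txt; instagram_cookies() and config_fragment() are hypothetical helper names, and the printed fragment only covers the "instagram" object, which still has to be placed where the "..." lines are in the snippet above.

```python
import json
from http.cookiejar import MozillaCookieJar

# Cookie names shown in the config snippet above.
WANTED = ("sessionid", "mid", "csrftoken")


def instagram_cookies(path, wanted=WANTED):
    """Read a Netscape-format cookies.txt and return the wanted Instagram cookies."""
    jar = MozillaCookieJar(path)
    # Keep session cookies and expired entries so nothing is dropped silently.
    jar.load(ignore_discard=True, ignore_expires=True)
    return {
        cookie.name: cookie.value
        for cookie in jar
        if cookie.name in wanted and cookie.domain.endswith("instagram.com")
    }


def config_fragment(cookies):
    """Render the cookies as a JSON object shaped like the snippet above."""
    return json.dumps({"instagram": {"cookies": cookies}}, indent=4)


# Example usage (the path is whatever file the browser extension exported):
#   print(config_fragment(instagram_cookies("cookies.txt")))
```

Since sessionid appears to be the only cookie that is strictly needed, a result that lacks "mid" or "csrftoken" may still work.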