mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.4k stars 930 forks

Instagram stopped working #1149

Closed mikaljan closed 3 years ago

mikaljan commented 3 years ago

gallery-dl stopped working on Instagram today; I'm getting the following error:

E:\gallery-dl>gallery-dl https://www.instagram.com/migichen_/
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CIP9dLAhkn3/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CIN3Hhwhtne/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CIDVKJshBuM/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CH-cjDIh0Tz/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CH4-mdcBlAP/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CH2itYohHD8/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CH0I8u5BWVQ/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CHxLcqxBqfe/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
...

iamleot commented 3 years ago

Hello @mikaljan! Unfortunately I think this is similar to #1113 (i.e. Instagram started being more aggressive with users who request several images).

(I've tried downloading the profile here, without authenticating, and it seems to be downloading, but I'm pretty sure I will be blocked soon.)

iamleot commented 3 years ago

...and indeed after ~2 minutes or so:

% gallery-dl -v 'https://www.instagram.com/migichen_/'                                                                                                                                    
[gallery-dl][debug] Version 1.15.4
[gallery-dl][debug] Python 3.8.6 - NetBSD-9.99.75-amd64-x86_64-64bit-ELF
[gallery-dl][debug] requests 2.24.0 - urllib3 1.25.11
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/migichen_/'
[instagram][debug] Using InstagramUserExtractor for 'https://www.instagram.com/migichen_/'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com:443
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /migichen_/ HTTP/1.1" 200 49249
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /p/CIP9dLAhkn3/?__a=1 HTTP/1.1" 302 0
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /accounts/login/ HTTP/1.1" 200 12619
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CIP9dLAhkn3/':  JSONDecodeError: Expecting value: line 1 column 1 (char 0)
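The log above suggests why the warning is a JSONDecodeError: the post URL answers with a 302, the client follows it to /accounts/login/, and the HTML of that page is then parsed as JSON. A minimal sketch of how such a redirect could be recognized before parsing is shown below. It is an illustration only, not gallery-dl's code; the function name parse_post_json and its parameters are hypothetical.

```python
import json

LOGIN_PATH = "/accounts/login/"  # path seen in the debug log above


def parse_post_json(status_chain, final_url, body):
    """Return parsed JSON, or None if the request ended on the login page.

    status_chain: HTTP status codes seen while following redirects
    final_url:    URL of the last response
    body:         text of the last response
    (Hypothetical helper; names are assumptions for this sketch.)
    """
    redirected = any(300 <= code < 400 for code in status_chain)
    if redirected and LOGIN_PATH in final_url:
        return None  # login page HTML, not JSON
    return json.loads(body)


# The sequence from the log: 302 on the post URL, then 200 on the login page.
html = "<!DOCTYPE html><html>...</html>"
result = parse_post_json(
    [302, 200], "https://www.instagram.com/accounts/login/", html
)

# Parsing the same HTML as JSON reproduces the message from the warning.
try:
    json.loads(html)
    error = None
except json.JSONDecodeError as exc:
    error = str(exc)
```

Checking for the redirect first makes it possible to report "redirected to login page" instead of a generic decode error.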

aeriessy commented 3 years ago

I'm also having the same issue. I used two accounts and one of them is now banned. I used a 10 second delay for sleep and sleep-request which got through maybe 50 files or something before it gave me the error. Before the account was banned, I was able to download in batches of 50 until it gave me the "something is wrong with your account, change your password" or phone verification. After doing that maybe 3 times, that account was banned outright.

Probably going to put this off until this is fixed or figured out.

UnforeseenOcean commented 3 years ago

I'm having better luck with setting the sleep time to 15, but that could change at any moment. I did get the "Your Account Has Been Temporarily Locked" message on my phone after using the cookie and forgetting to set the delay.

Note: If you get this error, your account might be locked. Unlock it and set the delay to something about 10 or 15 seconds longer. Oh, and I'd recommend not using your primary account for this!

reallyuniquename commented 3 years ago

I think the Instagram extractor needs a slight rewrite. It's inefficient, and with the new Instagram rate limits you get stuck at the first few hundred images at best. I explained that here https://github.com/mikf/gallery-dl/issues/1113#issuecomment-735008136.

Either that, or gallery-dl has to detect IP bans and support proxy lists for quick address rotation.

iamleot commented 3 years ago

Mike writes:

I think the Instagram extractor needs a slight rewrite. It's inefficient, and with the new Instagram rate limits you get stuck at the first few hundred images at best. I explained that here https://github.com/mikf/gallery-dl/issues/1113#issuecomment-735008136.

Either that, or gallery-dl has to detect IP bans and support proxy lists for quick address rotation.

Can you elaborate further on how to make that more efficient?

At least based on how it works (AFAIK by scraping Instagram), I think you inevitably need to scroll all the timeline.

reallyuniquename commented 3 years ago

need to scroll all the timeline

Correct, but scrolling means querying the GraphQL endpoint, and that's only about 80 queries per 1000 images. Besides, you could dump the whole timeline once and keep reusing it until you have downloaded every picture.

What Instagram really doesn't like is when you start hammering /p/ABCDEFG123 pages. When the rate limits hit, gallery-dl has to either switch proxy or start scraping from the last downloaded image on the next run. None of that is properly supported by the extractor; --range and --download-archive do not work with Instagram the way you would expect. Gallery-dl starts from the beginning of the timeline every time.
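To put rough numbers on this, here is a small sketch comparing the two approaches for 1000 posts. It assumes a page size of 12 posts per timeline query (the value mentioned later in this thread) and one extra request per post for the /p/ pages; the function names are made up for this example.

```python
import math


def listing_requests(posts, page_size):
    """Timeline (GraphQL) queries needed to list `posts` posts."""
    return math.ceil(posts / page_size)


def per_post_requests(posts, page_size):
    """Listing queries plus one additional page request per post."""
    return listing_requests(posts, page_size) + posts


listing_only = listing_requests(1000, 12)       # 84
with_post_pages = per_post_requests(1000, 12)   # 1084
```

Under these assumptions, fetching each post page multiplies the request count by roughly thirteen, which is in line with the "about 80 queries per 1000 images" estimate above.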

Also, when I look at the log, it seems that the extractor just skips images it fails to download, with no retries or pause. That's... not good.
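As an illustration of the "retries or pause" idea, a generic retry wrapper could look like the sketch below. This is not how gallery-dl handles failures; fetch_with_retries, its parameters, and the choice to retry on ValueError (the base class of JSONDecodeError) are all assumptions for this example.

```python
import time


def fetch_with_retries(fetch, url, retries=3, pause=10.0, sleep=time.sleep):
    """Call fetch(url); on ValueError wait and try again, up to `retries` times.

    The wait grows linearly (pause, 2 * pause, ...) between attempts.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except ValueError:
            if attempt == retries:
                raise  # give up after the last attempt
            sleep(pause * attempt)


# Demo with a fake fetch function that fails twice, then succeeds.
calls = []
waits = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ValueError("not JSON")
    return {"url": url}


data = fetch_with_retries(flaky_fetch, "https://example.org/post", sleep=waits.append)
```

Passing `sleep` as a parameter keeps the demo instant; real use would leave the default `time.sleep`.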

mikf commented 3 years ago

Should be fixed with https://github.com/mikf/gallery-dl/commit/447488fb1876f507a6409be12864e9f22cac83e6.

Querying /p/<shortcode>/?__a=1 for each post is what gets one blocked/banned, and I would strongly advise against using gallery-dl versions before 1.16.0 for Instagram, or any other Instagram downloaders that do this (which are pretty much all of them from what I can tell).

The rewrite is still lacking support for stories, and post listings other than the regular one (e.g. instagram.com/instagram) might not work as before, but at least it won't get you banned anymore.

dsblack commented 3 years ago

I've been having this problem for weeks, so I'm very happy to see it being addressed.

Right now, this commit isn't in a full release, so I don't get the update yet using the pip install --upgrade method. Do you know when it will be in an official release?

Also, I was afraid Instagram might be taking measures to block scripts like this. But even if adding a delay (as some people have tried) helps, their next step might be to detect scripts that hit at repeating intervals -- e.g., every 10 seconds. If it's too exact, I could see them detecting that and blocking you anyway.

One thing I wrote into a homespun crawler (which checks prices for items on a web site) several years ago was an option to randomize the delay. You give it a low bound and high bound (in seconds) -- e.g., 1 to 8, or 3 to 15 -- and each request uses a new random delay within those bounds. That way, you look much more like a human clicking through at random intervals, pausing longer at some images than others. For something like this, maybe you'd even want to have a different (longer) range for videos than for images.
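A minimal Python sketch of such a randomized delay follows. It only illustrates the idea described above and is not an existing gallery-dl feature; the names random_delay and polite_loop, and the default bounds, are hypothetical.

```python
import random
import time


def random_delay(low, high, rng=random):
    """Pick a delay in seconds, uniformly between low and high."""
    if low > high:
        raise ValueError("low bound must not exceed high bound")
    return rng.uniform(low, high)


def polite_loop(urls, handle, low=3.0, high=15.0, rng=random, sleep=time.sleep):
    """Call handle(url) for each URL, pausing a random time between requests."""
    for index, url in enumerate(urls):
        if index:  # no pause before the first request
            sleep(random_delay(low, high, rng))
        handle(url)


# Demo: record the delays instead of actually sleeping.
delays = []
handled = []
polite_loop(
    ["a", "b", "c"],
    handled.append,
    low=3.0,
    high=15.0,
    rng=random.Random(0),
    sleep=delays.append,
)
```

A separate, longer range for videos could be handled by calling random_delay with different bounds depending on the media type.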

What do you think, would that be a worthwhile option to add?

If you really wanted to make it easier, you could even bundle some of these options together into a "typical" group of settings under a single parameter, maybe -human. I'd definitely still allow for the individual settings, but that could make it easier to get it running successfully.

I'd be tempted to try contributing to the project myself, but I don't really know Python.

kattjevfel commented 3 years ago

@dsblack

Right now, this commit isn't in a full release, so I don't get the update yet using the pip install --upgrade method. Do you know when it will be in an official release?

As listed in the readme you can do python3 -m pip install --upgrade https://github.com/mikf/gallery-dl/archive/master.tar.gz to get the latest dev version.

UnforeseenOcean commented 3 years ago

I can say for certain Instagram is looking for this kind of activity because my account got suspended (but only for the /p/ action).

I will try the new version after the ban is lifted. Can't risk getting banned again!

xibr commented 3 years ago

[gallery-dl][error] No suitable extractor found for 'https://www.instagram.com/stories/et2k/2457611747557737659/'

latest dev 1.16.0-dev

phanirithvij commented 3 years ago

@xibr https://github.com/mikf/gallery-dl/issues/1149#issuecomment-738001532 says

The rewrite is still lacking support for stories, and post listings other than the regular one (e.g. instagram.com/instagram) might not work as before, but at least it won't get you banned anymore.

mikf commented 3 years ago

@xibr https://github.com/mikf/gallery-dl/commit/2b93515ee0c48b0fcf9a485a7c149985c60ed183

xibr commented 3 years ago

Now it works well with stories. Thanks

xibr commented 3 years ago

A question: when trying to download a single Instagram story, all stories are downloaded, not just that one. Is this expected?

TestPolygon commented 3 years ago

Well, is it possible to download a part of images and save the position to continue from it on the next launch? For example, I have downloaded 1000 of 2000. Is it possible to continue from 1001 on the next launch? Currently the program repeats the requests for the first 1000 images that were already downloaded. These requests are performed one after another, without the pauses used for downloading, which leads to the login page (rechecking 1000 posts requires 84 requests in a short time).

rivke41levp656 commented 3 years ago

@mikf The fullname filename field returns None on 1.16 for all users as far as I can tell.

reallyuniquename commented 3 years ago

@TestPolygon

is it possible to download a part of images and save the position to continue from it on the next launch?

You couldn't with the old extractor, and I don't think you can with the new one, but I haven't checked that yet.

Try it yourself; the options you are looking for are -v --range 1000- and -v --download-archive history.sqlite.

TestPolygon commented 3 years ago

The SQLite DB stores only node IDs, so it can only be used to check (if --range exists) whether the node with a certain ID was downloaded or not. By default, it checks the location where files would be downloaded and compares the expected filename with the names of the files in this directory.
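To illustrate why an ID-only archive cannot skip the listing requests, here is a small sqlite3 sketch. The table layout and the function names are assumptions for this example and not necessarily gallery-dl's actual schema.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) an archive that stores one ID per downloaded node."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")
    return conn


def seen(conn, entry):
    """Return True if this ID was recorded as downloaded."""
    row = conn.execute(
        "SELECT 1 FROM archive WHERE entry = ?", (entry,)
    ).fetchone()
    return row is not None


def remember(conn, entry):
    """Record an ID as downloaded; duplicates are ignored."""
    conn.execute("INSERT OR IGNORE INTO archive (entry) VALUES (?)", (entry,))
    conn.commit()


# Demo: the IDs still have to come from the timeline listing before
# they can be checked, so only the downloads are skipped.
conn = open_archive()
remember(conn, "node-1")
listed_ids = ["node-1", "node-2"]  # hypothetical IDs from one listing page
to_download = [node for node in listed_ids if not seen(conn, node)]
```

The archive answers "was this ID downloaded?", but it holds no information about where in the timeline to resume.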

--verbose was useful to debug. I can say that it is possible to do.

It requires adding, for example, a --session flag.

With this flag the program should store (in a system file) the current parameters that are required for requesting the next "list page", together with the associated URL. For example: [{url1: [param1, param2]}, {url2: [param1, param2]}]. It should then use them, if they are present in this file, to continue downloading from a certain position. This covers the cases where a user has interrupted the download via Ctrl+C (for this case the params for requesting the current "list page" need to be stored too) or has hit the API limit (the "login page") when requesting the next "list page".

A more complicated format example: [{ url1: { current: { params: [], fullyDownloaded: false }, next: { params: [] }, date: 1607609281 } }]

For Instagram these are: tracking_token, query_hash and id.
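The proposal above could be sketched roughly as follows. This only illustrates the suggested --session idea, which is not an existing gallery-dl option; save_state and load_state are hypothetical names, the state is keyed by URL rather than stored as a list, and the parameter values are placeholders.

```python
import json
import os
import tempfile
import time


def load_state(path):
    """Read the saved session state, or return an empty dict if none exists."""
    try:
        with open(path, encoding="utf-8") as fp:
            return json.load(fp)
    except FileNotFoundError:
        return {}


def save_state(path, url, params, fully_downloaded=False):
    """Store the parameters needed to request the current list page of `url`."""
    state = load_state(path)
    state[url] = {
        "current": {"params": params, "fullyDownloaded": fully_downloaded},
        "date": int(time.time()),
    }
    # Write to a temporary file first so an interrupt (Ctrl+C) cannot
    # leave a half-written state file behind.
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False, encoding="utf-8"
    ) as tmp:
        json.dump(state, tmp)
    os.replace(tmp.name, path)


# Demo with placeholder values for the three parameters named above.
with tempfile.TemporaryDirectory() as folder:
    state_file = os.path.join(folder, "session.json")
    params = {"tracking_token": "TOKEN", "query_hash": "HASH", "id": "12345"}
    save_state(state_file, "https://www.instagram.com/migichen_/", params)
    restored = load_state(state_file)
```

On the next launch, the stored params for a URL would be used for the first list-page request instead of starting from the top of the timeline.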

@mikf ?

mikaljan commented 3 years ago

Hi @mikf,

I tried the latest 1.16.0-dev version, and I would get some successful downloads in the beginning, and after a minute or so everything returns a warning, please check the TXT file I've attached:

instagram_log.txt

mikf commented 3 years ago

@mikaljan This output isn't from the latest dev version. The Unable to fetch data from ... logging message was removed in the rewrite (https://github.com/mikf/gallery-dl/commit/447488fb1876f507a6409be12864e9f22cac83e6). Check gallery-dl --version to make sure you are actually using 1.16.0-dev. I'll release a new version with the fix this weekend. You could just wait until then.

@TestPolygon https://github.com/mikf/gallery-dl/commit/b88c97b873fe3bd07ffe6651801e121a20938cb1 adds a way to at least manually input a cursor value and continue downloading from the current position. The cursor tokens get outputted as debug logging messages or when getting redirected to the login page.

This commit also increases the number of posts requested per GraphQL query from 12 to 50 (the maximum possible). Since the redirect to the login page for users who are not logged in always happens after ~120 requests, regardless of how many posts get fetched or how long the wait time in between is, this should allow more posts to be downloaded.
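As a rough upper bound, and assuming every one of the ~120 requests is a listing query (in practice some requests go to other pages), the effect of the larger page size can be estimated like this. The function name max_posts is made up for this sketch.

```python
def max_posts(request_budget, page_size):
    """Upper bound on posts listed before the request budget runs out."""
    return request_budget * page_size


old_limit = max_posts(120, 12)   # 1440
new_limit = max_posts(120, 50)   # 6000
```

So the same budget of about 120 requests covers roughly four times as many posts.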

TestPolygon commented 3 years ago

Hm, I used pip install --no-cache-dir --upgrade https://github.com/mikf/gallery-dl/archive/master.tar.gz, but I still have the old behavior ("first":+12 and no prompt "Use '-o cursor=%s' to continue downloading " on the login page event)

Upd: use pip uninstall gallery-dl

syntopikon commented 3 years ago

I was experiencing this error previously as well, but after upgrading to 1.16.0, I've yet to encounter it (working across several 2k+ mixed albums).

mikf commented 3 years ago

As omnicr0n said, v1.16.0 is out, which should at least somewhat mitigate any rate limit problems with Instagram.

@xibr this is expected and worked like that even before the rewrite. If you want to limit the download to only a specific story ID, use --filter "media_id == 'STORY ID'"
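The --filter value is a Python expression that is checked against each file's metadata. The sketch below shows the general idea with plain eval; it is a simplified illustration and not gallery-dl's implementation. make_filter is a hypothetical name, and the second media_id is invented for the demo.

```python
def make_filter(expression):
    """Compile a Python expression into a predicate over metadata dicts."""
    code = compile(expression, "<filter>", "eval")

    def predicate(metadata):
        # Metadata keys become variable names inside the expression.
        return bool(eval(code, {"__builtins__": {}}, dict(metadata)))

    return predicate


# Demo: keep only the story with the ID from the URL earlier in this thread.
keep = make_filter("media_id == '2457611747557737659'")
stories = [
    {"media_id": "2457611747557737659", "username": "et2k"},
    {"media_id": "1111111111111111111", "username": "et2k"},  # made-up ID
]
selected = [story for story in stories if keep(story)]
```

All stories are still listed; the expression only decides which of them get downloaded.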

@rivke41levp656 Instagram removed those from all owner fields, it seems. This has nothing directly to do with the rewrite from https://github.com/mikf/gallery-dl/commit/447488fb1876f507a6409be12864e9f22cac83e6. The fullname info was still available a month ago, but now the embedded data in user profile pages like https://www.instagram.com/instagram/ only has "owner":{"id":"25025320","username":"instagram"}

mikf commented 3 years ago

@TestPolygon

$ pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz

should work without needing to uninstall. (I've updated the instructions in the README accordingly)

xibr commented 3 years ago

@xibr this is expected and worked like that even before the rewrite. If you want to limit the download to only a specific story ID, use --filter "media_id == 'STORY ID'"

got it, thanks.

left1000 commented 3 years ago

So, Instagram works again, yeah! (At least for public accounts I follow.)

Unfortunately it doesn't work for private accounts (that my account has access to), even though I provided my Instagram username/password in the conf file... and I'm fairly sure I did it right because, well, it used to work just fine.

Hrxn commented 3 years ago

Does it work if you remove username/password authentication and try it with the exported cookies instead?

mikf commented 3 years ago

Forcing a re-login by clearing your cache with gallery-dl --clear-cache and then trying to download from Instagram again might also work.