Closed mikaljan closed 3 years ago
Hello @mikaljan! Unfortunately I think this is similar to #1113 (i.e. Instagram has started being more aggressive with users who request many images).
(I've tried downloading the profile here, without authenticating, and it seems to be downloading, but I'm pretty sure I will be blocked soon.)
...and indeed after ~2 minutes or so:
% gallery-dl -v 'https://www.instagram.com/migichen_/'
[gallery-dl][debug] Version 1.15.4
[gallery-dl][debug] Python 3.8.6 - NetBSD-9.99.75-amd64-x86_64-64bit-ELF
[gallery-dl][debug] requests 2.24.0 - urllib3 1.25.11
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/migichen_/'
[instagram][debug] Using InstagramUserExtractor for 'https://www.instagram.com/migichen_/'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com:443
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /migichen_/ HTTP/1.1" 200 49249
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /p/CIP9dLAhkn3/?__a=1 HTTP/1.1" 302 0
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /accounts/login/ HTTP/1.1" 200 12619
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CIP9dLAhkn3/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
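For context, the JSONDecodeError in the last line is what Python's json module reports when it is handed something that is not JSON, such as the HTML login page the request was redirected to. A minimal sketch of that failure mode follows; LOGIN_PAGE_BODY and try_parse are illustrative names, not gallery-dl code.

```python
import json

# Stand-in for the HTML body of /accounts/login/ (assumed content, not the real page).
LOGIN_PAGE_BODY = "<!DOCTYPE html><html><head><title>Login</title></head></html>"


def try_parse(body):
    """Return (data, error); error is the decoder's message or None on success."""
    try:
        return json.loads(body), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


data, error = try_parse(LOGIN_PAGE_BODY)
print(error)  # Expecting value: line 1 column 1 (char 0)
```

So the warning is a symptom of the redirect to the login page, not of a malformed post.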
I'm also having the same issue. I used two accounts and one of them is now banned. I used a 10 second delay for sleep and sleep-request which got through maybe 50 files or something before it gave me the error. Before the account was banned, I was able to download in batches of 50 until it gave me the "something is wrong with your account, change your password" or phone verification. After doing that maybe 3 times, that account was banned outright.
Probably going to put this off until this is fixed or figured out.
I'm having better luck with setting the sleep time to 15, but that could change at any moment. I did get the Your Account Has Been Temporarily Locked message on my phone after using the cookie and forgetting to set the delay.
Note: If you get this error, your account might be locked. Unlock it and set the delay to something about 10 or 15 seconds longer. Oh and I'd recommend not using your primary account for this!
I think the Instagram extractor needs a slight rewrite. It's inefficient, and with the new Instagram rate limits you get stuck within the first few hundred images at best. I explained that here: https://github.com/mikf/gallery-dl/issues/1113#issuecomment-735008136.
Either that or gallery-dl has to detect IP bans and support proxy lists for quick address rotation.
Mike writes:
I think the Instagram extractor needs a slight rewrite. It's inefficient, and with the new Instagram rate limits you get stuck within the first few hundred images at best. I explained that here: https://github.com/mikf/gallery-dl/issues/1113#issuecomment-735008136.
Either that or gallery-dl has to detect IP bans and support proxy lists for quick address rotation.
Can you elaborate on how to make that more efficient?
At least based on how it works, and AFAIK when scraping Instagram, I think you inevitably need to scroll through the whole timeline.
need to scroll through the whole timeline
Correct, but scrolling just means querying the GraphQL endpoint, and that's only about 80 queries per 1000 images. Besides, you could dump the whole timeline once and keep reusing it until you have downloaded every picture.
What Instagram really doesn't like is when you start hammering /p/ABCDEFG123 pages. When the rate limits hit, gallery-dl has to either switch proxies or start scraping from the last downloaded image on the next run. Neither is properly supported by the extractor; --range and --download-archive do not work with Instagram the way you would expect. gallery-dl starts from the beginning of the timeline every time.
Also, when I look at the log, it seems that the extractor just skips images it fails to download, with no retries or pauses. That's... not good.
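To illustrate the "retry and pause instead of skipping" idea, here is a minimal sketch. It is not how gallery-dl's extractor works; fetch_with_retries, its parameters and the 10-second base delay are all assumptions made for the example.

```python
import time


def fetch_with_retries(fetch, url, retries=3, base_delay=10.0, sleep=time.sleep):
    """Call fetch(url); on failure, pause and retry instead of skipping the item.

    `fetch` is any callable that raises on failure (hypothetical helper).
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception:  # broad on purpose; a real extractor would be more specific
            if attempt == retries:
                raise  # give up only after the last retry has failed
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter is only there so the pause can be replaced in tests; normal callers would leave it at its default.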
Should be fixed with https://github.com/mikf/gallery-dl/commit/447488fb1876f507a6409be12864e9f22cac83e6.
Querying /p/<shortcode>/?__a=1 for each post is what gets one blocked/banned, and I would highly advise against using gallery-dl versions before 1.16.0 for Instagram, or any other Instagram downloaders that do this (which are pretty much all of them from what I can tell).
The rewrite is still lacking support for stories, and post listings other than the regular one (e.g. instagram.com/instagram) might not work as before, but at least it won't get you banned anymore.
I've been having this problem for weeks, so I'm very happy to see it being addressed.
Right now, this commit isn't in a full release, so I don't get the update yet using the pip install --upgrade method. Do you know when it will be in an official release?
Also, I was afraid instagram might be taking measures to block scripts like this. But even if adding a delay (as some people have tried) helps, their next step might be to detect scripts that hit at repeating intervals -- e.g., every 10 seconds. If it's too exact, I could see them detecting that and blocking you anyway.
One thing I wrote into a homespun crawler (which checks prices for items on a web site) several years ago was an option to randomize the delay. You give it a low bound and high bound (in seconds) -- e.g., 1 to 8, or 3 to 15 -- and each request uses a new random delay within those bounds. That way, you look much more like a human clicking through at random intervals, pausing longer at some images than others. For something like this, maybe you'd even want to have a different (longer) range for videos than for images.
What do you think, would that be a worthwhile option to add?
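Roughly what I mean, as a small Python sketch (I don't really know the language, so treat this as an illustration only). The names random_delay and polite_sleep are made up here and are not gallery-dl options.

```python
import random
import time


def random_delay(low, high, rng=random):
    """Pick a delay in seconds, uniformly between the low and high bounds."""
    if low < 0 or high < low:
        raise ValueError("bounds must satisfy 0 <= low <= high")
    return rng.uniform(low, high)


def polite_sleep(low, high, sleep=time.sleep, rng=random):
    """Sleep for a freshly drawn random delay and return the value used."""
    delay = random_delay(low, high, rng)
    sleep(delay)
    return delay


# Example: a shorter range for images, a longer (assumed) range for videos.
IMAGE_RANGE = (3, 15)
VIDEO_RANGE = (10, 30)
```

Each request would call `polite_sleep(*IMAGE_RANGE)` or `polite_sleep(*VIDEO_RANGE)` before fetching, so no two pauses are exactly alike.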
If you really wanted to make it easier, you could even bundle some of these options together into a "typical" group of settings under a single parameter, maybe -human. I'd definitely still allow for the individual settings, but that could make it easier to get it running successfully.
I'd be tempted to try contributing to the project myself, but I don't really know python.
@dsblack
Right now, this commit isn't in a full release, so I don't get the update yet using the pip install --upgrade method. Do you know when it will be in an official release?
As listed in the readme, you can do python3 -m pip install --upgrade https://github.com/mikf/gallery-dl/archive/master.tar.gz to get the latest dev version.
I can say for certain Instagram is looking for this kind of activity because my account got suspended (but only for the /p/ action):
I will try the new version after the ban is lifted. Can't risk getting banned again!
[gallery-dl][error] No suitable extractor found for 'https://www.instagram.com/stories/et2k/2457611747557737659/'
latest dev 1.16.0-dev
@xibr https://github.com/mikf/gallery-dl/issues/1149#issuecomment-738001532 says
The rewrite is still lacking support for stories, and post listings other than the regular one (e.g. instagram.com/instagram) might not work as before, but at least it won't get you banned anymore.
Now it works well with stories. Thanks
A question: when trying to download a single Instagram story, all stories are downloaded, not just that one. Is this expected?
Is it possible to download part of the images and save the position, so that the next launch continues from it? For example, I have downloaded 1000 of 2000 images. Is it possible to continue from 1001 on the next launch? Currently the program repeats the requests for the first 1000 images that were already downloaded. These requests are performed one after another, without the pauses that downloading would add, which leads to the login page (rechecking 1000 posts requires 84 requests within a short time).
@mikf The fullname filename field returns None on 1.16 for all users as far as I can tell.
@TestPolygon
is it possible to download a part of images and save the position to continue from it on the next launch?
You couldn't with the old extractor, and I don't think you can with the new one, but I haven't checked that yet.
Try it yourself; the options you are looking for are -v --range 1000- and -v --download-archive history.sqlite.
The SQLite DB stores only node IDs, so it can only be used to check (if --range exists) whether the node with a certain ID was downloaded or not. By default, the program checks the location where files would be downloaded and compares the expected filename with the names of the files in this directory.
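To make the limitation concrete, here is a small sketch of an ID-only archive. The table layout and the Archive class are assumptions for illustration and may differ from what gallery-dl actually stores.

```python
import sqlite3


class Archive:
    """ID-only download archive (hypothetical schema, for illustration)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")

    def seen(self, node_id):
        """Return True if this node ID was recorded as downloaded."""
        row = self.db.execute(
            "SELECT 1 FROM archive WHERE entry = ?", (node_id,)).fetchone()
        return row is not None

    def add(self, node_id):
        """Record a node ID; adding the same ID twice is harmless."""
        self.db.execute(
            "INSERT OR IGNORE INTO archive (entry) VALUES (?)", (node_id,))
        self.db.commit()
```

Such an archive can answer "was this node downloaded?", but it holds nothing about where in the timeline to resume, so the list pages still have to be requested again from the start.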
--verbose was useful for debugging. I can say that it is possible to do.
It would require adding a flag, for example --session.
With this flag the program should store (in a system file) the current parameters that are required for requesting the next "list page", together with the associated URL. For example: [{url1: [param1, param2]}, {url2: [param1, param2]}]. It should then use them, if they are present in this file, to continue downloading from a certain position. This covers both the case where the user has interrupted the download via Ctrl+C (for this case the params for requesting the current "list page" need to be stored too) and the case where the API limit was exceeded ("login page") when requesting the next "list page".
A more complicated format example:
[{ url1: { current: { params: [], fullyDownloaded: false }, next: { params: [] }, date: 1607609281 } }]
For Instagram these are: tracking_token, query_hash and id.
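A rough sketch of how such a session file could be written and read back. This is only an illustration of the proposal; the --session flag does not exist, and save_position, load_state and resume_params are hypothetical names.

```python
import json
import os
import time


def load_state(path):
    """Load the session file, or return an empty state if it does not exist."""
    try:
        with open(path, encoding="utf-8") as fp:
            return json.load(fp)
    except FileNotFoundError:
        return {}


def save_position(path, url, params, fully_downloaded=False):
    """Remember the params needed to request the current "list page" of url."""
    state = load_state(path)
    state[url] = {
        "params": params,
        "fullyDownloaded": fully_downloaded,
        "date": int(time.time()),
    }
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fp:
        json.dump(state, fp)
    os.replace(tmp, path)  # swap in the new file so Ctrl+C cannot leave a partial one


def resume_params(path, url):
    """Return stored params for url, or None when there is nothing to resume."""
    entry = load_state(path).get(url)
    if entry is None or entry["fullyDownloaded"]:
        return None
    return entry["params"]
```

On start-up the program would call `resume_params()` for the given URL and fall back to the beginning of the timeline when it returns None.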
@mikf ?
Hi @mikf,
I tried the latest 1.16.0-dev version, and I would get some successful downloads in the beginning, and after a minute or so everything returns a warning, please check the TXT file I've attached:
@mikaljan This output isn't from the latest dev version. The Unable to fetch data from ... logging message was removed in the rewrite (https://github.com/mikf/gallery-dl/commit/447488fb1876f507a6409be12864e9f22cac83e6). Check gallery-dl --version to make sure you are actually using 1.16.0-dev.
I'll release a new version with the fix this weekend. You could just wait until then.
@TestPolygon https://github.com/mikf/gallery-dl/commit/b88c97b873fe3bd07ffe6651801e121a20938cb1 adds a way to at least manually input a cursor value and continue downloading from the current position. The cursor tokens get outputted as debug logging messages or when getting redirected to the login page.
This commit also increases the amount of requested posts per GraphQL from 12 to 50 (the maximum possible). Since the redirect to login page for not logged in users always happens after ~120 requests regardless of how many posts get fetched or how long of a wait time there is in between, this should allow for more posts to get downloaded.
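As a back-of-the-envelope check of those numbers, here is a small sketch. It assumes every GraphQL request counts once against a budget of roughly 120 requests and ignores the initial profile page request; REQUEST_BUDGET and both helper names are made up for this example.

```python
import math

REQUEST_BUDGET = 120  # approximate number of requests before the login redirect


def requests_needed(posts, page_size):
    """Number of list requests needed to walk through `posts` posts."""
    return math.ceil(posts / page_size)


def max_posts(page_size, budget=REQUEST_BUDGET):
    """Upper bound on the posts reachable before the request budget runs out."""
    return page_size * budget


for page_size in (12, 50):
    print(page_size, requests_needed(1000, page_size), max_posts(page_size))
# 12 84 1440
# 50 20 6000
```

Under these assumptions, 1000 posts take 84 requests at 12 per page but only 20 at 50 per page, and the reachable depth before the redirect grows from about 1440 to about 6000 posts.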
Hm, I used pip install --no-cache-dir --upgrade https://github.com/mikf/gallery-dl/archive/master.tar.gz, but I still have the old behavior ("first":+12 and no prompt "Use '-o cursor=%s' to continue downloading " on the login page event)
Upd: use pip uninstall gallery-dl
I was experiencing this error previously as well, but after upgrading to 1.16.0, I've yet to encounter it (working across several 2k+ mixed albums).
As omnicr0n said, v1.16.0 is out, which should at least somewhat mitigate any rate limit problems with Instagram.
@xibr this is expected and worked like that even before the rewrite. If you want to limit the download to only a specific story ID, use --filter "media_id == 'STORY ID'"
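The filter string is a Python expression checked against each file's metadata. As a rough illustration of that idea (not gallery-dl's actual implementation; make_filter and the sample metadata are assumptions):

```python
def make_filter(expression):
    """Compile a filter expression into a predicate over a metadata dict."""
    code = compile(expression, "<filter>", "eval")

    def predicate(metadata):
        # Metadata keys act as variable names; builtins are hidden, but
        # eval() is still not a security sandbox.
        return bool(eval(code, {"__builtins__": {}}, metadata))

    return predicate


only_one_story = make_filter("media_id == '123'")  # '123' is a placeholder ID
print(only_one_story({"media_id": "123"}))  # True
print(only_one_story({"media_id": "456"}))  # False
```

Only files whose metadata makes the expression true would be downloaded; the rest are skipped.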
@rivke41levp656 Instagram removed those from all owner fields, it seems. This has nothing directly to do with the rewrite from https://github.com/mikf/gallery-dl/commit/447488fb1876f507a6409be12864e9f22cac83e6. The fullname info was still available a month ago, but now the embedded data in user profile pages like https://www.instagram.com/instagram/ only has
"owner":{"id":"25025320","username":"instagram"}
@TestPolygon
$ pip install -U -I --no-deps --no-cache-dir https://github.com/mikf/gallery-dl/archive/master.tar.gz
should work without needing to uninstall. (I've updated the instructions in the README accordingly)
@xibr this is expected and worked like that even before the rewrite. If you want to limit the download to only a specific story ID, use --filter "media_id == 'STORY ID'"
got it, thanks.
So, instagram works, again, yeah! (at least on public follows).
Unfortunately it doesn't work for private accounts (that my account has access to), even having provided instagram with my username/password in the conf file... and I'm fairly sure I did it right because, well, it used to work just fine.
Does it work if you remove username/password authentication and try it with the exported cookies instead?
Forcing a re-login by clearing your cache with gallery-dl --clear-cache and then trying to download from Instagram again might also work.
gallery-dl stopped working on Instagram today; I'm getting the following error:
E:\gallery-dl>gallery-dl https://www.instagram.com/migichen_/
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CIP9dLAhkn3/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CIN3Hhwhtne/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CIDVKJshBuM/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CH-cjDIh0Tz/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CH4-mdcBlAP/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CH2itYohHD8/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CH0I8u5BWVQ/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
[instagram][warning] Unable to fetch data from 'https://www.instagram.com/p/CHxLcqxBqfe/': JSONDecodeError: Expecting value: line 1 column 1 (char 0)
. . . .