mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
11.88k stars 976 forks

Instagram refusing to log in on Ubuntu VPS #2427

Closed (vaaski closed this issue 1 year ago)

vaaski commented 2 years ago

Hello, I've used this CLI successfully on my local Windows machine, but when I deployed it to my Hetzner VPS running Ubuntu 20.04.2 LTS, it will not log into Instagram using -u and -p.

logs

```
$ ./gallery-dl --version
1.21.0
$ ./gallery-dl -v -u USERNAME -p PASSWORD https://www.instagram.com/p/CZNwbEXJWHC
[gallery-dl][debug] Version 1.21.0 - Executable
[gallery-dl][debug] Python 3.9.10 - Linux-5.4.0-65-generic-x86_64-with-glibc2.31
[gallery-dl][debug] requests 2.27.1 - urllib3 1.26.8
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/p/CZNwbEXJWHC/'
[instagram][debug] Using InstagramPostExtractor for 'https://www.instagram.com/p/CZNwbEXJWHC/'
[instagram][info] Logging in as USERNAME
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com:443
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /accounts/login/ HTTP/1.1" 200 121
[instagram][debug] Sleeping for 8.502 seconds
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /data/shared_data/ HTTP/1.1" 200 16267
[instagram][debug] Sleeping for 8.740 seconds
[urllib3.connectionpool][debug] https://www.instagram.com:443 "POST /accounts/login/ajax/ HTTP/1.1" 429 386
[instagram][debug] '429 Too Many Requests' for 'https://www.instagram.com/accounts/login/ajax/' (1/5)
[urllib3.connectionpool][debug] https://www.instagram.com:443 "POST /accounts/login/ajax/ HTTP/1.1" 429 386
[instagram][debug] '429 Too Many Requests' for 'https://www.instagram.com/accounts/login/ajax/' (2/5)
[urllib3.connectionpool][debug] https://www.instagram.com:443 "POST /accounts/login/ajax/ HTTP/1.1" 429 386
[instagram][debug] '429 Too Many Requests' for 'https://www.instagram.com/accounts/login/ajax/' (3/5)
[urllib3.connectionpool][debug] https://www.instagram.com:443 "POST /accounts/login/ajax/ HTTP/1.1" 429 386
[instagram][debug] '429 Too Many Requests' for 'https://www.instagram.com/accounts/login/ajax/' (4/5)
[urllib3.connectionpool][debug] https://www.instagram.com:443 "POST /accounts/login/ajax/ HTTP/1.1" 429 386
[instagram][debug] '429 Too Many Requests' for 'https://www.instagram.com/accounts/login/ajax/' (5/5)
[instagram][error] HttpError: '429 Too Many Requests' for 'https://www.instagram.com/accounts/login/ajax/'
```

vaaski commented 2 years ago

I've also tried it with cookies that I got from my local machine, which doesn't work either. I tried both ways on my other local Windows machine, and they work fine there.

I am guessing it's some advanced bot detection, although I've never done any kind of interaction with Instagram from my VPS's IP.

logs

```
./gallery-dl -v --cookies cookies.txt https://www.instagram.com/p/CbEJbaSOEp2/
[gallery-dl][debug] Version 1.21.0 - Executable
[gallery-dl][debug] Python 3.9.10 - Linux-5.4.0-65-generic-x86_64-with-glibc2.31
[gallery-dl][debug] requests 2.27.1 - urllib3 1.26.8
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/p/CbEJbaSOEp2/'
[instagram][debug] Using InstagramPostExtractor for 'https://www.instagram.com/p/CbEJbaSOEp2/'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com:443
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /graphql/query/?query_hash=2efa04f61586458cef44441f474eee7c&variables=%7B%22shortcode%22%3A+%22CbEJbaSOEp2%22%2C+%22child_comment_count%22%3A+3%2C+%22fetch_comment_count%22%3A+40%2C+%22parent_comment_count%22%3A+24%2C+%22has_threaded_comments%22%3A+true%7D HTTP/1.1" 403 75
[instagram][error] HttpError: '403 Forbidden' for 'https://www.instagram.com/graphql/query/'
```

vaaski commented 2 years ago

Maybe it would help to use the exact cookies that gallery-dl generated on my local machine instead of the ones I exported manually from Firefox using the cookies.txt extension, but unfortunately I don't know where they're cached.

mikf commented 2 years ago

Using exported cookies like you did in https://github.com/mikf/gallery-dl/issues/2427#issuecomment-1072969310 should have been more than enough. I don't think you will be able to access Instagram from your VPS if even that results in an error.

Just for debugging purposes, you could run that test from https://github.com/mikf/gallery-dl/issues/2427#issuecomment-1072969310 with --write-pages to dump Instagram's response to a file and see what they are complaining about.

If you still want to try your luck with cookies cached by gallery-dl, they should be in %APPDATA%\gallery-dl\cache.sqlite3.

https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#cachefile (You need to pickle.load() any value from there, or you just copy the entire file over)
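
As a rough illustration of the `pickle.load()` remark above, the following sketch reads every row of the cache database and unpickles the values. The function name `read_cache` is made up for this example, and the table name `data` with `key` and `value` columns is an assumption about the cache layout that should be checked against the actual file (for example with `.schema` in the `sqlite3` shell). Only unpickle files you created yourself, since unpickling untrusted data can execute arbitrary code.

```python
import contextlib
import pickle
import sqlite3


def read_cache(path, table="data"):
    """Return {key: value} for all rows, unpickling values where possible.

    Assumption: the cache has a table (default name "data") with
    "key" and "value" columns, and values are pickled Python objects.
    """
    entries = {}
    # closing() makes sure the connection is released when we are done.
    with contextlib.closing(sqlite3.connect(path)) as db:
        rows = db.execute(f'SELECT key, value FROM "{table}"')
        for key, value in rows:
            try:
                entries[key] = pickle.loads(value)
            except Exception:
                # Not a pickle (or not bytes): keep the raw value instead.
                entries[key] = value
    return entries
```

Calling `read_cache(r"%APPDATA%\gallery-dl\cache.sqlite3")` with the environment variable expanded to the real path would return a dictionary whose values can then be inspected for cached session data.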

vaaski commented 2 years ago

I've just tried the --write-pages flag which resulted in this:

logs

```
./gallery-dl -v --write-pages --cookies cookies.txt https://www.instagram.com/p/CbEJbaSOEp2/
[gallery-dl][debug] Version 1.21.0 - Executable
[gallery-dl][debug] Python 3.9.10 - Linux-5.4.0-65-generic-x86_64-with-glibc2.31
[gallery-dl][debug] requests 2.27.1 - urllib3 1.26.8
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/p/CbEJbaSOEp2/'
[instagram][debug] Using InstagramPostExtractor for 'https://www.instagram.com/p/CbEJbaSOEp2/'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com:443
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /graphql/query/?query_hash=2efa04f61586458cef44441f474eee7c&variables=%7B%22shortcode%22%3A+%22CbEJbaSOEp2%22%2C+%22child_comment_count%22%3A+3%2C+%22fetch_comment_count%22%3A+40%2C+%22parent_comment_count%22%3A+24%2C+%22has_threaded_comments%22%3A+true%7D HTTP/1.1" 403 75
[instagram][error] HttpError: '403 Forbidden' for 'https://www.instagram.com/graphql/query/'
cat 01_https_www.instagram.com_graphql_query_query_hash_2efa04f61586458cef44441f474eee7c_variables_%7B%22shortcode%22%3A+%22CbEJbaSOEp2%22%2C+%22child_comment_count%22%3A+3%2C+%22fetch_comment_count%22%3A+40%2C+%22parent_comment_count%22%3A+24%2C+%22has_.dump
{"message":"Sorry, there was a problem with your request.","status":"fail"}
```

Doesn't seem particularly helpful to me unfortunately.

I've also tried copying the whole cache.sqlite3 file over, which didn't work. I even tried generating a fresh one on my local Windows machine and copying that over, which also didn't work.

logs

```
./gallery-dl -v -u USERNAME -p PASSWORD https://www.instagram.com/p/CbEJbaSOEp2/
[gallery-dl][debug] Version 1.21.0 - Executable
[gallery-dl][debug] Python 3.9.10 - Linux-5.4.0-65-generic-x86_64-with-glibc2.31
[gallery-dl][debug] requests 2.27.1 - urllib3 1.26.8
[gallery-dl][debug] Starting DownloadJob for 'https://www.instagram.com/p/CbEJbaSOEp2/'
[instagram][debug] Using InstagramPostExtractor for 'https://www.instagram.com/p/CbEJbaSOEp2/'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.instagram.com:443
[urllib3.connectionpool][debug] https://www.instagram.com:443 "GET /graphql/query/?query_hash=2efa04f61586458cef44441f474eee7c&variables=%7B%22shortcode%22%3A+%22CbEJbaSOEp2%22%2C+%22child_comment_count%22%3A+3%2C+%22fetch_comment_count%22%3A+40%2C+%22parent_comment_count%22%3A+24%2C+%22has_threaded_comments%22%3A+true%7D HTTP/1.1" 403 75
[instagram][error] HttpError: '403 Forbidden' for 'https://www.instagram.com/graphql/query/'
```

My guess is that they've never received "human" traffic from that IP and therefore trust it less than my Windows machine IP, from which I've used Instagram normally before. If you think that it could help, I could set up a proxy on my VPS and try logging in over its connection from a normal browser and see if it works after that.

One more thing that just came to mind: when I use username/password login without a cache, I receive an email from Instagram saying I've logged into Firefox on Windows. When I use gallery-dl on my Ubuntu VPS, does it send a Linux-looking User-Agent header to Instagram, or always a Windows one so that it looks unsuspicious? If it uses whatever the system actually is, Instagram could notice that the User-Agent no longer matches the one the cookies were created with. I'm not sure how gallery-dl handles that though, just a thought.
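
To take the User-Agent out of the equation when testing this, it could be pinned explicitly. The sketch below only generates a configuration fragment; it assumes that gallery-dl's `user-agent` extractor option applies to the Instagram extractor, and both the `render_config` helper and the example User-Agent string are placeholders for illustration, not values taken from this thread. Whether a matching User-Agent changes Instagram's response at all is untested here.

```python
import json

# Placeholder: replace with the exact User-Agent string of the browser
# that the cookies were exported from.
EXAMPLE_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0) "
    "Gecko/20100101 Firefox/98.0"
)


def render_config(user_agent):
    """Return configuration JSON that pins the Instagram User-Agent.

    Assumption: "user-agent" is honored under extractor.instagram.
    """
    config = {"extractor": {"instagram": {"user-agent": user_agent}}}
    return json.dumps(config, indent=4)


# Print the fragment so it can be saved to a configuration file.
print(render_config(EXAMPLE_USER_AGENT))
```

The printed JSON could be saved to a file and passed to gallery-dl as an additional configuration file, so that the same header is sent from both the Windows machine and the VPS.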

Thanks for your help by the way, I appreciate it.

Vrihub commented 2 years ago

> My guess is that they've never received "human" traffic from that IP and therefore trust it less than my Windows machine IP, from which I've used Instagram normally before.

I guess it's even worse: Instagram is known to dislike non-residential IP addresses (such as those of VPSs in data centers), as well as other kinds of proxies (Tor etc.). Depending on the type of request, they might force you to log in, flag your activity as potentially malicious and require account verification, or even suspend or terminate your account. So I guess there is no way to circumvent that via software (cookies, user-agent, rate limiting etc.).