mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0

Reddit Stopped working #4292

Open mo2theflow opened 1 year ago

mo2theflow commented 1 year ago

There is now a serious issue with Reddit: it forces gallery-dl to wait 20 minutes before each download. I was able to work around this with my own Python code, but I thought I'd let you know, since gallery-dl is super awesome and a fix will be needed by users who can't do this themselves. Cheers

mikf commented 1 year ago

Yeah, with the recent ratelimit changes it is no longer feasible to use the default API credentials. Every user will have to get his own client-id: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorredditclient-id--user-agent

RCcola1987 commented 1 year ago

Does this mean that the refresh token no longer works and this is the only method to use on Reddit? Also, is there any way to use the JSON output of the page instead, to avoid Reddit's new API BS?

mo2theflow commented 1 year ago

Yeah, with the recent ratelimit changes it is no longer feasible to use the default API credentials. Every user will have to get his own client-id: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorredditclient-id--user-agent

Hi, thank you, how would I use the client-id with the standalone windows executable?

mo2theflow commented 1 year ago

Does this mean that the refresh token no longer works and this is the only method to use on Reddit? Also, is there any way to use the JSON output of the page instead, to avoid Reddit's new API BS?

That's what I'm doing right now; then I use curl to download. The only things I'm able to get right now are images, but today I'll get started on gallery posts and continue from there.

rautamiekka commented 1 year ago

Yeah, with the recent ratelimit changes it is no longer feasible to use the default API credentials. Every user will have to get his own client-id: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorredditclient-id--user-agent

Hi, thank you, how would I use the client-id with the standalone windows executable?

No platform diffs, you just put it correctly into the config.

mo2theflow commented 1 year ago

Yeah, with the recent ratelimit changes it is no longer feasible to use the default API credentials. Every user will have to get his own client-id: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorredditclient-id--user-agent

Hi, thank you, how would I use the client-id with the standalone windows executable?

No platform diffs, you just put it correctly into the config.

Sorry to sound stupid, but I don't quite follow. I use the gallery-dl.exe downloaded from here; if I want to put it in the config, how would I do that? For example, this is how I run the command; what do I change?: gallery-dl.exe -D . https://reddit.url.here

glottisfaun0000 commented 1 year ago

Yeah, with the recent ratelimit changes it is no longer feasible to use the default API credentials. Every user will have to get his own client-id: https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorredditclient-id--user-agent

After setting this up in my config, I'm still getting "Waiting until x for rate limit reset" with no config error. Is that expected? Is there any way to check whether it's using my credentials for the API? I'm not seeing a "Requesting public access token" message.

MrPromotor commented 1 year ago

It worked for me when I ran gallery-dl oauth:reddit; the browser then opened, I accepted the gallery-dl app, and it generated a private token for me.

glottisfaun0000 commented 1 year ago

@MrPromotor i'm getting the same rate limit wait after this, just with the private access token

[reddit][info] Refreshing private access token
[reddit][info] Waiting until 00:10:00 for rate limit reset.

so does it make any difference?

mo2theflow commented 1 year ago

It worked for me when I ran gallery-dl oauth:reddit; the browser then opened, I accepted the gallery-dl app, and it generated a private token for me.

How do you edit the config? I'm using the executable; gallery-dl.exe ... Cheers

Hrxn commented 1 year ago

You edit the config with an editor.

mo2theflow commented 1 year ago

You edit the config with an editor.

I literally just have the gallery-dl.exe; which config file are you referring to?

mikf commented 1 year ago

which config file are you referring to?

The config file you'd have to create yourself or by running gallery-dl --config-create

Afterwards you put the following into the extractor section.

        "reddit": {
            "client-id": "YOURCLIENTID",
            "user-agent": "Python:APPLICATIONNAME:v1.0 (by /u/YOURUSERNAME)"
        }

and I accepted the gallery-dl app

That's not what you should do. gallery-dl is rate limited, you need your own application/client-id.
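As a minimal sketch of the step above, assuming the config file is plain JSON with a top-level "extractor" object, the helper below merges a "reddit" section into an existing file or creates a new one. The function name add_reddit_credentials is hypothetical (it is not part of gallery-dl), and the path, client-id, and user-agent values are placeholders to replace with your own.

```python
import json
from pathlib import Path


def add_reddit_credentials(config_path, client_id, user_agent):
    """Merge a "reddit" section into the "extractor" object of a JSON config.

    Hypothetical helper for illustration only; it is not part of gallery-dl.
    """
    path = Path(config_path)
    # Start from the existing config if there is one, otherwise from scratch.
    if path.exists():
        config = json.loads(path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Create the nested objects only if they are missing, keeping other settings.
    reddit = config.setdefault("extractor", {}).setdefault("reddit", {})
    reddit["client-id"] = client_id
    reddit["user-agent"] = user_agent
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


if __name__ == "__main__":
    # Placeholder values; substitute the client-id of your own Reddit app.
    add_reddit_credentials(
        "gallery-dl.conf",
        "YOURCLIENTID",
        "Python:APPLICATIONNAME:v1.0 (by /u/YOURUSERNAME)",
    )
```

Editing the file by hand in a text editor achieves the same result; the sketch only shows where the "reddit" object is expected to sit relative to "extractor".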

MrPromotor commented 1 year ago

Even like this, it only allows 400+ posts. I have tried several times to download a subreddit, but it doesn't download everything.

MrPromotor commented 1 year ago

You edit the config with an editor.

I literally just have the gallery-dl.exe; which config file are you referring to?

When run as executable, gallery-dl will also look for a gallery-dl.conf file in the same directory as said executable.

michealespinola commented 1 year ago

I believe that I am using the config file reddit extractor properly, and am no longer seeing the '[reddit][info] Requesting public access token' message. But, is it normal to still be hit by a 5-minute rate limit reset when I have only tried to download a single image? Example status:

[reddit][info] Waiting until 10:00:01 for rate limit reset.

Hrxn commented 1 year ago

Yeah you have to use your own API tokens now

michealespinola commented 1 year ago

Right, and I'm using that per mikf's above comment with (2) additional fields to use in the reddit extractor. I'm also familiar with using PRAW, so this doesn't seem to be enough for a properly authenticated Reddit API connection.

I have it configured per the docs, and I still seem to be seeing that 5-minute rate limit timer

exegg commented 1 year ago

Yup, created everything on the reddit website as well as editing my config, still being hit with the rate limit reset timer every 10 minutes. Can download an average of 80 files.

mo2theflow commented 1 year ago

which config file are you referring to?

The config file you'd have to create yourself or by running gallery-dl --config-create

Afterwards you put the following into the extractor section.

        "reddit": {
            "client-id": "YOURCLIENTID",
            "user-agent": "Python:APPLICATIONNAME:v1.0 (by /u/YOURUSERNAME)"
        }

and I accepted the gallery-dl app

That's not what you should do. gallery-dl is rate limited, you need your own application/client-id.

Thank you this worked!

slowthgt commented 1 year ago

which config file are you referring to?

The config file you'd have to create yourself or by running gallery-dl --config-create

Afterwards you put the following into the extractor section.

        "reddit": {
            "client-id": "YOURCLIENTID",
            "user-agent": "Python:APPLICATIONNAME:v1.0 (by /u/YOURUSERNAME)"
        }

and I accepted the gallery-dl app

That's not what you should do. gallery-dl is rate limited, you need your own application/client-id.

This does not seem to work when downloading your own profile's saved posts:

[gallery-dl][debug] Python 3.8.10 - Windows-10-10.0.19045
[gallery-dl][debug] requests 2.28.2 - urllib3 1.26.15
[gallery-dl][debug] Configuration Files ['D:\\gallery-dl\\gallery-dl.conf']
[gallery-dl][debug] Starting DownloadJob for 'https://www.reddit.com/user/[redacted]/saved/'
[reddit][debug] Using RedditUserExtractor for 'https://www.reddit.com/user/[redacted]/saved/'
[reddit][info] Requesting public access token
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.reddit.com:443
[urllib3.connectionpool][debug] https://www.reddit.com:443 "POST /api/v1/access_token HTTP/1.1" 200 678
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): oauth.reddit.com:443
[urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /user/[redacted]/saved/.json?limit=100&raw_json=1 HTTP/1.1" 403 38
[reddit][error] AuthorizationError: Insufficient privileges to access the specified resource

Could it be because it still requests a public access token even when using your own client-id? https://github.com/mikf/gallery-dl/blob/fb3d1462b11501f2a6ebabbe6604ac0006c02a07/gallery_dl/extractor/reddit.py#L442-L450

mikf commented 1 year ago

This does not seem to work when downloading your own profile's saved posts

You need to get a refresh token for your account with your custom client-id. Run gallery-dl oauth:reddit and make sure it says "[oauth][info] Using custom reddit client ID (YOURCLIENTID)" at the top.

michealespinola commented 1 year ago

Thank you, @mikf. That's exactly what I needed.

slowthgt commented 1 year ago

This does not seem to work when downloading your own profile's saved posts

You need to get a refresh token for your account with your custom client-id. Run gallery-dl oauth:reddit and make sure it says "[oauth][info] Using custom reddit client ID (YOURCLIENTID)" at the top.

This worked, thanks. Perhaps mention the oauth step on the client-id description? Would save you answering the same question, probably. https://github.com/mikf/gallery-dl/blob/fb3d1462b11501f2a6ebabbe6604ac0006c02a07/docs/configuration.rst?plain=1#L5099-L5116

glottisfaun0000 commented 1 year ago

Just to clarify, is there currently an authentication configuration that circumvents the 10 minute rate limit resets? Like others, I'm still getting that even with oauth/client id.

mikf commented 1 year ago

I pretty much gave reddit the same treatment as deviantart in that it now


@glottisfaun0000 try again. It might have just been due to a cached access token from the default client id. Otherwise, make sure it is actually loading your config file and using your custom client-id.

glottisfaun0000 commented 1 year ago

@mikf You were right, it's working now. Thanks. One more thing to clarify, do I need to set a sleep-request manually to avoid what @ceatpm mentions? I'm not seeing any 403s. In other words for the repository's sake, should reddit have a sleep-request in default example config now?

Make sure to also use one of the sleep options under Downloader Options: in gallery-dl --help, because your account's ability to use apps might get banned if you don't abide by the 100 requests/minute rule (I think 100, I've seen 60 as well). I let it run thinking that Reddit would automatically limit me like it usually does, but it didn't, and now I get a 403 status code. I believe that it tracks the IP of the banned app's account, because I tried again with a new app under a different account and it immediately got the 403 status code. To clarify, neither account is banned, but at one point the apps page for the first account wouldn't work and showed a sign-in page instead. After some hours, all of the apps appeared again, but I still only get 403.

exegg commented 1 year ago

Well finally got it to work with all the new tweaks. Also put the sleep request at 3 seconds between every download just for cautionary measure.
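The relationship between a requests-per-minute budget and a sleep interval is simple arithmetic, sketched below. The figures of 100 and 60 requests per minute are the unconfirmed numbers mentioned in this thread, not values taken from Reddit's documentation, and the function names are made up for this example.

```python
def min_request_interval(requests_per_minute):
    """Return the shortest delay in seconds that stays within the budget."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


def max_requests_per_minute(interval_seconds):
    """Return how many requests per minute a fixed delay allows at most."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return 60.0 / interval_seconds


if __name__ == "__main__":
    # Budgets mentioned in the thread; which one actually applies is not confirmed.
    for limit in (100, 60):
        print(f"{limit} requests/minute: wait at least {min_request_interval(limit):.1f} s")
    # A 3-second delay, as used above, is well below either budget.
    print(f"3 s delay: at most {max_requests_per_minute(3):.0f} requests/minute")
```

Under these assumptions a delay of 0.6 to 1 second would already respect the budget, so 3 seconds leaves a wide safety margin at the cost of slower downloads.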

wankio commented 1 year ago

Hmm, I get this error after setting up the client-id and running oauth:reddit:

reddit: Requesting public access token
urllib3.connectionpool: Starting new HTTPS connection (1): www.reddit.com:443
urllib3.connectionpool: https://www.reddit.com:443 "POST /api/v1/access_token HTTP/1.1" 401 41
reddit: Server response: {'message': 'Unauthorized', 'error': 401}
reddit: AuthenticationError: "401: Unauthorized"

By the way, can I configure gallery-dl to fall back to downloading from https://external-preview.redd.it when imgur returns a 404?

Thanks

mikf commented 1 year ago

@wankio That's the error you get when using an invalid client-id. Maybe you included some whitespace character(s) at the start or end, or only copy-pasted a partial client id.
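A quick way to rule out the whitespace case is to inspect the value before putting it into the config. The check below is a minimal sketch with a hypothetical function name; it cannot detect a partially copied client-id, because the expected length is not stated anywhere in this thread.

```python
def client_id_problems(client_id):
    """Return a list of human-readable problems found in a client-id string."""
    problems = []
    stripped = client_id.strip()
    if client_id != stripped:
        problems.append("leading or trailing whitespace")
    if any(ch.isspace() for ch in stripped):
        problems.append("whitespace inside the value")
    if not stripped:
        problems.append("empty value")
    return problems


if __name__ == "__main__":
    # A value copied together with a leading space and a trailing newline.
    print(client_id_problems(" YOURCLIENTID\n"))  # ['leading or trailing whitespace']
    # A clean value produces an empty list.
    print(client_id_problems("YOURCLIENTID"))  # []
```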

wankio commented 1 year ago

@mikf Thanks for the reply, but I'm pretty sure the client-id is correct: I checked it many times before putting it into the config file, and even created 2 more apps to test, with the same result. It shows my app name when I use gallery-dl oauth:reddit, so I'm pretty sure my client-id is not the problem.

Using a VPN (WARP): not working. Using a VPN (WARP), creating a new Reddit account, and creating a new app: not working.

Adding a user-agent under extractor or under reddit: not working. If I want to get a refresh token, I need to run oauth:reddit without a config file.

    "extractor": {
        "user-agent": "Python:gallerydllol:v1.0 (by /u/xxxxxxxxxx)"
    }

garoto commented 1 year ago

Make sure you have these three lines under the reddit extractor section of your conf:

        "reddit": {
            "client-id": "mRzsce27P2RlUj8ABkKfiO",
            "user-agent": "Python:myownapp:1.0 (by /u/my_fancy_reddit_username)",
            "refresh-token": "26655823-1q1WY-NqSjJW3CxGA-15sxVe9-f9RO"
        }

How to get the client-id and how to construct the user-agent string is described by mikf above in this issue. The refresh-token is generated once one clears the cache (gallery-dl --clear-cache reddit) and then runs gallery-dl oauth:reddit.

Without all three of these lines, it either won't work at all or will be throttled.
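To double-check a config against this list, something like the following sketch could be used. It assumes the file is read as strict JSON, which is how Python's json module behaves: a comma after the last entry of an object is reported as a syntax error. The function name and the sample text are made up for this example.

```python
import json

# The three keys named above.
REQUIRED_KEYS = ("client-id", "user-agent", "refresh-token")


def missing_reddit_keys(config_text):
    """Parse config text as strict JSON and list absent or empty reddit keys."""
    # json.loads raises json.JSONDecodeError on invalid JSON such as a trailing comma.
    config = json.loads(config_text)
    reddit = config.get("extractor", {}).get("reddit", {})
    return [key for key in REQUIRED_KEYS if not reddit.get(key)]


if __name__ == "__main__":
    sample = """
    {
        "extractor": {
            "reddit": {
                "client-id": "YOURCLIENTID",
                "user-agent": "Python:APPLICATIONNAME:v1.0 (by /u/YOURUSERNAME)"
            }
        }
    }
    """
    print(missing_reddit_keys(sample))  # ['refresh-token']
```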

mikf commented 1 year ago

@wankio Then I don't know how to get it working for you, sorry. The only way to get a 401: Unauthorized error on my machine was to use an invalid client-id. Everything else either gave me a different error or it did not cause an error in the first place.

gitgruvee commented 1 year ago

Hi guys. So I did (most of) the stuff suggested, along with clearing cache and getting the oauth done again. Didn't work.

HOWEVER. If you let gallery-dl run even if it says "21hour for rate limit reset", it will download. Unfortunately, it will take much longer than in the past. This is not a solution, just a note that sometimes it might wiggle its way through. So maybe just let it sit and download what you desire. I know, it sucks. But it's something, at least for the time being.

Using the config file is something I need to try, but I recall having issues in the past with locating just which config file is used. Once had some 3 different areas in C:\users\etc... and even putting info in all three still ended up to the same result.

mikf commented 1 year ago

Using the config file is something I need to try, but I recall having issues in the past with locating just which config file is used. Once had some 3 different areas in C:\users\etc... and even putting info in all three still ended up to the same result.

Run gallery-dl -v and check which configuration file(s) it loads:

$ gallery-dl -v
...
[gallery-dl][debug] Configuration Files ['${HOME}/.gallery-dl.conf']

To make sure you have a config file at one of the supported locations, run gallery-dl --config-create. You can then paste the necessary values (https://github.com/mikf/gallery-dl/issues/4292#issuecomment-1637214632) into the extractor section of the generated file.
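If the verbose output is long, the relevant line can also be picked out programmatically. The sketch below assumes the "Configuration Files" debug line has the shape shown in the logs in this thread, with the paths printed as a Python-style list; the function name is hypothetical and the sample text is copied from a log posted earlier.

```python
import ast
import re

# Matches the debug line as it appears in the logs quoted in this thread.
CONFIG_LINE = re.compile(r"\[gallery-dl\]\[debug\] Configuration Files (\[.*\])")


def loaded_config_files(verbose_output):
    """Return the config file paths listed in captured `gallery-dl -v` output."""
    for line in verbose_output.splitlines():
        match = CONFIG_LINE.search(line)
        if match:
            # Assumption: the list is printed like a Python list literal,
            # so ast.literal_eval can parse it without executing anything.
            return ast.literal_eval(match.group(1))
    return []


if __name__ == "__main__":
    sample = r"""[gallery-dl][debug] Python 3.8.10 - Windows-10-10.0.19045
[gallery-dl][debug] Configuration Files ['D:\\gallery-dl\\gallery-dl.conf']"""
    for path in loaded_config_files(sample):
        print(path)
```

An empty result means no "Configuration Files" line was found in the captured text.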

You can then use -v again together with a Reddit URL to check which client-id gets used:

$ gallery-dl -v reddit.com/r/asd
...
[reddit][debug] Using custom API credentials (client-id NsuSQ*********)

wankio commented 1 year ago

Make sure you have these three lines under the reddit extractor section of your conf:

        "reddit": {
            "client-id": "mRzsce27P2RlUj8ABkKfiO",
            "user-agent": "Python:myownapp:1.0 (by /u/my_fancy_reddit_username)",
            "refresh-token": "26655823-1q1WY-NqSjJW3CxGA-15sxVe9-f9RO"
        }

How to get the client-id and how to construct the user-agent string is described by mikf above in this issue. The refresh-token is generated once one clears the cache (gallery-dl --clear-cache reddit) and then runs gallery-dl oauth:reddit.

Without all three of these lines, it either won't work at all or will be throttled.

UPDATE: Yeah, I misread this. You must pick "installed app" when creating the app, instead of "web app" or "script". Now it's working.

jimmy-1000 commented 1 year ago

I believe that I am using the config file reddit extractor properly, and am no longer seeing the '[reddit][info] Requesting public access token' message. But, is it normal to still be hit by a 5-minute rate limit reset when I have only tried to download a single image? Example status:

[reddit][info] Waiting until 10:00:01 for rate limit reset.

Sadly, the latest version, v1.25.8, doesn't solve this.

Altrawup commented 1 year ago

Do you just have to fill out the "create app" form, or also the API access form they mention that wants you to put in all kinds of details?

Hrxn commented 1 year ago

The app form should be enough..

john-peterson commented 1 year ago

Even like this, it only allows 400+ posts. I have tried several times to download a subreddit, but it doesn't download everything.

I saw this as well. Is there any workaround?

I am using

gallery-dl -o path-restrict=windows -o extractor.reddit.client-id=abc --sleep 5 -D . reddit.com/r/way-more-than-400-post

I get four hundred or so, then it just quits to the terminal as if its job is done.

scarlion1 commented 5 months ago

Could anyone familiar with the Reddit history post an update/guide on how to use gallery-dl with it?  Thanks

F-you @kattjevfel 😇 None of the information here is coherent, the link to reddit config is broken and unfortunately not all of us have hours to spend wading through everything to put the correct pieces back together.