mikf / gallery-dl

Command-line program to download image galleries and collections from several image hosting sites
GNU General Public License v2.0
12.09k stars 983 forks

Use pushshift API or old.reddit.com html to get url of deleted reddit posts #3386

Open gdl-ps opened 1 year ago

gdl-ps commented 1 year ago

Deleted posts on reddit have an empty string as their url, so the download fails even though reddit still returns the JSON. Check out pushshift.io.

They crawl reddit and have a huge dataset of comments and posts, plus an API whose JSON response still contains the url even if the original post has been removed.

For example, see this request (warning, very NSFW):

https://api.pushshift.io/reddit/submission/search?ids=xziujh vs. https://www.reddit.com/r/FemBoys/comments/xziujh/pinky_out_we_fancy.json
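
A minimal sketch of how the empty url could be filled in from pushshift, not gallery-dl's actual code. The function names `fetch_pushshift_url` and `fill_missing_url` are made up for illustration, and the response shape (`{"data": [{"id": ..., "url": ...}]}`) is an assumption based on the request above.

```python
import json
import urllib.request

# Endpoint taken from the example request above.
PUSHSHIFT_SEARCH = "https://api.pushshift.io/reddit/submission/search?ids={}"


def fetch_pushshift_url(post_id, opener=urllib.request.urlopen):
    """Return the archived url for post_id, or None if nothing usable is found.

    Assumes the response looks like {"data": [{"id": ..., "url": ...}, ...]}.
    """
    with opener(PUSHSHIFT_SEARCH.format(post_id)) as response:
        payload = json.load(response)
    for entry in payload.get("data", []):
        if entry.get("id") == post_id and entry.get("url"):
            return entry["url"]
    return None


def fill_missing_url(submission, lookup=fetch_pushshift_url):
    """Return the submission with its empty url replaced by the archived one.

    `submission` is the dict parsed from reddit's JSON. Posts that still have
    a url are returned unchanged, and the input dict is never modified.
    """
    if submission.get("url"):
        return submission
    archived = lookup(submission["id"])
    if archived:
        return dict(submission, url=archived)
    return submission
```

The `lookup` and `opener` parameters are only there so the network call can be swapped out, e.g. `fill_missing_url(post, lookup=my_cache.get)` with some hypothetical cache.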

Also, old.reddit.com still has the link in the html. You could even keep using the same reddit API to get the rest of the JSON and just fill in the blank url from pushshift or the old reddit html. You will need a 'Cookie: over18=1' header for the old.reddit.com html.
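
A rough sketch of the old.reddit.com route, under several assumptions: that the short `https://old.reddit.com/comments/<id>` address reaches the post page, that the link sits in a `data-url` attribute in the html, and that a custom User-Agent is needed. The names `build_request` and `extract_post_url` are hypothetical. Only the `Cookie: over18=1` header comes from the description above.

```python
import urllib.request
from html.parser import HTMLParser


class _DataUrlFinder(HTMLParser):
    """Remember the first non-empty data-url attribute seen in the document."""

    def __init__(self):
        super().__init__()
        self.url = None

    def handle_starttag(self, tag, attrs):
        if self.url is None:
            value = dict(attrs).get("data-url")
            if value:
                self.url = value


def extract_post_url(html_text):
    """Return the first data-url value in html_text, or None if there is none.

    Assumption: old reddit stores the post's link in a data-url attribute.
    """
    finder = _DataUrlFinder()
    finder.feed(html_text)
    return finder.url


def build_request(post_id):
    """Build a request for the post's old.reddit.com page.

    The over18 cookie skips the NSFW confirmation page. The User-Agent value
    is a placeholder, not something reddit prescribes.
    """
    return urllib.request.Request(
        "https://old.reddit.com/comments/" + post_id,
        headers={"Cookie": "over18=1", "User-Agent": "url-recovery-sketch/0.1"},
    )
```

Usage would be something like `extract_post_url(urllib.request.urlopen(build_request(post_id)).read().decode("utf-8"))`, with the result written into the blank url field of the JSON.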

Originally posted by @gdl-ps in https://github.com/mikf/gallery-dl/issues/2889#issuecomment-1344937805

mikf commented 1 year ago

671