shadowmoose / RedditDownloader

Scrapes Reddit to download media of your choice.

All Time won't really download everything #103

Open floriplum opened 4 years ago

floriplum commented 4 years ago

Describe the bug

When I select All Time for my source, it won't download every post from Reddit. However, if I create a source for every available time range (all time, year, month, week, and day), it downloads everything.

Environment Info

Screenshots/Information

It looked like it downloaded everything up until 3 months ago without changing the time to month, week, and day.

Additional context

Tried without a limit using a pushshift source.

shadowmoose commented 4 years ago

Hey, thanks for the report. Just to confirm, this is a bug with a PushShift source?

floriplum commented 4 years ago

So I tested it a bit, and it looks like only the Pushshift source has the problem. While testing, though, I noticed that with a Reddit source the download won't finish, even after probably downloading everything. It just stayed there for ~60 minutes without doing anything.

But i should probably open another issue for the other problem.

Greetings, Flori

KianMaghsoodi commented 4 years ago

Hey shadowmoose, I really appreciate your sincere efforts. By the way, I have the same issue. I wanted to check whether there is any update on this issue, or any tweaks in the settings to make it download "almost" everything?

KianMaghsoodi commented 4 years ago

I think I have figured something out. In the Pushshift method, if you set the minimum created_utc to something really old, like 2010 (in Unix epoch, obviously), it will download "almost" everything after that date. You can use [ https://api.pushshift.io/reddit/submission/search/?subreddit=NAME&aggs=author&size=0 ], filling in NAME with the subreddit name that you want, and it will yield the number of submissions made by each user. That way you know how many posts there are and can check that against the number of scanned posts in RMD using the method I mentioned above. Cheers!
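For anyone who wants to script the check above, here is a rough Python sketch. It only computes the Unix epoch for an old cutoff date, builds the aggregation URL quoted above, and sums per-author counts from a response. The helper names (`epoch_for`, `author_aggregation_url`, `total_posts`) are made up for illustration and are not part of RMD. The response shape (`aggs` -> `author` -> list of `key`/`doc_count` entries) is an assumption about what the Pushshift endpoint returns, so check it against a real response before relying on it.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in the comment above.
PUSHSHIFT_SEARCH = "https://api.pushshift.io/reddit/submission/search/"


def epoch_for(year, month=1, day=1):
    """Return the Unix epoch (seconds, UTC) for midnight on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def author_aggregation_url(subreddit):
    """Build the per-author aggregation URL for a subreddit (hypothetical helper)."""
    query = urlencode({"subreddit": subreddit, "aggs": "author", "size": 0})
    return f"{PUSHSHIFT_SEARCH}?{query}"


def total_posts(response):
    """Sum the per-author submission counts.

    ASSUMPTION: the response looks like
    {"aggs": {"author": [{"key": "<name>", "doc_count": <int>}, ...]}}.
    """
    buckets = response.get("aggs", {}).get("author", [])
    return sum(bucket["doc_count"] for bucket in buckets)


# Made-up sample response, used only to show the arithmetic.
sample_response = {
    "aggs": {
        "author": [
            {"key": "alice", "doc_count": 3},
            {"key": "bob", "doc_count": 2},
        ]
    },
    "data": [],
}

min_created_utc = epoch_for(2010)            # value to enter as the minimum created_utc
url = author_aggregation_url("NAME")         # replace NAME with the subreddit
expected_total = total_posts(sample_response)
```

Compare `expected_total` (computed from a real response instead of the sample) with the number of scanned posts RMD reports. If the endpoint only returns the most active authors, the sum will undercount, so treat it as a lower bound.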