voussoir / timesearch

The subreddit archiver
BSD 3-Clause "New" or "Revised" License
172 stars, 7 forks

Getting deleted posts/comments? #3

Closed: WAUthethird closed this issue 5 years ago

WAUthethird commented 5 years ago

Hi, I'm wondering if there's a way to get deleted posts and/or comments in a subreddit via timesearch. pushshift.io archives them, but any database I download using the default settings still shows those comments/posts as removed.

voussoir commented 5 years ago

Hi,

Because the timesearch and commentaugment tools first get their data from Pushshift, and then attempt to update it from reddit, you should get anything that Pushshift has.

I just did this:

>timesearch commentaugment -s "bcu939"
Thank you Jason Baumgartner, owner of Pushshift.io!
New database .\subreddits\AskHistorians\AskHistorians.db
Apr 13 2019 21:33:51 - Apr 14 2019 02:03:22 +22

>timesearch offline_reading -r askhistorians
Building tree for t3_bcu939 (22 comments)
Wrote .\subreddits\AskHistorians\offline_reading\t3_bcu939.html

and I got this:

[image]

which shows deleted comments.
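
For readers who want a picture of the "Pushshift first, then update from reddit" flow, here is a minimal sketch. It is not timesearch's actual code: `merge_comment`, `TOMBSTONES`, and the dict layout are hypothetical, and the rule of keeping the archived text when the live copy is a `[removed]` or `[deleted]` placeholder is an assumption about how such a merge could preserve what Pushshift captured.

```python
# Hypothetical sketch; names and record layout are illustrative only.
TOMBSTONES = {"[removed]", "[deleted]"}


def merge_comment(archived, live):
    """Combine an archived copy of a comment with the live copy.

    Both arguments are plain dicts with at least a 'body' key.
    `live` may be None if the live site no longer returns the comment.
    """
    if live is None:
        # Nothing to update from; keep the archived record unchanged.
        return dict(archived)

    merged = dict(archived)
    merged.update(live)  # take fresher fields such as score or edits

    live_removed = live.get("body") in TOMBSTONES
    archive_has_text = archived.get("body") not in TOMBSTONES
    if live_removed and archive_has_text:
        # Assumed rule: do not overwrite real text with a placeholder.
        merged["body"] = archived["body"]
    return merged
```

Under this sketch, a comment that was already a placeholder when it was archived stays a placeholder, since there is no earlier text to fall back on.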

Note: I'm pretty sure AutoModerator acts instantaneously because it's built into reddit, so if a post is removed by AutoModerator then Pushshift probably won't have it either.

Beyond that, can you provide an example thread, or some timesearch commands you used that didn't get what you wanted?

WAUthethird commented 5 years ago

Ah, sorry, I didn't realize AutoModerator was instantaneous. Comparing a subreddit to my downloaded database shows two cases: content that is removed only on Reddit, and content that is removed both on Reddit and in the timesearch database. The occasional post or comment that is removed in the database too is most likely the result of AutoModerator, as you said. Thanks for the help!
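
A comparison like the one described above could be sketched roughly as follows. This is only an illustration: `classify_removals` and the id-to-body dicts are hypothetical stand-ins for data read from the live subreddit and from the downloaded database, and they do not reflect timesearch's schema.

```python
# Hypothetical sketch; the inputs are plain dicts mapping comment id -> body.
TOMBSTONES = {"[removed]", "[deleted]"}


def classify_removals(db_bodies, live_bodies):
    """Sort comments that are removed on the live site into three groups.

    Returns (only_on_reddit, in_both, missing_from_db), each a sorted list
    of comment ids.
    """
    only_on_reddit = []   # database still has the original text
    in_both = []          # database also holds only a placeholder
    missing_from_db = []  # database has no record of the comment

    for comment_id, live_body in live_bodies.items():
        if live_body not in TOMBSTONES:
            continue  # still visible on the live site; not of interest here
        db_body = db_bodies.get(comment_id)
        if db_body is None:
            missing_from_db.append(comment_id)
        elif db_body in TOMBSTONES:
            in_both.append(comment_id)
        else:
            only_on_reddit.append(comment_id)

    return sorted(only_on_reddit), sorted(in_both), sorted(missing_from_db)
```

Ids in the second group are the ones that were probably removed before the archive could capture them.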

voussoir commented 5 years ago

No problem. I'm always glad to give clarification, so don't hesitate to message again if you still have any suspicions that something isn't right.

pushshift commented 5 years ago

Thanks for the shout-out. :) You are correct: Pushshift "usually" doesn't get things before automod deals with them, but there are exceptions when it gets backed up.