pushshift / api

Pushshift API
Clarification on caching pattern #5

Open ghost opened 6 years ago

ghost commented 6 years ago

Apr 2017:

They are rechecked 30 minutes later, 4 hours later and then one day later to keep the stats up to date.

http://reddit.com/r/datasets/comments/64cuw4


Jan 2017:

I have a script that constantly gets all new submissions from the /api/info endpoint at four checkmark periods -- newest, 2 hours old, one day and three days -- meaning it is constantly getting new submissions and rechecking them when they are 2 hours, 24 hours and 72 hours old.

http://reddit.com/r/datasets/comments/5nxkob/-/dcyvvqq

pushshift commented 6 years ago

That should be correct. That is for submissions only though. Are you noticing something different?

ghost commented 6 years ago

Your answer doesn't make sense, because I've given two options.

pushshift commented 6 years ago

Sorry, my apologies. That's what happens when you try to multitask too many things at once. The rechecks are now adaptive, because of the number of comments posted to Reddit. Since I can only grab 100 objects every second, I take a 60-second average of the comment rate to calculate how many comments I'll need to request in the next second, and whatever capacity is left over is used to update submissions.
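As a rough illustration of the split described above, here is a minimal Python sketch. It is not the actual Pushshift ingest code: the function name `split_budget`, the rounding up of the average, and the sample counts are all assumptions made for the example. Only the 100-objects-per-second limit and the 60-second averaging window come from the comment.

```python
import math
from collections import deque

REQUEST_BUDGET = 100   # objects per second, as stated in the comment
WINDOW_SECONDS = 60    # length of the averaging window, as stated

def split_budget(recent_comment_counts, budget=REQUEST_BUDGET):
    """Return (comment_slots, submission_slots) for the next second.

    recent_comment_counts holds the number of new comments seen in each
    of the most recent seconds (hypothetical input for this sketch).
    """
    if not recent_comment_counts:
        # No comment data yet: spend the whole budget on submissions.
        return 0, budget
    average = sum(recent_comment_counts) / len(recent_comment_counts)
    # Assumption: round up so comments are not under-requested,
    # and never exceed the total budget.
    comment_slots = min(budget, math.ceil(average))
    return comment_slots, budget - comment_slots

# A deque with maxlen keeps only the last 60 per-second counts.
window = deque(maxlen=WINDOW_SECONDS)
for count in [40, 55, 62, 48]:
    window.append(count)

comment_slots, submission_slots = split_budget(window)
print(comment_slots, submission_slots)
```

Under this scheme a busy period for comments leaves few or no slots for submission rechecks, which is why recheck times are no longer fixed.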

That being the case, you can use the "retrieved_on" field, which I add to mark when I last fetched that object, and compare it to the "created_utc" field to see how old the object was when it was last refreshed.
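A small Python sketch of that comparison follows. It assumes both fields are Unix epoch seconds; the helper name `refresh_age` and the sample values are made up for illustration and do not come from a real API response.

```python
from datetime import timedelta

def refresh_age(obj):
    """Return how old the object was when it was last retrieved.

    Assumes "retrieved_on" and "created_utc" are epoch seconds.
    """
    seconds = int(obj["retrieved_on"]) - int(obj["created_utc"])
    return timedelta(seconds=seconds)

# Hypothetical submission record with only the two relevant fields.
sample = {"created_utc": 1491782400, "retrieved_on": 1491796800}

print(refresh_age(sample))
```

A small age means the stored score and comment count reflect an early snapshot, so they may lag behind the live values on Reddit.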