rhiever / reddit-analysis

A Python script that parses post titles, self-texts, and comments on reddit and makes word clouds out of the word frequencies.

Method to get all submissions in a single request? #22

Closed by rhiever 11 years ago

rhiever commented 11 years ago

Is there a PRAW method to request all submissions from reddit in a single request, similar to what https://github.com/rhiever/reddit-analysis/blob/1af786b71ea7d6cf75ec98ee850581b90bf9503c/word_freqs.py#L147 seems to do for comments?

bboe commented 11 years ago

When retrieving listings, PRAW fetches up to 100 items at a time. Here's an example showing 5 submissions:

http://www.reddit.com/r/redditdev/new.json?limit=5
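
For illustration, here is a minimal sketch of pulling a listing through PRAW. The exact method names depend on the PRAW version (this issue predates the current API), and the credentials are placeholders:

```python
import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",          # placeholder
    client_secret="YOUR_CLIENT_SECRET",  # placeholder
    user_agent="reddit-analysis example",
)

# PRAW pages through the listing 100 items at a time behind the scenes,
# so asking for 1000 submissions still costs about 10 listing requests.
for submission in reddit.subreddit("redditdev").new(limit=1000):
    print(submission.title)
```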

As shown above, PRAW already fetches as many submissions as it can per request. The problem is that the listing contains no comment data; to get the comments, one must fetch each individual submission page. Here's an example showing only 5 comments:

http://www.reddit.com/r/redditdev/comments/17oer0/_.json?limit=5

PRAW implicitly grabs as many comments as it can per request; however, reddit still limits how many are returned per page, which is why the replace_more_comments call is needed.
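
Fetching a submission's comments looks roughly like the sketch below. In newer PRAW versions replace_more_comments became comments.replace_more; either way, each replaced "more comments" stub costs one extra request. The submission id comes from the URL above; credentials are placeholders.

```python
import praw

# Same placeholder credentials as in the listing sketch above.
reddit = praw.Reddit(client_id="YOUR_CLIENT_ID",
                     client_secret="YOUR_CLIENT_SECRET",
                     user_agent="reddit-analysis example")

# Fetch the submission from the URL above and expand its comment tree.
submission = reddit.submission(id="17oer0")
submission.comments.replace_more(limit=32)   # up to 32 extra requests
for comment in submission.comments.list():   # flattened comment forest
    print(comment.body[:80])
```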

Given the API restrictions, discovering and fetching 1000 submissions takes a minimum of 10 listing requests (100 submissions each) plus 1000 submission requests, or at least 33 minutes 40 seconds at 1 request every 2 seconds. For large submissions PRAW will make up to 32 additional comment-replacement requests each, so in the worst case it could take roughly 18 hours to process 1000 submissions.
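
As a quick sanity check on those numbers, here is a back-of-the-envelope sketch of the arithmetic; the constants simply restate the figures above:

```python
# Request budget for processing 1000 submissions.
SECONDS_PER_REQUEST = 2      # reddit rate limit: roughly 1 request every 2 seconds
N_SUBMISSIONS = 1000
LISTING_PAGE_SIZE = 100      # max items returned per listing request
MAX_REPLACEMENTS = 32        # worst-case "more comments" requests per submission

listing_requests = N_SUBMISSIONS // LISTING_PAGE_SIZE              # 10
best_case = (listing_requests + N_SUBMISSIONS) * SECONDS_PER_REQUEST
worst_case = best_case + N_SUBMISSIONS * MAX_REPLACEMENTS * SECONDS_PER_REQUEST

print(best_case, "seconds")         # 2020 s = 33 min 40 s
print(worst_case / 3600, "hours")   # 66020 s, roughly 18 hours
```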

rhiever commented 11 years ago

We'll work with what we've got then. :+1: