When retrieving listings, PRAW fetches up to 100 items at a time. Here's an example showing 5 submissions:
http://www.reddit.com/r/redditdev/new.json?limit=5
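For illustration, here's roughly how that listing fetch looks through PRAW. This is a minimal sketch using the PRAW 2/3-era API that was current at the time; the `user_agent` string is a placeholder:

```python
import praw

r = praw.Reddit(user_agent='listing example by /u/yourname')

# A single listing request returns at most 100 items; for larger limits
# PRAW pages through the listing with additional requests.
for submission in r.get_subreddit('redditdev').get_new(limit=5):
    print(submission.title)
```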
The listing above already gives PRAW as many submissions as it can fetch at once. The problem is that a listing contains no comment data; to get the comments, one must fetch each individual submission page. Here's an example with only 5 comments shown:
http://www.reddit.com/r/redditdev/comments/17oer0/_.json?limit=5
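A sketch of the equivalent fetch through PRAW, assuming the PRAW 2/3-era `get_submission` call and its `comment_limit` parameter (which mirrors the `limit=5` in the URL above):

```python
import praw
from praw.objects import MoreComments

r = praw.Reddit(user_agent='comment example by /u/yourname')

# Fetching one submission by id returns its comment tree in a single
# request, subject to reddit's per-request comment cap.
submission = r.get_submission(submission_id='17oer0', comment_limit=5)
for comment in submission.comments:
    if isinstance(comment, MoreComments):
        continue  # unresolved "load more comments" stubs
    print(comment.body[:80])
```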
PRAW implicitly grabs as many comments as it can per request; however, reddit still imposes a per-request limit on comments, which is why the replace_more_comments call is needed.
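For example (same PRAW 2/3-era API; `flatten_tree` and the `threshold` value are just illustrative choices):

```python
import praw
from praw.helpers import flatten_tree

r = praw.Reddit(user_agent='replace_more example by /u/yourname')
submission = r.get_submission(submission_id='17oer0')

# Each MoreComments stub that gets resolved costs an extra API request;
# limit=32 was PRAW's historical default cap on those extra requests.
submission.replace_more_comments(limit=32, threshold=0)
print(len(flatten_tree(submission.comments)), 'comments fetched')
```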
Given the API restrictions, discovering and fetching 1000 submissions takes a minimum of 10 listing requests (100 submissions each) plus 1000 submission requests, or at least 33 minutes 40 seconds at 1 request every 2 seconds. For large submissions PRAW will make up to 32 additional comment replacement requests per submission, so in the worst case it could take roughly 18 hours to process 1000 submissions.
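As a quick sanity check on that arithmetic (the worst case lands at about 18 h 20 m):

```python
# Back-of-the-envelope version of the estimate above, at 1 request / 2 s.
listing_requests = 1000 // 100            # 10 listing requests
submission_requests = 1000                # one request per submission
base_seconds = (listing_requests + submission_requests) * 2
print(divmod(base_seconds, 60))           # (33, 40) -> 33 min 40 s

worst_seconds = base_seconds + 1000 * 32 * 2   # plus comment replacements
print(round(worst_seconds / 3600.0, 1))        # ~18.3 hours
```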
We'll work with what we've got then. :+1:
Is there a PRAW method to request all submissions from reddit in a single request, similar to what https://github.com/rhiever/reddit-analysis/blob/1af786b71ea7d6cf75ec98ee850581b90bf9503c/word_freqs.py#L147 seems to do for comments?
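For reference, a sketch of what that pattern looks like, assuming the PRAW 2/3-era `get_comments`/`get_new` calls (the exact code behind the linked line may differ):

```python
import praw

r = praw.Reddit(user_agent='stream example by /u/yourname')
subreddit = r.get_subreddit('redditdev')

# limit=None asks PRAW for as many items as reddit will return; behind the
# scenes this is still paged at up to 100 items per request, not one request.
for comment in subreddit.get_comments(limit=None):
    pass  # process each comment

for submission in subreddit.get_new(limit=None):
    pass  # the closest submission-side analogue
```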