joshua-hull / Reddit-Image-Scraper

Perl script to download images hosted at imgur.com linked from a subreddit at reddit.com
25 stars 8 forks

improve the JSON scrape #12

Closed aggrolite closed 11 years ago

aggrolite commented 11 years ago

Hey Joshua,

I've been thinking about your script the past few days and wanted to raise an issue with our current calls to the JSON:

Problem:

Reddit's JSON API restricts the limit parameter to a value <= 100, so our current value of 1,000 is pointless. If you make this change to the code on line 40:

warn scalar @$posts;

You will notice that the number of posts we get never exceeds 100. For example, try running the script on r/pics. You might get more than 100 pictures downloaded because some of those links are albums, but it will be nowhere near 1,000, because we are only getting the first 100 posts.

Solution:

  1. I think we should allow the user to specify how many images to download.
  2. We need to be able to traverse the next pages available.
    • Reddit offers an after parameter, which I think is the best way to do this. To get another page of results, take the name key of the last post's JSON and make another call with pics.json?limit=100&after=name_goes_here ...so we can keep using the after parameter to continue getting results until we have met the user-defined limit of images.
    • example: look at http://www.reddit.com/r/vim.json, take the first post's name, and go to http://www.reddit.com/r/vim.json?after=t3_16xbkx ...notice the second link now starts at the second result
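To make the idea concrete, here is a rough sketch of the pagination loop described above. This is not code from the script; the sub names (`next_url`, `fetch_posts`) and the `$wanted` parameter are mine, and it assumes the LWP::UserAgent and JSON modules the script already uses:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use JSON;

# Build the listing URL, appending after= when we have a continuation token.
sub next_url {
    my ($subreddit, $after) = @_;
    my $url = "http://www.reddit.com/r/$subreddit.json?limit=100";
    $url .= "&after=$after" if defined $after;
    return $url;
}

# Keep requesting pages of up to 100 posts until we have $wanted of them
# (or Reddit runs out of results).
sub fetch_posts {
    my ($subreddit, $wanted) = @_;
    my $ua = LWP::UserAgent->new;
    my (@posts, $after);
    while (@posts < $wanted) {
        my $res = $ua->get( next_url($subreddit, $after) );
        last unless $res->is_success;
        my $listing = decode_json( $res->decoded_content );
        my $page    = $listing->{data}{children};
        last unless @$page;                     # no more results
        push @posts, @$page;
        $after = $page->[-1]{data}{name};       # e.g. "t3_16xbkx"
    }
    return [ splice @posts, 0, $wanted ];
}
```

The loop stops either when the user-defined count is reached or when a page comes back empty, whichever happens first.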

what do you think?

see http://www.reddit.com/dev/api for details on their API

joshua-hull commented 11 years ago

Ya, that '1000' was kind of pointless. Thanks for the heads up on Getopt::Long. I agree that traversing after=... would be the best option, but I never got around to doing it. I'll see about doing it here soon, but I'm currently in school so it may sit on the back burner for a while.

aggrolite commented 11 years ago

@joshua-hull no worries, I was hacking on some code today during my break and should be able to give you a pull request for review within the next few days.