joshua-hull / Reddit-Image-Scraper

Perl script to download images hosted at imgur.com linked from a subreddit at reddit.com

Download New Images #30

Closed (thedead closed this issue 10 years ago)

thedead commented 10 years ago

Hello,

I downloaded all the pictures (I guess 1000 or so) from a sub. Two weeks later I tried to re-scrape the sub, but I only got "file already exists" messages and nothing new was downloaded.

Any ideas? Can I sort by new, and maybe even top?

aggrolite commented 10 years ago

Which subreddit were you downloading images from?

aggrolite commented 10 years ago

The script doesn't sort by top or new, but that feature should really be implemented.

aggrolite commented 10 years ago

I worked on some code last night to handle an option for sort. One thing I did not think of beforehand was how reddit can sort by today, this week, this month, etc. So we would need to possibly pass two options for sorting, but maybe there's a better way to support that.

I think the options could be improved. If we switch to Getopt::Long, we can support full-length options plus the existing one-letter options, like so:

use Getopt::Long qw(:config auto_abbrev);

GetOptions(
    "user"    => \my $opt_u,
    "query"   => \my $opt_q,
    "limit=i" => \my $opt_l,
    "sort=s"  => \my $opt_s,
) or die "Error in args?\n";

For example, to pass a limit value to the script, I could use either -l 50 or --limit 50.
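To illustrate the short/long equivalence, here is a minimal sketch. It assumes Getopt::Long's default behaviour of accepting a single dash and unique abbreviations; `parse_limit` is a hypothetical helper for this example, not part of the script.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray :config auto_abbrev);

# Hypothetical helper: parse an argument list and return the limit value.
# GetOptionsFromArray lets us test parsing without touching @ARGV.
sub parse_limit {
    my @args = @_;
    my ($limit, $sort);
    GetOptionsFromArray(
        \@args,
        "limit=i" => \$limit,
        "sort=s"  => \$sort,
    ) or die "Error in args\n";
    return $limit;
}

# Both spellings should resolve to the same option.
print parse_limit('-l', '50'), "\n";
print parse_limit('--limit', '50'), "\n";
```

Because `-l` is only an abbreviation of `--limit`, it stops working if a second option starting with "l" is ever added; an explicit alias such as `"limit|l=i"` would avoid that.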

joshua-hull commented 10 years ago

Ya, I agree that allowing the user to pick the sort order would be a good option. We just need to make sure they pick one that is on Reddit.
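A rough sketch of that check, assuming the sort orders and time ranges below are the ones reddit's listings accept (hot, new, rising, top, controversial; hour through all). `check_sort` is a hypothetical helper, and passing the time range as a second option is only one possible design.

```perl
use strict;
use warnings;

# Assumed lists of what reddit accepts; adjust if reddit changes them.
my %VALID_SORT = map { $_ => 1 } qw(hot new rising top controversial);
my %VALID_TIME = map { $_ => 1 } qw(hour day week month year all);

# Hypothetical helper: validate a sort order and optional time range.
# Returns the (sort, time) pair, or dies with a message for the user.
sub check_sort {
    my ($sort, $time) = @_;
    $sort = 'hot' unless defined $sort;    # fall back to reddit's default view
    die "Unknown sort '$sort'\n" unless $VALID_SORT{$sort};
    if (defined $time) {
        # A time range only makes sense for top and controversial.
        die "A time range only applies to top or controversial\n"
            unless $sort eq 'top' || $sort eq 'controversial';
        die "Unknown time range '$time'\n" unless $VALID_TIME{$time};
    }
    return ($sort, $time);
}
```

With this, `check_sort('top', 'week')` passes, while `check_sort('best')` dies before any request is made.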

thedead commented 10 years ago

I'm not the most technical, but maybe adding an option to start from xxx as well...

joshua-hull commented 10 years ago

As in start from post number xxx in the category (new/hot/etc)?

thedead commented 10 years ago

Exactly

aggrolite commented 10 years ago

That shouldn't be too hard to implement, but XXX can't be an integer like 1-100. It will have to be a unique ID so that we can find a post to start from. For example:

http://www.reddit.com/r/pics/comments/1x2x6j/speedskating_great_sport_or_greatest_sport/

^ The unique ID is 1x2x6j, or t3_1x2x6j with its type prefix, which we can use to tell reddit's API where to start from in the JSON.
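As a sketch of how that could look, the snippet below pulls the ID out of a comments URL and builds a listing URL that uses reddit's `after` parameter. The helper names `post_fullname` and `listing_url` and the `limit=100` value are assumptions for illustration, not the script's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the post ID from a reddit comments URL
# and return it with the t3_ prefix. Returns nothing if no ID is found.
sub post_fullname {
    my ($url) = @_;
    my ($id) = $url =~ m{/comments/([a-z0-9]+)}i
        or return;
    return "t3_$id";
}

# Hypothetical helper: build the JSON listing URL for a subreddit,
# optionally starting after a given post.
sub listing_url {
    my ($subreddit, $sort, $after) = @_;
    my $url = "http://www.reddit.com/r/$subreddit/$sort.json?limit=100";
    $url .= "&after=$after" if defined $after;
    return $url;
}

my $start = post_fullname(
    'http://www.reddit.com/r/pics/comments/1x2x6j/speedskating_great_sport_or_greatest_sport/'
);
print listing_url('pics', 'new', $start), "\n";
```

Note that `after` asks for the posts that follow the given one in the listing, so the starting post itself is not included in the results.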

joshua-hull commented 10 years ago

@aggrolite beat me to it but ya, it's much easier if we get the id. I wonder how many people will be using it though.

thedead commented 10 years ago

I think if it is properly documented it would be used...

aggrolite commented 10 years ago

I should have time to hack on something this week.