montera34 / pageonex

PageOneX. Analyzing front pages
http://pageonex.com
GNU Affero General Public License v3.0

download all images daily #140

Closed: rahulbot closed this issue 11 years ago

rahulbot commented 11 years ago

We should be proactively downloading all the images locally each day, so that when a new thread is created it is much faster.

rahulbot commented 11 years ago

Ed did this a few days ago. I verified it by counting yesterday's images in the database (Image.where(:publication_date => "2013-05-12")).

numeroteca commented 11 years ago

That's great. However, there should be a way to re-scrape days that were not downloaded properly, as in this thread: http://production.pageonex.com/numeroteca/movilizaciones-mayo-2013

The problem is consistent in recently created threads: http://production.pageonex.com/numeroteca/movilizaciones-mayo-2013-b (May 1st and 12th are still missing).

rahulbot commented 11 years ago

I added a button that shows up while coding if you are looking at a 404 missing image. You click the button and it tries to download that image again. I tested this on dev and it looks to be working. The thread owner or admin is allowed to do this.
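
The retry behaviour described above could be sketched roughly as follows. This is not the PageOneX code: `User`, `MissingImage`, `can_retry?`, `retry_download`, and the injected `fetcher` are hypothetical names chosen for illustration. The only facts taken from this comment are that the retry applies to a missing image and that it is limited to the thread owner or an admin.

```ruby
# Hypothetical sketch; none of these names come from the PageOneX codebase.
User = Struct.new(:name, :admin)
MissingImage = Struct.new(:source_url, :thread_owner, :data)

# Only the thread owner or an admin may trigger a retry.
def can_retry?(user, image)
  user.admin || user.name == image.thread_owner
end

# Tries the download again using an injected fetcher: any callable that
# takes a URL and returns the image bytes, or nil on failure.
# Returns true if the image data is present afterwards.
def retry_download(user, image, fetcher)
  return false unless can_retry?(user, image)
  return true if image.data # already downloaded, nothing to do

  image.data = fetcher.call(image.source_url)
  !image.data.nil?
end

# Example with a stand-in fetcher (no network access involved).
owner = User.new("numeroteca", false)
image = MissingImage.new("http://example.com/front-page.jpg", "numeroteca", nil)
fake_fetcher = ->(_url) { "JPEG bytes" }
retry_download(owner, image, fake_fetcher)
```

Passing the fetcher in as an argument keeps the permission check separate from the download itself, so the check can be exercised without touching the network.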

numeroteca commented 11 years ago

It also worked fine for me on dev.

rahulbot commented 11 years ago

Another note: you can now rescrape all the images for a thread on the production server from the console, like this:

$ rails console production
1.9.2p290 :001 > Threadx.find_by_thread_name([slug]).scrape_all_images true
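
If several threads need rescraping, the same call could be wrapped in a small loop. The helper below is a sketch under assumptions, not something from the PageOneX codebase: `rescrape_threads` and the `finder` argument are hypothetical, and the meaning of the `true` argument is not stated in this thread, so it is simply passed through as in the command above.

```ruby
# Hypothetical helper: rescrape several threads, skipping unknown slugs.
# `finder` is any callable that maps a slug to a thread object, or nil.
# Returns the slugs for which no thread was found.
def rescrape_threads(slugs, finder)
  missing = []
  slugs.each do |slug|
    thread = finder.call(slug)
    if thread.nil?
      missing << slug # no thread matched this slug, so record it and move on
      next
    end
    thread.scrape_all_images(true) # same call and argument as the console command
  end
  missing
end

# In the Rails console this might be invoked as (assumption, untested):
#   finder = ->(slug) { Threadx.find_by_thread_name(slug) }
#   rescrape_threads(["movilizaciones-mayo-2013"], finder)
```

Returning the unmatched slugs, rather than raising on the first nil, lets one console run cover a whole list and report which threads still need attention.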