numeroteca closed this issue 10 years ago
This process fixes it going forward:
rake db:migrate
rake scraping:migrate_media_folders_to_include_country_codes
However, I don't think I can fix it for previously downloaded images where more than one media source has the same name, because I don't know which media source the image is from :-(
If we want to fix this thoroughly, this might work:
Threadx.find_by_thread_name([slug]).scrape_all_images true
Updated on dev and production - the new folder names that include the country code seem to be working.
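For reference, here is a minimal sketch of the folder naming described above. The helper name `kiosko_image_path` and its `root:` parameter are hypothetical (they are not taken from the pageonex code); only the resulting path layout is copied from the paths that appear in the log output in this thread.

```ruby
require 'date'

# Hypothetical helper, not the actual pageonex implementation.
# Builds the local path for a scraped front page, prefixing the folder
# with the country code so that papers sharing a name do not collide.
def kiosko_image_path(country_code, media_name, date, root: 'app/assets/images/kiosko')
  folder = "#{country_code}-#{media_name}"                    # e.g. "es-elmundo"
  file   = "#{media_name}-#{date.strftime('%Y-%m-%d')}.jpg"   # e.g. "elmundo-2013-07-08.jpg"
  File.join(root, folder, file)
end

day = Date.new(2013, 7, 8)
puts kiosko_image_path('es', 'elmundo', day)
puts kiosko_image_path('sv', 'elmundo', day)
```

With the country code in the folder name, the Spanish and Salvadoran "elmundo" images for the same day end up in different folders instead of overwriting each other.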
I've only seen this problem with the elmundo newspapers mentioned above (another user reported the same today). I need the Spanish Elmundo.
I am testing with this thread http://pageonex.com/numeroteca/quien-escribe-las-noticias/
I tried to run the code to fix it in the production console (RAILS_ENV="production" rails console):
Threadx.find_by_thread_name('quien-escribe-las-noticias').scrape_all_images true
but I get many errors like:
...
Image Load (57.0ms) SELECT images.* FROM images WHERE images.media_id = 146 AND images.publication_date = '2013-07-08' ORDER BY publication_date ASC, media_id ASC LIMIT 1
Media Load (0.3ms) SELECT media.* FROM media WHERE media.working = 1 AND media.id = 146 LIMIT 1
Image Download Failed:57447: couldn't find image at http://img.kiosko.net/2013/07/08/es/elmundo.750.jpg (Permission denied - app/assets/images/kiosko/es-elmundo/elmundo-2013-07-08.jpg)
...
(0.1ms) BEGIN
(0.1ms) COMMIT
Image Load (49.9ms) SELECT images.* FROM images WHERE images.media_id = 490 AND images.publication_date = '2013-07-08' ORDER BY publication_date ASC, media_id ASC LIMIT 1
Media Load (0.2ms) SELECT media.* FROM media WHERE media.working = 1 AND media.id = 490 LIMIT 1
Image Download Failed:57790: couldn't find image at http://img.kiosko.net/2013/07/08/es/elpais.750.jpg (Permission denied - app/assets/images/kiosko/es-elpais/elpais-2013-07-08.jpg)
(0.1ms) BEGIN
(0.1ms) COMMIT
Image Load (47.6ms) SELECT images.* FROM images WHERE images.media_id = 490 AND images.publication_date = '2013-07-09' ORDER BY publication_date ASC, media_id ASC LIMIT 1
Media Load (0.2ms) SELECT media.* FROM media WHERE media.working = 1 AND media.id = 490 LIMIT 1
Image Download Failed:58024: couldn't find image at http://img.kiosko.net/2013/07/09/es/elpais.750.jpg (Permission denied - app/assets/images/kiosko/es-elpais/elpais-2013-07-09.jpg)
(0.1ms) BEGIN
Is it just a problem of permissions?
Besides, some thumbnails in the composite are now missing, and so are the bars above those days.
I created a thread with the two Elmundo newspapers http://pageonex.com/numeroteca/el-mundo-test/ and I saw a lot of different errors:
I see that there is no folder created in app/assets/images for the El Salvador newspaper: sv-elmundo, which might be part of the problem!
[I added you as collaborators in the thread.]
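One way to tell the two failure modes above apart ("Permission denied" on an existing folder versus a folder such as sv-elmundo that was never created) is a small diagnostic like the sketch below. The function names `folder_status` and `report_folders` are hypothetical, and the demo runs against a temporary directory rather than the real app/assets/images/kiosko tree.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical diagnostic, not part of pageonex.
# Classifies a folder as :missing, :not_writable, or :ok for the current user.
def folder_status(path)
  return :missing unless Dir.exist?(path)
  File.writable?(path) ? :ok : :not_writable
end

# Returns a hash of folder name => status for each expected media folder.
def report_folders(root, folder_names)
  folder_names.map { |name| [name, folder_status(File.join(root, name))] }.to_h
end

# Demo in a throwaway directory: only es-elmundo exists.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'es-elmundo'))
  report_folders(root, %w[es-elmundo sv-elmundo]).each do |name, status|
    puts "#{name}: #{status}"
  end
end
```

Run from the production console as the same user that owns the Rails process, a check like this would show whether the scraper can actually write into each country-code folder.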
I fixed the image storage issues. I think the image map and coding-carousel problems are also tied to the assumption of unique media names.
I tried those two threads and they are working now. Let me know if you run into any other weirdness... this (incorrect) assumption of unique media names clearly shows up in a lot of places we need to fix.
1st weirdness: When I draw an area on the Spanish El Mundo in the coding view, the same area is drawn on the El Salvador one for the same day (and vice versa). Then, in the display view, the areas are only displayed on the Spanish newspaper. For example: http://pageonex.com/numeroteca/el-mundo-test/
Closed by f3fc0915bb85b026ebdf46406031de0fafe062f0
I've found a new conflict between the Argentinian and Paraguayan "La Nacion". I saw it in this series of threads by a user: the display view was working before, and now La Nacion is not working any more: http://pageonex.com/marielb/ley-de-voto-a-los-16-2/ or http://pageonex.com/marielb/ley-de-voto-a-los-16-1/
I created another thread to test with, http://pageonex.com/numeroteca/test-repeated/. The Argentinian "nacion" images are mixed with the Paraguayan (py) "nacion" ones.
I think the "La Nacion" issue is a holdover from existing images that were fetched when the code wasn't smart about papers with the same name. To fix this, I rescraped the images for the thread you mentioned, test-repeated:
Threadx.find_by_thread_name('test-repeated').scrape_all_images true
The other two (ley-de-voto-a-los-16-1, ley-de-voto-a-los-16-2) were related to the issues around thumbnails that exist but have size 0. I added some code (059392598b66ffb551f49aa37f323ff80f5bb0b4) to handle this better, and now those two work (after I rescraped all the images).
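As an illustration of the "exists but has size 0" case, a check along these lines treats an empty thumbnail the same as a missing one. This is only a sketch under that assumption, not the code from the commit mentioned above; `usable_image?` is a hypothetical name.

```ruby
require 'tempfile'

# Hypothetical check, not taken from the pageonex commit.
# A thumbnail counts as usable only if it is a regular file with at least one byte.
def usable_image?(path)
  File.file?(path) && !File.zero?(path)
end

# Demo: a freshly created temp file is empty, so it is not usable until written to.
Tempfile.create(['thumb', '.jpg']) do |file|
  puts usable_image?(file.path)
  file.write('placeholder bytes, not a real JPEG')
  file.flush
  puts usable_image?(file.path)
end
```

A caller could use such a check to decide whether a thumbnail needs to be regenerated or the image rescraped, rather than trusting that any file on disk is valid.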
Related to #168.
Found a bug while downloading new images in a thread, http://pageonex.com/numeroteca/corrupcion-espana-julio-2013/: the wrong "elmundo" images were retroactively downloaded into the thread for July 3-6, days for which I previously had the correct El Mundo (from Spain).
Here are the two newspapers:
El Salvador,sv,El Mundo,elmundo,http://www.elmundo.com.sv/
Spain,es,El Mundo,elmundo,http://www.elmundo.es/
So I guess the scraping code should not assume that media.name is unique either.
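To make the collision concrete, here is a small sketch using the two rows above. The `MediaSource` struct, its field names, and `kiosko_url` are hypothetical and not the app's actual model; the URL pattern is copied from the img.kiosko.net URLs in the error log earlier in this thread, and applying it to the "sv" paper is an assumption.

```ruby
require 'date'

# Hypothetical stand-in for the app's media records.
MediaSource = Struct.new(:country, :country_code, :display_name, :name, :url)

SOURCES = [
  MediaSource.new('El Salvador', 'sv', 'El Mundo', 'elmundo', 'http://www.elmundo.com.sv/'),
  MediaSource.new('Spain',       'es', 'El Mundo', 'elmundo', 'http://www.elmundo.es/')
].freeze

# Looking up by name alone is ambiguous: both papers share "elmundo".
by_name = SOURCES.group_by(&:name)
puts by_name['elmundo'].size

# Keying on [country_code, name] keeps the two papers apart.
by_key = SOURCES.each_with_object({}) { |m, h| h[[m.country_code, m.name]] = m }
puts by_key[%w[es elmundo]].url

# Image URL following the pattern seen in the log (assumed to hold for every country).
def kiosko_url(media, date)
  "http://img.kiosko.net/#{date.strftime('%Y/%m/%d')}/#{media.country_code}/#{media.name}.750.jpg"
end

puts kiosko_url(by_key[%w[es elmundo]], Date.new(2013, 7, 8))
```

The same idea applies anywhere a record is found by name: the lookup needs the country code (or the media id) as part of the key.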