suryakencana007 / comic-vine-scraper

Automatically exported from code.google.com/p/comic-vine-scraper

Delay all not-found issues to the end of the scraping, not interrupting the batch #161

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
When dealing with large batches of issues that are mostly already scraped (with a 
few unscraped black sheep in the middle), I find it annoying to leave the operation 
running overnight, only to come back the next day and find it stuck on an issue, 
waiting for my intervention to continue.

Moving all the issues that require user intervention to the end of the 
operation would let all the others, which can be scraped without user 
intervention, complete successfully.

Btw, latest version is working great.

Original issue reported on code.google.com by joaomigu...@gmail.com on 6 Jan 2011 at 5:37

GoogleCodeExporter commented 9 years ago
This is a good idea, I will implement it when I get a chance.

Glad the new version is working well!

Original comment by cban...@gmail.com on 6 Jan 2011 at 8:03

GoogleCodeExporter commented 9 years ago
This has been implemented for 1.0.34.

Original comment by cban...@gmail.com on 7 Jan 2011 at 4:20

GoogleCodeExporter commented 9 years ago
Re-opening.

The original implementation of this enhancement request only sorts comics that 
have never been scraped before to the end of the scrape operation. But comics 
can also require manual intervention when a comic's previously scraped ID 
number in the ComicVine database has changed. This doesn't happen very often, 
but if we're sorting all of the "manual intervention" comics to the end of the 
scrape operation, then we should probably deal with this case in the same 
fashion, too.

Original comment by cban...@gmail.com on 9 Jan 2011 at 8:30
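The pre-sort described in the comment above can be sketched roughly as follows. This is only an illustration, not the scraper's actual code: the `Comic` class, its `cv_id` field, and `order_for_scrape` are hypothetical names. It also shows why a pre-sort alone misses the changed-ID case: a comic with a stale ID still looks "already scraped" until the rescrape is actually attempted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comic:
    # Hypothetical stand-in for a comic in the library.
    title: str
    cv_id: Optional[int] = None  # assumed: ComicVine ID saved by an earlier scrape


def order_for_scrape(comics: List[Comic]) -> List[Comic]:
    """Put never-scraped comics (no saved ID) after previously scraped ones.

    sorted() is stable, so comics keep their relative order within each group.
    False sorts before True, so comics that have an ID come first.
    """
    return sorted(comics, key=lambda c: c.cv_id is None)


if __name__ == "__main__":
    batch = [Comic("A", 1), Comic("B"), Comic("C", 3), Comic("D")]
    print([c.title for c in order_for_scrape(batch)])
```

A comic whose stored ID no longer matches anything in the ComicVine database would land in the first group here, which is the gap the re-opened request points out.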

GoogleCodeExporter commented 9 years ago
I've made additional changes in 1.0.37 that deal with the case where a 
rescrape fails because ComicVine's ID for that comic book has changed.

In such situations, the failing comic will be shuffled to the end of the queue, 
allowing the other comics (which may not fail) to continue to be processed 
automatically.

The end result: if you start a large re-scrape operation and go away for a 
long time, when you come back, any comics that could be processed automatically 
will have been.  Only the comics that *require* human interaction will still be 
unscraped.

Original comment by cban...@gmail.com on 12 Jan 2011 at 3:06
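The behaviour described for 1.0.37 (a comic that fails during the run is moved to the end so the rest can continue) could look something like the sketch below. It is an assumption-laden illustration rather than the scraper's real implementation: `NeedsUser`, `scrape_batch`, and the two callbacks are hypothetical names, and a separate `deferred` list is used in place of literally re-appending to the same queue, which has the same effect on processing order.

```python
from collections import deque
from typing import Callable, Iterable, List


class NeedsUser(Exception):
    """Hypothetical signal that a comic cannot be scraped automatically."""


def scrape_batch(
    comics: Iterable[str],
    scrape_auto: Callable[[str], None],
    scrape_with_user: Callable[[str], None],
) -> List[str]:
    """Run every automatic scrape first, then the ones that need a person.

    scrape_auto is assumed to raise NeedsUser when it cannot proceed alone
    (for example, no saved ID, or a saved ID that no longer resolves).
    Returns the comics that were deferred, in their original order.
    """
    queue = deque(comics)
    deferred: List[str] = []

    # Pass 1: unattended. A failure defers the comic and the loop moves on.
    while queue:
        comic = queue.popleft()
        try:
            scrape_auto(comic)
        except NeedsUser:
            deferred.append(comic)

    # Pass 2: only the comics that really need human interaction remain.
    for comic in deferred:
        scrape_with_user(comic)

    return deferred
```

Because the deferral is decided at the moment a scrape fails, this covers both never-scraped comics and previously scraped comics whose ID has changed, without having to predict either case up front.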