suryakencana007 / comic-vine-scraper

Automatically exported from code.google.com/p/comic-vine-scraper

Script fails during series parsing #119

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
DESCRIBE THE PROBLEM:
Recently, I found that the script crashes during series lookups. This seems to
happen only with fairly large series, like Action or Detective.

--------------------------------------------------------------------------------
Comic Vine Scraper, v1.0.25
Cache Directory:     C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\localCache/
Settings File:       C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\settings.dat
--------------------------------------------------------------------------------

--------------------------------------------------------------------
[ ] Series          [ ] Number          [X] Month           
[X] Title           [ ] Alt Series      [X] Writer          
[X] Penciller       [X] Inker           [X] Cover Art       
[X] Colorist        [X] Letterer        [X] Editor          
[X] Summary         [X] Notes           [X] Tags            
[ ] Imprint         [X] Year            [ ] Publisher       
[ ] Volume          [X] Characters      [X] Teams           
[X] Locations       [X] Webpage         
-------------------------------------------------------------------
[X] Overwrite Existing        [X] Ignore Blanks             
[ ] Convert Imprints          [ ] Specify Series            
[X] Cover Thumbs              
-------------------------------------------------------------------

=======> scraping next eComic book: 'Batman The Widening Gyre 06'
no CVDB tag found in book, beginning search...
searching for all series that match: 'batman widening gyre'
database provided 1 results for the search
displaying the series selection dialog...
... chose series 27554 ('Batman: The Widening Gyre')
searching for issue in series: 27554 ('Batman: The Widening Gyre')
querying comicvine for all available issues...
...found 0 (of 5) issues in the local cache
trying to find issue ID for issue number: 6
displaying the issue selection dialog...
...chose to skip this book.

=======> scraping next eComic book: 'Detective Comics 867'
no CVDB tag found in book, beginning search...
searching for all series that match: 'detective comics'
database provided 11 results for the search
displaying the series selection dialog...
... chose series 18058 ('Detective Comics')
searching for issue in series: 18058 ('Detective Comics')
querying comicvine for all available issues...
ERROR: cannot parse results from comicvine: 
http://api.comicvine.com/volume/18058/?api_key=4192f8503ea33364a23035827f40d415d5dc5d18&format=xml&field_list=issues
Caught SystemError: Unexpected end of file while parsing CDATA has occurred. Line 2, position 36678.
Traceback (most recent call last):
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\cvconnection.py", line 169, in __get_dom
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\xml2py.py", line 114, in parseString
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\xml2py.py", line 92, in xml2py
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\xml2py.py", line 62, in children
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\xml2py.py", line 92, in xml2py
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\xml2py.py", line 62, in children
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\xml2py.py", line 92, in xml2py
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\xml2py.py", line 62, in children
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\xml2py.py", line 92, in xml2py
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\xml2py.py", line 62, in children
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\xml2py.py", line 92, in xml2py
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\xml2py.py", line 62, in children
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\xml2py.py", line 78, in xml2py
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\ipypulldom.py", line 50, in parse
RETRYING the query...
------------------- PYTHON ERROR ------------------------
Caught SystemError: Unexpected end of file while parsing CDATA has occurred. Line 2, position 36654.
Traceback (most recent call last):
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\ComicVineScraper.py", line 93, in _main
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\ComicVineScraper.py", line 148, in _Scraper__scrape
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\ComicVineScraper.py", line 262, in _Scraper__scrape_book
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\ComicVineScraper.py", line 406, in _Scraper__get_issue_refs
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\db.py", line 66, in query_issue_refs
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\cvdb.py", line 149, in _query_issue_refs
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\cvconnection.py", line 77, in _query_issue_ids
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\cvconnection.py", line 191, in __get_dom
  File "C:\Program Files\ComicRack\Data\Scripts\Comic Vine Scraper\cvconnection.py", line 180, in __get_dom

Original issue reported on code.google.com by bmen...@gmail.com on 28 Jul 2010 at 1:53

GoogleCodeExporter commented 9 years ago
Thanks for the bug report. 

This is a recent problem with ComicVine. They've made some change, and now, when
their server is under heavy load, it cuts off the last part of the query results
for large (many-issue) series. The scraper fails because the results it gets
back from ComicVine are truncated, so the XML is corrupt and cannot be parsed.

I've reported this problem, but as usual, there is no response.

I'm still trying to find some other way around this issue, but until I figure
something out (or ComicVine fixes their end), about the only thing you can do
is scrape your comics when the ComicVine website isn't so busy (later in the
evening or first thing in the morning seems to work).

Original comment by cban...@gmail.com on 28 Jul 2010 at 2:15
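
The failure mode described above (a response cut off partway through, so the XML parser hits end of file inside a CDATA section) can be detected on the client side and retried after a pause. The following is only a minimal sketch of that idea, not the scraper's actual code: it assumes plain CPython with the standard library rather than the IronPython/.NET parser shown in the traceback, and `fetch_complete_xml`, `fetch`, `attempts`, `delay_seconds`, and `sleep` are hypothetical names introduced for illustration.

```python
import time
import xml.etree.ElementTree as ET


def fetch_complete_xml(fetch, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch() until it returns well-formed XML, or give up.

    fetch is a hypothetical zero-argument callable that returns the
    response body as a string; it stands in for the real HTTP request.
    """
    last_error = None
    for attempt in range(attempts):
        body = fetch()
        try:
            # A response cut off mid-document is not well-formed, so the
            # parser raises ParseError instead of returning a partial tree.
            return ET.fromstring(body)
        except ET.ParseError as error:
            last_error = error
            if attempt < attempts - 1:
                # Wait longer after each failure to give a busy server room.
                sleep(delay_seconds * (2 ** attempt))
    raise RuntimeError(
        "response still truncated after %d attempts: %s" % (attempts, last_error)
    )
```

A caller would pass its own request function, for example `fetch_complete_xml(lambda: download(url))`, where `download` is whatever the caller already uses to retrieve the page. Retrying cannot help if the server truncates every response while it is under load, so the advice to scrape at quieter times still applies.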