Hmm, looks like a bug with ComicVine; they're returning invalid results.
I raised a bug report on their website:
http://www.comicvine.com/forums/bug-reporting/2/api-bug-invalid-xml/532559/
I will look into this when I have a chance and see if I can find a way to make the Comic Vine Scraper work around the problem, because given the ComicVine guys' track record for fixing their API bugs, it may be a long time before they do anything. :(
Thanks for the bug report.
Original comment by cban...@gmail.com
on 2 Mar 2010 at 7:19
I think it has to do with an illegal character in the summary field (the description at CV). It's a weird little cross symbol.
Here is the error log:
ERROR: cannot parse results from comicvine:
http://api.comicvine.com/issue/26451/?api_key=4192f8503ea33364a23035827f40d415d5dc5d18&format=xml
Caught SystemError: '', hexadecimal value 0x10, is an invalid character. Line 2, position 6801.
And here is the text from CV:
"I expected to play myself!" Booster flashes an arrogant smirk. "Throw in of merchandising and points!"
Just have the scraper parse out that illegal character and it should be fine. Or have the mods at CV fix the text... which may take longer.
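To see how one stray control character sinks the whole response, here is a minimal sketch using Python's standard xml.etree.ElementTree. That parser is an assumption for illustration, not necessarily the one that produced the log above, and the sample string and the parses_cleanly helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an API response; "\x10" is the same
# control character (hex 0x10) named in the error log.
sample = "<response><description>bad \x10 char</description></response>"


def parses_cleanly(xml_text):
    """Return True if ElementTree accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(parses_cleanly(sample))  # → False
```

A single byte anywhere in a user-entered field is enough to make a conforming XML parser reject the entire document, which is why the scraper can't read any of that issue's data.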
Original comment by revqu...@gmail.com
on 2 Mar 2010 at 7:50
Yeah, I think you're right about what's going on.
If possible, I'd like to find a description of all of the valid XML characters and write a solution that strips out ALL the illegal characters, maybe replacing them with question marks or something. That way, we won't see this bug again in the future with a different illegal character.
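The "valid XML characters" are spelled out by the Char production of the XML 1.0 spec: tab, line feed, carriage return, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF. A minimal sketch of the replace-with-question-marks idea, assuming Python 3 strings; the function names are hypothetical, not the scraper's actual code:

```python
def is_valid_xml_char(ch):
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def replace_invalid_xml_chars(text, replacement="?"):
    """Replace every character XML 1.0 forbids with `replacement`."""
    return "".join(ch if is_valid_xml_char(ch) else replacement for ch in text)


print(replace_invalid_xml_chars("Booster\x10 smirks"))  # → Booster? smirks
```

This relies on each Python 3 character being a full code point. On a UTF-16 string type (such as .NET's), characters above U+FFFF arrive as surrogate pairs, and a per-character check like this would wrongly flag them, so that case would need pair-aware handling.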
Original comment by cban...@gmail.com
on 2 Mar 2010 at 8:58
For reference, I found this too, and it was only on one of the early issues (1 or 2). You should be able to scrape the rest and manually enter CVDB tags for the first two until this issue is fixed.
Original comment by bmen...@gmail.com
on 3 Mar 2010 at 12:26
Fixed in 1.0.14.
The fix that I implemented "knows" about ALL of the possibly valid XML characters and will automatically strip out ANYTHING that isn't valid. (There shouldn't be very many comics on Comic Vine containing invalid XML characters in their user-entered fields, but any that do exist should now be parsed properly by the scraper.)
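A strip-before-parse approach along those lines might look like the sketch below. It is an assumption about the general technique, not the code shipped in 1.0.14: it uses a regex that matches anything outside the XML 1.0 Char ranges, deletes those characters, and then hands the cleaned text to ElementTree. The names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches any character NOT allowed by the XML 1.0 Char production.
_INVALID_XML = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text):
    """Remove every character that XML 1.0 does not allow."""
    return _INVALID_XML.sub("", text)


def parse_api_response(raw):
    """Hypothetical helper: sanitize the raw response, then parse it."""
    return ET.fromstring(strip_invalid_xml_chars(raw))


root = parse_api_response(
    "<response><description>bad \x10 char</description></response>"
)
print(root.find("description").text)  # → bad  char
```

This only removes literal characters. An escaped reference such as `&#16;` would still be rejected by an XML 1.0 parser and would need separate handling.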
Original comment by cban...@gmail.com
on 5 Mar 2010 at 5:27
Original issue reported on code.google.com by
usali...@gmail.com
on 2 Mar 2010 at 5:20