suryakencana007 / comic-vine-scraper

Automatically exported from code.google.com/p/comic-vine-scraper

Scraped Summaries are Mauled #95

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
When we parse a summary from ComicVine, we have to convert it from HTML
to plain text. The current algorithm that does this sometimes gets it wrong,
inserting extra spaces where they don't belong and deleting
things that need to be there.

Constructs that are handled especially badly include:

<br>
<p></p>
<i> </i>
etc.
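
For illustration only, here is a minimal sketch of one way to do this kind of HTML-to-text conversion. It is not the scraper's actual algorithm or the eventual fix; the names `SummaryTextExtractor` and `html_to_text` are hypothetical, and it assumes Python 3's standard `html.parser` module. The idea is to turn `<br>` into a single newline, turn block tags such as `<p>` into paragraph breaks, let inline tags such as `<i>` pass through without adding or removing spaces, and normalize whitespace only at the end.

```python
import re
from html.parser import HTMLParser


class SummaryTextExtractor(HTMLParser):
    """Hypothetical helper: collects text, mapping tags to line breaks."""

    # Tags treated as paragraph boundaries (an assumed, non-exhaustive set).
    BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "table", "tr", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")        # line break
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")      # paragraph break
        # Inline tags such as <i> or <b> contribute nothing.

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        # Whitespace in the HTML source (including newlines) is one space.
        self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    parser = SummaryTextExtractor()
    parser.feed(html)
    parser.close()                             # flush any buffered text
    text = "".join(parser.parts)
    text = re.sub(r" {2,}", " ", text)         # collapse runs of spaces
    text = re.sub(r" *\n *", "\n", text)       # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)     # at most one blank line
    return text.strip()


if __name__ == "__main__":
    print(html_to_text("<p>First <i>italic</i> word.</p><p>Second<br>line</p>"))
```

Because whitespace is cleaned up once, after all the pieces are joined, an empty `<p></p>` or an `<i> </i>` containing only a space cannot glue neighbouring words together or leave a double space behind.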

See the following links for examples:

http://api.comicvine.com/issue/183782/?api_key=4192f8503ea33364a23035827f40d415d5dc5d18&format=xml&

http://api.comicvine.com/issue/187622/?api_key=4192f8503ea33364a23035827f40d415d5dc5d18&format=xml&

Original issue reported on code.google.com by cban...@gmail.com on 23 May 2010 at 11:04

GoogleCodeExporter commented 9 years ago
Fixed in 1.0.23

Original comment by cban...@gmail.com on 26 May 2010 at 3:50