Scraped Summaries are Mauled

When we parse a summary from ComicVine, we have to convert it from HTML
to plain text. The current algorithm sometimes gets this wrong,
inserting extra spaces where they don't belong and deleting
things that need to be there.

Constructs that are handled especially badly include:

<br>
<p></p>
<i> </i>
etc.
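
For discussion, here is a minimal sketch of one way the conversion could treat these constructs. It is not the scraper's current algorithm. It assumes Python 3's standard `html.parser` module, so it may need adapting to the scraper's runtime. The names `SummaryTextExtractor` and `summary_to_text` are hypothetical. The idea is that `<br>` becomes a line break, `<p>` becomes a paragraph break, and inline tags such as `<i>` are dropped without adding or removing any spaces.

```python
import re
from html.parser import HTMLParser


class SummaryTextExtractor(HTMLParser):
    """Hypothetical helper: collects text and maps <br>/<p> to newlines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self._parts.append("\n")      # line break
        elif tag == "p":
            self._parts.append("\n\n")    # paragraph break
        # Inline tags such as <i> contribute nothing, so no spaces are added.

    def handle_endtag(self, tag):
        if tag == "p":
            self._parts.append("\n\n")

    def handle_data(self, data):
        # Whitespace inside HTML text (including newlines) counts as one space.
        self._parts.append(re.sub(r"\s+", " ", data))

    def get_text(self):
        text = "".join(self._parts)
        text = re.sub(r" +", " ", text)          # collapse runs of spaces
        text = re.sub(r" ?\n ?", "\n", text)     # trim spaces around line breaks
        text = re.sub(r"\n{3,}", "\n\n", text)   # empty <p></p> adds no extra gap
        return text.strip()


def summary_to_text(html):
    """Convert an HTML summary fragment to plain text (sketch only)."""
    parser = SummaryTextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()
```

With this approach, `summary_to_text("one<i> </i>two")` keeps the single space between the words, and an empty `<p></p>` does not produce more than one blank line.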

See the following links for examples:

http://api.comicvine.com/issue/183782/?api_key=4192f8503ea33364a23035827f40d415d5dc5d18&format=xml&

http://api.comicvine.com/issue/187622/?api_key=4192f8503ea33364a23035827f40d415d5dc5d18&format=xml&

Original issue reported on code.google.com by cban...@gmail.com on 23 May 2010 at 11:04
