Open jjelosua opened 8 years ago
It looks like this is a BeautifulSoup4 problem:

Diagnostic running on Beautiful Soup 4.4.1
Python version 2.7.11 (default, Jan 22 2016, 08:28:37)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)]
I noticed that html5lib is not installed. Installing it may help.
lxml is not installed or couldn't be imported.

Trying to parse your markup with html.parser
Here's what html.parser did with the markup:
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="content-type"/>
</head>
<body style="background-color:#ffffff;padding:72pt 72pt 72pt 72pt;max-width:468pt">
<hr>
<p style='font-size:11pt;padding:0;font-family:"Arial";margin:0;color:#000000'>
<span style="font-style:italic">
NPR: eh-test-link the seventh with
</span>
<span style="font-style:italic;color:#1155cc;text-decoration:underline">
<a href="https://www.google.com/url?q=http://www.npr.org/&sa=D&ust=1474862874391000&usg=AFQjCNFqt0rLmuWX1Yt0VH_bsnt0UJmITg" style="color:inherit;text-decoration:inherit">
link
</a>
</span>
<span style="font-style:italic">
and other things
</span>
</p>
</hr>
</body>
</html>
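In the output above, the `<hr>` swallows the whole paragraph and is only closed by a synthetic `</hr>` just before `</body>`. One plausible reading, not confirmed in this thread, is that the parser reports only a start tag for a bare `<hr>` and the tree builder never closes it. The sketch below illustrates that idea with the standard library's `html.parser` alone (Python 3, not the Python 2.7 setup from the diagnostic). The `NestingTracker` class, its `close_void` flag, and the `VOID_ELEMENTS` set are hypothetical names made up for this illustration; they are not part of Beautiful Soup.

```python
from html.parser import HTMLParser

# Partial list of HTML void elements (illustrative assumption, not exhaustive).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class NestingTracker(HTMLParser):
    """Record which elements are open when each text node is seen."""

    def __init__(self, close_void=False):
        super().__init__()
        self.close_void = close_void
        self.stack = []        # currently open tags, outermost first
        self.text_paths = []   # (text, tuple of open tags) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        # Optionally close void elements right away, since no end tag follows.
        if self.close_void and tag in VOID_ELEMENTS:
            self.stack.pop()

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            self.text_paths.append((data.strip(), tuple(self.stack)))


markup = "<body><hr><p>link</p></body>"

naive = NestingTracker()
naive.feed(markup)
naive.close()
print(naive.text_paths)   # [('link', ('body', 'hr', 'p'))]

fixed = NestingTracker(close_void=True)
fixed.feed(markup)
fixed.close()
print(fixed.text_paths)   # [('link', ('body', 'p'))]
```

With `close_void=False` the text ends up nested under `hr`, which mirrors the structure in the diagnostic output; with `close_void=True` the `hr` is closed immediately and the paragraph stays a sibling. Installing `html5lib` or `lxml`, as the diagnostic suggests, may be a simpler way around the problem.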
Input:
Output: