talwrii closed this issue 8 years ago
I encountered a similar problem on BeautifulSoup 4.4.1 and reported it here, although as you can see, the issue seems to have resolved itself. That said, I see two major differences: you're using Python 2 and BeautifulSoup 3, so the issue is likely unrelated. Still, I thought it was worth mentioning.
When I encountered the error, it was with TestSuncorpBankStatement in the test suite.
Fixed by #108
The parser complains about the document being empty.
This was true in 9d7b66e41ffb7857f94c4d691c67bf38ae03538f
I suspect that this is Beautiful Soup being rubbish. My experiences with Beautiful Soup have in general not been very positive. Is there an equivalent of lxml.etree.HTML for buggy XML? (Sorry about the F.U.D :/ - I'm too lazy to back up my statements with facts)
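On the lxml question: as far as I know, the closest thing is lxml's own XML parser with `recover=True`, which tries to build a tree from malformed input instead of raising. Here is a minimal sketch, assuming lxml is installed. The helper name `parse_lenient_xml` and the sample bytes are made up for illustration and are not from this project, and I haven't checked how it behaves on the document that triggers this bug.

```python
from lxml import etree


def parse_lenient_xml(data):
    """Parse possibly malformed XML, letting libxml2 recover where it can."""
    # recover=True asks the parser to keep going past well-formedness errors.
    parser = etree.XMLParser(recover=True)
    return etree.fromstring(data, parser)


# Hypothetical broken input: the <b> element is never closed.
broken = b"<a><b>text</a>"
root = parse_lenient_xml(broken)
print(etree.tostring(root))
```

Recovery is best effort, so the resulting tree may silently drop or re-nest content. It's worth inspecting the output before trusting it for something like bank statements.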
For reference, my Beautiful Soup version is 3.2.1.
Anyway, the following patch seemed to fix the issue, but I don't fully understand what's going on, and I've reached my quota of yak shaving for today...
Here's a sanitized document that consistently exhibits the bug