Open whoopsedesy opened 10 months ago
The way we now parse TEI is using BeautifulSoup, like so:
bs4.BeautifulSoup(f, "xml")
This is the way the documentation says to parse XML. It's surprising to me that it does not raise an exception for well-formedness errors, and instead tolerates them as if the input were HTML. It doesn't look like any of the other parser options ("html.parser", "lxml", "html5lib") will raise errors either.
Commit 99fea693629801ea07c588543ec5214bff117142 caused homerichymns.xml not to be well-formed (#88). But processing the file with e.g. tei2csv does not raise an error. XML that is not well-formed should be a fatal error in any programs that take TEI as input.
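One possible way to get a fatal error would be to run the input through a strict XML parser before handing it to BeautifulSoup. The following is only a sketch, not what the code currently does: it uses the standard library's `xml.etree.ElementTree`, and the function name `assert_well_formed` is made up for illustration. It assumes the TEI files do not rely on entities declared in an external DTD, which this parser would reject.

```python
import xml.etree.ElementTree as ET


def assert_well_formed(data):
    """Raise ET.ParseError if `data` (bytes or str) is not well-formed XML.

    Hypothetical helper: the parsed tree is discarded, because the only
    purpose here is to fail loudly before any lenient parsing happens.
    """
    ET.fromstring(data)


if __name__ == "__main__":
    import sys

    # Usage sketch: check each file named on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        try:
            assert_well_formed(data)
        except ET.ParseError as e:
            sys.exit(f"{path}: not well-formed XML: {e}")
        # At this point the same bytes could be passed on,
        # e.g. bs4.BeautifulSoup(data, "xml").
```

Checking first and parsing with BeautifulSoup afterwards means the file is parsed twice, but it would leave the existing BeautifulSoup-based code untouched.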