I liked @CD1212's encoding-aware Data <-> String conversion, so I updated my groomXML() hack to use it. It's still a regex hack, but at least it doesn't assume or require any particular encoding.
This PR includes the unparseable document from #62 as a test case. As we discussed elsewhere, the problem in this document seems to be unescaped ampersands. The tests verify that it doesn't parse as originally harvested, but does parse once the ampersands are escaped.
Not really useful as a test per se, but I thought it might be helpful to have an instance of this issue in the test bank. Feel free to ignore. 😁
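For anyone skimming, here is a rough sketch of the idea the new test exercises, written in Python purely for illustration. It is not the actual groomXML() code: the names `escape_bare_ampersands` and `parses`, the regex, and the sample string are all made up for this example, and the real document from #62 is not reproduced here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: an "&" that does NOT start a named, decimal,
# or hexadecimal character reference (e.g. &amp; &#38; &#x26;).
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Replace bare ampersands with &amp;, leaving existing references alone."""
    return BARE_AMPERSAND.sub("&amp;", text)


def parses(text: str) -> bool:
    """Return True if the text is well-formed XML according to ElementTree."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Made-up sample: one bare ampersand and one that is already escaped.
raw = "<record><title>Research & Development &amp; more</title></record>"
groomed = escape_bare_ampersands(raw)
```

Same caveats as the real thing: this is a regex hack, so it would also rewrite ampersands inside CDATA sections, and it leaves alone anything that merely looks like an entity reference, even one the parser doesn't know.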