Open diegodiv opened 1 year ago
Disclaimer: I'm just a backup maintainer for xml-conduit
and the following is just my 2 cents, @snoyberg feel free to overrule :slightly_smiling_face: .
Although this change seems perfectly functional, I do not feel comfortable merging it, and here is why.
Recovering from invalid XML documents in general sounds like a complex feature, one that xml-conduit
doesn't claim to provide as far as I know. This is not to say that we shouldn't implement it ever, but if we are to go down the rabbit hole, I believe the topic should be treated as a primary concern in the design of the library, and not as an afterthought. Tackling each edge case in the current state of the library, sounds like a slippery slope to a proliferation of if
/else
in the codebase that, I believe, will damage its maintainability.
I hope I don't come off as negative/judgmental on the proposed change here, I understand that it may be beneficial to users in the short term, but I think it is detrimental to maintainers (and thus indirectly to the users as well) in the long run.
Thanks for the concise reply.
Recovering from invalid XML documents in general sounds like a complex feature, one that
xml-conduit
doesn't claim to provide as far as I know.
Albeit less complex, xml-conduit
already provides targeted recovery features, in the form of psDecodeIllegalCharacters
.
Tackling each edge case in the current state of the library, sounds like a slippery slope to a proliferation of if/else in the codebase that, I believe, will damage its maintainability.
At the moment, xml-conduit
doesn't provide an interface that makes recovering from errors possible (in many situations): we patched the library in order to extend its behaviour. Nevertheless, the interface provides enough flexibility to recover in other settings, for example mismatched tags, without resorting to patch the library. If the library were not to implement directly more recovery features than it already does, a nice solution could be to provide such an interface.
We're sometimes confronted with managing malformed XML data that we can't fix because we didn't generate it and we don't have the possibilities to fix it upstream: these XML files come from somewhere outside of our control. We should still need to be able to parse them at least partially, if possible. So I added a recovery mode to
xml-conduit
, similar to what exists in lxml with the option recover. The target of this recovering parser is mainly unescaped characters.