sarneaud closed this issue 6 years ago.
Thinking about it a little more, a better return type for the hook would be `Nullable!R2`, where `R2` is a range with the same element type as `R`.
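Here is a minimal sketch of the `Nullable!R2` idea, under stated assumptions. The hook name `translateEntityRef` comes from the proposal. This version is simplified: it takes the entity name (without `&` and `;`) as a character array rather than a `ref` range, and it returns an array rather than an arbitrary range `R2`. The `empty` entity is a made-up example. One plausible benefit of `Nullable` over a bare `null` return is that in D `null == ""` is true. A caller comparing with `==` or checking `.length` therefore cannot tell "unsupported" apart from an entity that legitimately expands to an empty string.

```d
import std.conv : to;
import std.traits : isSomeChar;
import std.typecons : Nullable, nullable;

// Hypothetical hook (not part of dxml): maps an entity name, given without
// the surrounding '&' and ';', to its replacement text. An empty Nullable
// means "not supported", which is distinct from an empty replacement.
Nullable!(immutable(C)[]) translateEntityRef(C)(const(C)[] name)
    if (isSomeChar!C)
{
    alias S = immutable(C)[];
    switch (name.to!string)
    {
        case "amp":   return nullable("&".to!S);
        case "lt":    return nullable("<".to!S);
        // Made-up stand-in for a DTD entity declared as <!ENTITY empty "">.
        case "empty": return nullable("".to!S);
        default:      return typeof(return).init; // unsupported reference
    }
}
```

A caller checks `isNull` before calling `get`, so an entity that expands to nothing is still reported as supported.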
I didn't add support for skipping entity references because it wasn't clear to me from reading the XML spec that skipping an entity during parsing was guaranteed to leave a valid XML document (e.g., the entity might insert a start tag without the matching end tag). After some discussion about it in D.Announce, I think it is guaranteed that any such entity has to be complete enough that skipping it won't screw up the rest of the document. So what I'm probably going to do is add an option to `Config` that tells the parser to treat entity references as normal text. That way, it would still throw by default, but anyone who wanted unparsed entities to be ignored could opt into that. However, I will probably still have it throw when an entity reference is clearly invalid (not merely undeclared, but containing characters that mean it could never be a valid entity reference).
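The planned behavior (throw by default, pass entity references through as text when the option is set, but always throw on clearly invalid ones) can be sketched roughly as follows. Everything here is hypothetical. `HypotheticalConfig` and `entitiesAsText` are made-up stand-ins for the planned `Config` option, not dxml's API. The name check is a lenient approximation of the XML Name production that accepts any non-ASCII character. The sketch also assumes that character references and the predefined entities (`&amp;` etc.) are handled elsewhere.

```d
import std.ascii : isAlpha, isDigit;
import std.exception : enforce;

// Hypothetical stand-in for the planned Config option; not dxml's real API.
struct HypotheticalConfig
{
    bool entitiesAsText = false; // default: entity references still throw
}

// Lenient approximation of the XML Name production: ASCII letters, '_' and
// ':' anywhere; digits, '-' and '.' only after the first character; any
// non-ASCII character is accepted rather than risk rejecting a valid name.
bool isPlausibleEntityName(string name)
{
    if (name.length == 0)
        return false;
    bool first = true;
    foreach (dchar c; name) // decodes UTF-8 code points
    {
        immutable ok = c >= 0x80 || isAlpha(c) || c == '_' || c == ':'
            || (!first && (isDigit(c) || c == '-' || c == '.'));
        if (!ok)
            return false;
        first = false;
    }
    return true;
}

// Given an entity reference such as "&foo;", either throw or return it
// verbatim so the caller can treat it as ordinary text.
string handleEntityRef(string reference, HypotheticalConfig config)
{
    enforce(reference.length >= 3 && reference[0] == '&' && reference[$ - 1] == ';',
            "malformed entity reference: " ~ reference);
    immutable name = reference[1 .. $ - 1];
    // Clearly invalid names throw whether or not the option is set.
    enforce(isPlausibleEntityName(name), "invalid entity reference: " ~ reference);
    // Well-formed but undeclared references throw unless the option is set.
    enforce(config.entitiesAsText, "unsupported entity reference: " ~ reference);
    return reference;
}

unittest
{
    import std.exception : assertThrown;
    assertThrown(handleEntityRef("&foo;", HypotheticalConfig.init));
    assert(handleEntityRef("&foo;", HypotheticalConfig(true)) == "&foo;");
    assertThrown(handleEntityRef("&1bad;", HypotheticalConfig(true)));
}
```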
It's my intention to tackle this after I've finished the writer support, since that's almost done.
Either way, I don't see much point in adding support for actually processing entity references. If skipping entities were possible, then in principle a program could parse the DTD itself, use dxml to parse the rest of the document, and then process the entity references on its own. But if you're going that far, you might as well write a full parser rather than using dxml. Given that dxml doesn't parse the DTD, I think the only options that make sense are either to throw when it encounters an entity reference (as it does now) or to skip the references and let the program using dxml ignore them or handle them on its own if it really wants to. I'm fine with making the second option possible, so long as it doesn't result in invalid XML documents being treated as valid because the entities weren't replaced with whatever they were supposed to be replaced with.
Hi, I have an XML document that happens to contain references to entities declared in its DTD. For my use case, I don't care about interpreting them, but the references are still there. I get the tradeoff dxml makes in not supporting the DTD, but currently I can't use dxml to process this document at all because an `XMLParsingException` gets thrown. It would be useful to have a way to work around this case.
How about supporting a hook like `immutable(ElementType!R)[] translateEntityRef(ref R reference)` (i.e., one that returns a `string` for a `char` range, a `wstring` for a `wchar` range, etc.)? The hook either returns the value of the reference, or `null` if the reference isn't supported. The idea is that the same function implementation could be used in the config for `parseXML` and as a hook for `normalize` (of course, separate implementations could be used if needed for performance reasons), and existing functions like `parseStdEntityRef` could be adapted to fit the same interface.
I don't mind submitting a PR, but I'd like to get your feedback first.
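Below is a minimal sketch of the proposed null-returning hook and of reusing one implementation in two places. It makes a few assumptions. This simplified `translateEntityRef` takes the entity name as a `string` (without `&` and `;`) instead of a `ref R` range, and it handles only `char` data. It covers only the five entities predefined by XML. `expandKnownRefs` is a hypothetical stand-in for a `normalize`-style pass, not dxml's actual `normalize`.

```d
import std.array : appender;
import std.string : indexOf;

// Hypothetical hook in the proposed null-returning shape (simplified: takes
// the entity name without '&' and ';' rather than a ref range).
string translateEntityRef(string name)
{
    switch (name)
    {
        case "amp":  return "&";
        case "lt":   return "<";
        case "gt":   return ">";
        case "apos": return "'";
        case "quot": return `"`;
        default:     return null; // e.g. an entity declared only in the DTD
    }
}

// Hypothetical normalize-style pass reusing the same hook: supported
// references are replaced, unsupported ones are left in the text untouched.
string expandKnownRefs(string text)
{
    auto result = appender!string();
    while (true)
    {
        immutable amp = text.indexOf('&');
        if (amp < 0)
            break;
        immutable semi = text[amp .. $].indexOf(';');
        if (semi < 0)
            break;
        auto replacement = translateEntityRef(text[amp + 1 .. amp + semi]);
        result.put(text[0 .. amp]);
        result.put(replacement is null ? text[amp .. amp + semi + 1] : replacement);
        text = text[amp + semi + 1 .. $];
    }
    result.put(text);
    return result.data;
}
```

With this approach, an unknown reference such as `&nbsp;` passes through unchanged, so a caller that doesn't care about DTD entities can still read the surrounding text.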