SILE (well actually our lxp parser) errors: ! undefined entity at math-showcase/mathml/joe10.xml
Workarounds
How could we support MathML formulas that use HTML/MathML entities? The list of those is quite big. Did I say "big"? (The latter even has a discussion on phi / varphi etc.)...
So...
One can replace all (HTML) entities in the original MathML file (either by their symbol or by their &#xXXXX; code point)... but it's cumbersome and tedious in any reasonable workflow...
One can hack the inputter so as to search-and-replace entities before XML parsing... but it's crazy performance-wise and sounds rather dumb (having to substitute strings in a whole document, before parsing it? No way!)
One can add a DOCTYPE to the document, such as:
<!DOCTYPE document [
  <!ENTITY times "×">
  <!ENTITY lang "⟨">
  <!ENTITY psi "ψ">
  ...
]>
... But it's also crazy and cumbersome.
One can hack the inputter to automatically stuff that big DTD at the top of the content before parsing... But that's not ideal either, performance-wise (having lua-expat parse the same in-text DTD again and again...)
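For reference, the two inputter hacks above could look roughly like the following. This is only a sketch in plain Lua: `sampleEntities`, `replaceEntities` and `buildInternalSubset` are hypothetical names, not existing SILE functions, and the table is a tiny excerpt of what a real list would be. Note that the search-and-replace variant also rewrites text inside comments and CDATA sections, and the generated DTD does no escaping of entity values, which is part of why neither hack is attractive.

```lua
-- Assumption: a tiny excerpt of the entity list, for illustration only.
local sampleEntities = { times = "×", lang = "⟨", psi = "ψ" }

-- The five entities predefined by XML are left for the parser to handle.
local predefined = { amp = true, lt = true, gt = true, quot = true, apos = true }

-- Hack 1: substitute known named entities in the raw string before parsing.
local function replaceEntities (doc, entities)
  return (doc:gsub("&(%a%w*);", function (name)
    if predefined[name] then
      return nil -- returning nil keeps the original match
    end
    return entities[name] -- unknown names give nil, hence are kept as well
  end))
end

-- Hack 2: generate an internal DTD subset from the same table.
local function buildInternalSubset (root, entities)
  local names = {}
  for name in pairs(entities) do
    names[#names + 1] = name
  end
  table.sort(names) -- stable output, since pairs() has no defined order
  local lines = { "<!DOCTYPE " .. root .. " [" }
  for _, name in ipairs(names) do
    lines[#lines + 1] = string.format('<!ENTITY %s "%s">', name, entities[name])
  end
  lines[#lines + 1] = "]>"
  return table.concat(lines, "\n")
end

print(replaceEntities("<mo>&times;</mo> &amp; &unknown;", sampleEntities))
print(buildInternalSubset("document", sampleEntities))
```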
A real solution?
Add <!DOCTYPE document SYSTEM "sil.dtd"> at the top of the content, if it's absent... Users might even have a customized one:
<!DOCTYPE document SYSTEM "sil.dtd" [
  <!ENTITY resilient "re·sil·ient"><!-- I'm so lazy -->
  ...
]>
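The "add it if it's absent" step might be sketched as below. `ensureDoctype` is a hypothetical helper, not something that exists in SILE today, and the check is deliberately naive: it assumes any `<!DOCTYPE` found in the input is a real declaration (not one sitting in a comment or CDATA section) and only knows how to skip a leading XML declaration.

```lua
-- Hypothetical helper: prepend a SYSTEM DOCTYPE when the document has none.
local function ensureDoctype (doc, root, systemId)
  -- Plain (non-pattern) search; a user-provided DOCTYPE is left untouched.
  if doc:find("<!DOCTYPE", 1, true) then
    return doc
  end
  local doctype = string.format('<!DOCTYPE %s SYSTEM "%s">\n', root, systemId)
  -- An XML declaration, if present, has to stay first in the document.
  local declEnd = select(2, doc:find("^%s*<%?xml.-%?>"))
  if declEnd then
    return doc:sub(1, declEnd) .. "\n" .. doctype .. doc:sub(declEnd + 1)
  end
  return doctype .. doc
end

print(ensureDoctype("<document/>", "document", "sil.dtd"))
```

A document that already carries its own DOCTYPE, including one with an internal subset as above, passes through unchanged.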
And use a modified XML parser...
local function parse (doc)
  local content = {
    StartElement = startcommand,
    EndElement = endcommand,
    CharacterData = text,
    SkippedEntity = function(parser, name, isParameter)
      local msg = MyAweSomeMappingOfEntities[name] or SU.error("Unknown entity: " .. name)
      text(parser, msg)
    end,
    NotStandalone = function(parser)
      return true
    end,
    _nonstrict = true,
    stack = { {} },
  }
  local parser = lxp.new(content)
The key point here is to enforce NotStandalone, and provide a SkippedEntity handler that does the replacements with a table... Extensible, flexible, clever performance-wise, and still allowing explicit DTD entity declaration as override.
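To make the table-driven replacement concrete without pulling in lxp, here is a minimal sketch of the handler logic on its own. Everything named here is an assumption for illustration: `htmlEntities` is a three-entry stand-in for the real mapping, `makeSkippedEntityHandler` is a hypothetical factory, `emit` stands in for the inputter's `text` handler, and plain `error` stands in for `SU.error`.

```lua
-- Hypothetical mapping; a real one would cover the full HTML/MathML list.
local htmlEntities = {
  times = "×",
  lang = "⟨",
  psi = "ψ",
}

-- Build a SkippedEntity-style callback. `emit` receives (parser, string),
-- mirroring how the text handler is called in the snippet above.
local function makeSkippedEntityHandler (mapping, emit)
  return function (parser, name, isParameter)
    local replacement = mapping[name]
    if not replacement then
      error("Unknown entity: " .. name) -- stand-in for SU.error
    end
    emit(parser, replacement)
  end
end

-- Usage without a real parser: collect the emitted text in a table.
local out = {}
local handler = makeSkippedEntityHandler(htmlEntities, function (_, s)
  out[#out + 1] = s
end)
handler(nil, "psi", false)
handler(nil, "times", false)
print(table.concat(out))
```

Keeping the mapping as a parameter would also let a dedicated inputter pass a different table, or none at all, for XML documents that are not expected to use the HTML entities.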
But of course, we don't want to do this for any random XML document. Those might have their own entities, not the HTML ones... And some of the ideas mentioned in #2111 (dedicated XML inputters with possibly other schema-based rules on space handling etc.) are perhaps sounder than ever...
Any opinions on the topic, before I start hacking like a madman? ;)
This relates to MathML, but raises some more interesting points regarding the "general" parsing of XML (#2111)...
Context
MathML in SIL-XML, with formula obtained from an external source...
(EDIT: Fixed the SkippedEntity code example)