mbakeranalecta / sam

Semantic Authoring Markdown

What if the code of an embed is XML? #145

Closed mbakeranalecta closed 7 years ago

mbakeranalecta commented 7 years ago

What if the code embedded in an embed is XML?

Should it be dumped out as the content of elements just like anything else, or should it be escaped in some way? Need to think through the use cases to be sure.

mbakeranalecta commented 7 years ago

Part of the question here is, what is the difference between embedded XML (XML fragments) and an embed that happens to be coded in XML? The embedded XML facility was dreamed up long before the general embed facility was conceived of. Its intention was to let the writer write something in XML if it was hard to write it in SAM. It was supposed to simply fall through into the output as if it had been interpreted from SAM markup. In other words, it was supposed to be part of the structure of the SAM document, not some other document embedded within it.

The problem with this concept is that it does not fit well with the idea of a SAM schema. An embed (a LaTeX equation, for instance) is effectively an object embedded within the SAM document. The SAM parser is not concerned at all with its syntax or its internal structure. But embedded XML (like embedded HTML in Markdown) was intended to be part of the structure of the SAM document, which means it is an alternate syntax for the same structure. That means a SAM schema validator would have to embed an XML parser to check the structure of the embedded XML. That seems vastly over-complicated.

This seems to argue that the embedded XML facility should be removed. If there are things you can't express easily in SAM, embedding something else to express them makes sense, and treating them as separate objects to be validated externally keeps things clean. Therefore there seems to be a strong argument for removing the old embedded XML syntax and recommending using embed with XML syntax where such a facility is needed.

By this argument, the XML in the embed is just another encoding. It should be dumped to the output. Dumping XML into an XML element is not a problem in itself, as long as the processing application recognizes it and knows what to do with it.

The other issue would be using XML schema languages to validate the output. Here again, though, the schema could have an open content model for embedded XML.

mbakeranalecta commented 7 years ago

One issue with dumping the XML of an embed into the serialized output is what to do if the writer inserts an XML declaration into the embedded XML. That would not be valid in the serialized XML output, because an XML declaration is only allowed at the very start of a document.
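To make the declaration problem concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `embed` wrapper element and the `wrap_raw` and `is_well_formed` helpers are hypothetical names chosen for illustration; they are not taken from the SAM serializer.

```python
import xml.etree.ElementTree as ET


def wrap_raw(payload: str) -> str:
    # Hypothetical serializer step: dump the embed payload into the
    # output unchanged, inside an assumed <embed> element.
    return f"<embed>{payload}</embed>"


def is_well_formed(doc: str) -> bool:
    # True if the document parses as XML, False otherwise.
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


with_declaration = '<?xml version="1.0"?><math><mi>x</mi></math>'
without_declaration = "<math><mi>x</mi></math>"

# The declaration ends up in the middle of the output, so parsing fails.
raw_with_decl_ok = is_well_formed(wrap_raw(with_declaration))
# The same payload without the declaration nests cleanly.
raw_without_decl_ok = is_well_formed(wrap_raw(without_declaration))
```

Under these assumptions, only the payload without the declaration produces well-formed output when dumped raw.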

mbakeranalecta commented 7 years ago

More generally, what happens if there are any errors in the embedded XML?

With XML fragments that was dealt with by parsing the fragment before including it, and raising an error if it did not parse. We could do the same thing with XML in an embed, but not all embeds are XML, and not all XML embeds will have their type described as =xml. So detection is problematic. Life is much simpler if we say that an embed is simply a string that the application layer can do what it wants with.
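For reference, the parse-before-include check described above could look roughly like the following. This is a sketch only: `check_fragment` is a hypothetical name, and the actual SAM parser may perform this check differently.

```python
import xml.etree.ElementTree as ET


def check_fragment(fragment: str) -> None:
    """Raise ValueError if the fragment is not well-formed XML.

    Assumes the fragment has a single root element.
    """
    try:
        ET.fromstring(fragment)
    except ET.ParseError as err:
        raise ValueError(f"XML fragment does not parse: {err}") from err


# A well-formed fragment passes silently.
check_fragment("<note><b>bold</b></note>")

# A fragment with mismatched tags is rejected before it reaches the output.
try:
    check_fragment("<note><b>bold</note>")
    rejected = False
except ValueError:
    rejected = True
```

The check only works when the parser knows the payload is XML, which is exactly the detection problem with embeds.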

And it is worth noting that the embed facility is mostly a convenience for writers who need to express something like an equation inline in their document. It is not a general facility for embedding objects into the text. Most objects should be included by reference, not embedding.

mbakeranalecta commented 7 years ago

Where does this leave us on XML fragments? They are still incompatible with the development of a SAM schema, which is a much more important feature. SAM is more capable than when the XML fragment facility was invented, so there is less need for it. There are still going to be some kinds of markup that are hard to do in SAM, but that is true of all markup systems. SAM is not designed to be perfectly general.

mbakeranalecta commented 7 years ago

The conclusion that this dialogue with myself seems to be leading to is that XML fragments should be dropped and that all embedded encodings should be serialized to XML as strings. Other alternatives just seem too complicated, not merely to implement but to explain.
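A minimal sketch of what serializing an embed as a string could look like, using `escape` and `quoteattr` from Python's `xml.sax.saxutils`. The `embed` element, its `encoding` attribute, and the `serialize_embed` function are assumptions made for illustration, not the names used in the actual SAM output.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def serialize_embed(payload: str, encoding: str) -> str:
    # Escape &, < and > so the payload becomes plain character data.
    # The element and attribute names here are hypothetical.
    return f"<embed encoding={quoteattr(encoding)}>{escape(payload)}</embed>"


payload = '<?xml version="1.0"?><math><mi>x</mi></math>'
serialized = serialize_embed(payload, "xml")

# The output is well-formed even though the payload has an XML declaration.
element = ET.fromstring(serialized)

# The application layer gets the original string back and decides what to do.
recovered = element.text
```

Because the payload is only character data, the serializer never needs to know whether it is XML, LaTeX, or anything else, and errors in the payload are left to the application layer.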

The only alternative would seem to be to provide yet another embed mechanism specifically for XML that would allow it to be serialized as XML. Is there a compelling use case for such a facility?

The most obvious one would be to allow the embedding of complex table markup that involves structures that are hard to create in SAM. But the fact is that complex table markup (with complex spans) is impossible to make lucid in any form of markup, including XML. Complex tables are media domain artefacts that can only be lucidly created with a WYSIWYG editor. SAM is designed to avoid the need for such an editor.

In short, SAM is the wrong tool for complex tables. So giving it the facility to create them by a means that makes so many other things awkward makes little sense.

I really can't think of another compelling use case.

mbakeranalecta commented 7 years ago

Adding to the above, there is actually no reason you cannot express a complex table layout using blocks and fields in SAM. It will be incomprehensible to the reader (and writer) but that is true of complex tables in XML as well. Only a WYSIWYG editor will make them comprehensible. So allowing XML fragments in SAM does nothing for comprehensibility. At best, it would allow the writer to paste complex table markup from an XML editor into a SAM document. But in that case it would make far more sense to treat the table as an external resource and bring it into the SAM document by reference, since that would allow you to continue to maintain it in the XML editor.

At the block level, SAM can express any semantics XML can, simply by replacing arbitrary attributes with arbitrary fields. It is only at the intertextual level that SAM is limited, and even then, not by much. You can use annotation chaining and arbitrary annotations to reproduce most of what you can do with arbitrary embedded elements in XML. The only real syntactic restriction is that you cannot nest structures below the paragraph level, because you cannot nest phrases. And that restriction is a deliberate part of the design. The proposed schema support for patterns addresses most of this need, so there is very little justification for XML fragments here either.

mbakeranalecta commented 7 years ago

Addressed in 4ef8a28e9a004d3faacd604a9635f7c382ff2ca6