qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

Coupling of global variable-bound maps to character maps in XSLT #1500

Open Arithmeticus opened 6 days ago

Arithmeticus commented 6 days ago

In an application I am writing now, the xsl:output-character elements in my xsl:character-map are of interest elsewhere in the XSLT complex that is slowly emerging.

The exercise makes me realize that character maps can be interesting in their own right. We give xsl:character-maps names, and include them within each other, because they group meaningfully related character-string pairs. Such sets are the sort of thing one might want to have more closely coupled to the XSLT apparatus. For example, someone might create an xsl:character-map to deal with Unicode characters in a particular script. Those characters are of interest in their own right, and other processes that work with the same characters might need access to that selection.

What if we were to extend @use-character-maps to allow character maps to draw from other preexisting maps? The list of EQNames in @use-character-maps would be resolved first against names of character maps. For any EQName that is not the name of a character map, the processor would search for a global variable or global parameter by that name. Any referenced global variable/parameter must be empty or a map. Every key must be castable as a character, and every value must be a string. Failure on any of these points would raise an error.
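
The checks described above can be approximated in user-level XSLT 3.0. This is only a sketch under stated assumptions: the function name f:is-character-map and its namespace are hypothetical, "castable as a character" is interpreted here as "the key's string value is exactly one character long", and a non-map argument is rejected by the parameter's declared type rather than by a dedicated error.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:f="http://example.com/functions"
    exclude-result-prefixes="#all">

    <!-- Hypothetical helper: true if $candidate is empty, or is a map whose
         keys are single characters and whose values are single strings. -->
    <xsl:function name="f:is-character-map" as="xs:boolean">
        <!-- Anything other than a map or the empty sequence fails this type -->
        <xsl:param name="candidate" as="map(*)?"/>
        <xsl:sequence select="
            if (empty($candidate)) then true()
            else every $k in map:keys($candidate) satisfies
                (string-length(string($k)) eq 1
                 and $candidate($k) instance of xs:string)"/>
    </xsl:function>

</xsl:stylesheet>
```

Under the proposal the processor itself would perform these checks when resolving @use-character-maps; the function only makes the rules concrete.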

Here is an example of hypothetical XSLT code to illustrate how the innovation might be useful. It produces two different character maps, each of which might be appropriate for one type of serialization or another:

    <xsl:item-type name="letters:grc" as="record(transliteration as xs:string, name as xs:string)"/>
    <xsl:variable name="master-map" as="map(*)">
        <xsl:map>
            <xsl:map-entry key="'α'" select="letters:grc('a', 'alpha')"/>
            <xsl:map-entry key="'β'" select="letters:grc('b', 'beta')"/>
        </xsl:map>
    </xsl:variable>
    <xsl:variable name="serialization-transliteration-map" as="map(xs:string, xs:string)">
        <xsl:map>
            <xsl:for-each select="map:keys($master-map)">
                <xsl:map-entry key="." select="$master-map(current())?transliteration"/>
            </xsl:for-each>
        </xsl:map>
    </xsl:variable>
    <xsl:variable name="serialization-name-map" as="map(xs:string, xs:string)">
        <xsl:map>
            <xsl:for-each select="map:keys($master-map)">
                <xsl:map-entry key="." select="$master-map(current())?name"/>
            </xsl:for-each>
        </xsl:map>
    </xsl:variable>
    <xsl:character-map name="transliteration" use-character-maps="serialization-transliteration-map"/>
    <xsl:character-map name="names" use-character-maps="serialization-name-map"/>

In other words, if an xsl:character-map is just a map, why not give it access to other XSLT structures that are maps?

michaelhkay commented 5 days ago

Note that fn:serialize allows the character map to be supplied simply as a map (and doesn't allow reference to statically declared maps).
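
For reference, a minimal sketch of supplying a character map to fn:serialize as an ordinary map, using the map-based serialization parameters of XPath 3.1. The variable name $translit and the sample document are illustrative assumptions, not taken from the thread.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all">

    <!-- Text output, so the string returned by serialize() is written as-is -->
    <xsl:output method="text"/>

    <!-- Illustrative character map held in a plain global variable -->
    <xsl:variable name="translit" as="map(xs:string, xs:string)"
        select="map { 'α': 'a', 'β': 'b' }"/>

    <xsl:template name="xsl:initial-template">
        <xsl:variable name="doc">
            <p>αβ</p>
        </xsl:variable>
        <!-- The character map is passed as a map-valued parameter,
             not as the name of a declared xsl:character-map -->
        <xsl:value-of select="serialize($doc, map {
            'method': 'xml',
            'omit-xml-declaration': true(),
            'use-character-maps': $translit })"/>
    </xsl:template>

</xsl:stylesheet>
```

With a processor that supports XPath 3.1 maps, the serialized string should have α and β replaced by a and b.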

I think one could do a couple of things:

(a) Provide an XSLT function fn:character-map(QName) as map(string, string) which extracts a named character map as a dynamic map suitable for use by fn:serialize or elsewhere

(b) Allow xsl:result-document to use a dynamically constructed map rather than a static named map (at present you can supply the name dynamically, but the name has to identify a statically-defined character map).

I'm less keen on turning xsl:character-map into something that's dynamically constructed rather than statically constructed - there are too many complications, e.g. with import precedence (as we know from xsl:attribute-set). I'd prefer to use variables for that.

ChristianGruen commented 5 days ago

Related (sorry, it’s XQuery), for the sake of completeness:

In our implementation, we have made output:use-character-maps a legal output declaration. Each mapping is written as character=string, and mappings are separated by commas:

(: Result: <xml>huebsch</xml> :)
declare namespace output = 'http://www.w3.org/2010/xslt-xquery-serialization';
declare option output:use-character-maps 'ä=ae,ö=oe,ü=ue,ß=ss';
<xml>hübsch</xml>

Commas in values can be encoded by doubling them, just as quotes are doubled in string literals.

Arithmeticus commented 3 days ago

After mulling it over, I've warmed up to Michael's suggestion (a). It makes sense to build bridges from static to dynamic, not the other way around as I was proposing. And it provides the bridge necessary to incorporate character maps in the larger code base, to make it more DRY.

I'm not opposed to suggestion (b), but I also haven't encountered the need for it yet.

fn:serialize cannot refer to a statically declared map? Not even if it's declared in a static parameter?