obdurodon / dh_course

Digital Humanities course site
GNU General Public License v3.0

XSLT #2 - Header Rows #456

Closed: EsRessel closed this issue 3 years ago

EsRessel commented 4 years ago

I am having a bit of trouble creating the header rows for each of the tables. I am most confused by the placement of the <xsl:apply-templates/> element. My current solution has three <th> elements in which I manually typed the headers for each of the columns, followed by an <xsl:apply-templates/> element after all three headers. That element has a @select attribute whose XPath expression selects the <character> elements that are children of <cast>, or the <faction> elements that are children of <cast>, depending on which table is being processed.

Though the solution works, I'm not sure if it's correct. It wouldn't apply to another document with the same structure but different column headers, since the headers were typed manually by the programmer. Is there a way to extract the information needed for the table headers using XSLT? If the data are stored in attributes on the elements, can the names of those attributes be retrieved with XPath and then used in an <xsl:apply-templates/> element? How many <xsl:apply-templates/> elements would be needed, and where would they go?

djbpitt commented 4 years ago

@EsRessel This is a terrific question. The answer we were looking for is the one you describe in your first paragraph, but your explanation, in the second paragraph, of how it is suboptimal is exactly right. One way to approach an alternative solution would be to create the header row by retrieving the distinct names (not values) of all of the attributes, and then populate the data rows by retrieving the values of those attributes for each character or faction.

This runs into an inconvenient detail: attributes in XML are unordered. They look ordered to us when we look at the angle-bracketed view (because we read from left to right, so something has to be leftmost, etc.), but that’s a serialization of the XML, and not the real XML. The real XML is a tree, and in that tree the attributes of an element have no defined order, so there is no guarantee that they will come out in a specific order when you process or serialize the document (in particular, there is no guarantee that applying templates to all of the attributes will output information in the order in which they appear in the angle-bracketed view). In fact, they could be returned in different orders for different elements of the same type, whether or not they are ordered the same way in the serialization. This means that if you want them in a specific order, you have to put them in that order, and it’s difficult to do that without calling them by name. We first considered alphabetizing them (using the XSLT <xsl:sort> element), but we don’t want them alphabetized completely; we want the @id first, since that’s what humans regard as the name of the character or faction, and if we alphabetize alignment, id, and loyalty, alignment will be first.
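To make the point about calling attributes by name concrete, here is a minimal sketch of the hand-typed approach from the original question, for the characters table only. It assumes that every <character> carries exactly the attributes id, alignment, and loyalty (the names mentioned in this thread) and that characters are found at //cast/character; the faction attributes aren't named in this thread, so the factions table is left out. The key detail is the comma operator, which builds a sequence in exactly the order written.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="#all"
    version="3.0" xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Skyrim</title>
            </head>
            <body>
                <table>
                    <!-- Header labels typed by hand, in the order we want;
                         assumes the attributes are id, alignment, and loyalty -->
                    <tr>
                        <th>id</th>
                        <th>alignment</th>
                        <th>loyalty</th>
                    </tr>
                    <xsl:apply-templates select="//cast/character"/>
                </table>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="character">
        <tr>
            <!-- The comma operator returns the attributes in the order written.
                 A union (@id | @alignment | @loyalty) would instead return them
                 in document order, which for attributes is implementation-dependent. -->
            <xsl:apply-templates select="@id, @alignment, @loyalty"/>
        </tr>
    </xsl:template>
    <xsl:template match="@*">
        <td>
            <xsl:value-of select="."/>
        </td>
    </xsl:template>
</xsl:stylesheet>
```

This works, but it has exactly the weakness described in the question: the attribute names are hard-coded twice, and a character that lacks one of them would silently produce a row with one cell too few.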

We can fix that by addressing the @id attribute by name and then alphabetizing the others. Here’s one way to do that (it uses some features of XSLT that we haven’t introduced yet; see the discussion after the example):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="#all"
    version="3.0" xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Skyrim</title>
            </head>
            <body>
                <h1>Skyrim</h1>
                <section>
                    <h2>Characters</h2>
                    <table>
                        <tr>
                            <th>id</th>
                            <xsl:for-each
                                select="distinct-values(//cast/character/(@* except @id)/name())">
                                <xsl:sort/>
                                <th>
                                    <xsl:value-of select="."/>
                                </th>
                            </xsl:for-each>
                        </tr>
                        <xsl:apply-templates select="//cast/character"/>
                    </table>
                </section>
                <section>
                    <h2>Factions</h2>
                    <table>
                        <tr>
                            <th>id</th>
                            <xsl:for-each
                                select="distinct-values(//cast/faction/(@* except @id)/name())">
                                <xsl:sort/>
                                <th>
                                    <xsl:value-of select="."/>
                                </th>
                            </xsl:for-each>
                        </tr>
                        <xsl:apply-templates select="//cast/faction"/>
                    </table>
                </section>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="character | faction">
        <tr>
            <xsl:apply-templates select="@id"/>
            <xsl:apply-templates select="@* except @id">
                <xsl:sort select="name()"/>
            </xsl:apply-templates>
        </tr>
    </xsl:template>
    <xsl:template match="@*">
        <td>
            <xsl:value-of select="."/>
        </td>
    </xsl:template>
</xsl:stylesheet>

If we focus on just the characters for the moment, we create a table with a header row, specify that the first cell in the header row should be for the @id, and then get the names (not values) of all of the character attributes other than @id, use distinct-values() to remove the duplicates, sort the unique names with <xsl:sort>, and write each one into a header cell. We use <xsl:for-each> to process the names because it’s common practice to apply templates to nodes but use <xsl:for-each> for atomic values, and distinct-values() returns a sequence of strings, that is, atomic values, rather than nodes. (It is possible to apply templates to strings, but it’s less common.) Similarly, we use <xsl:value-of> to return an atomic value, in this case one of those distinct strings (but be wary of using <xsl:value-of> for nodes; for reasons we’ll explore later, it’s usually better to apply templates to nodes).
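For comparison, here is a minimal sketch of the less common alternative mentioned above: applying templates to the distinct attribute names (strings) instead of looping over them with <xsl:for-each>. It builds only the characters header row and requires XSLT 3.0, which allows <xsl:apply-templates> to select atomic values. The mode name header is a hypothetical choice; the pattern .[. instance of xs:string] is an XSLT 3.0 predicate pattern that matches any string item.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="#all"
    version="3.0" xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="/">
        <tr>
            <th>id</th>
            <!-- Apply templates to strings (atomic values), not nodes;
                 the hypothetical mode keeps these templates separate -->
            <xsl:apply-templates
                select="distinct-values(//cast/character/(@* except @id)/name())"
                mode="header">
                <xsl:sort/>
            </xsl:apply-templates>
        </tr>
    </xsl:template>
    <!-- Matches each string item passed in by the apply-templates above -->
    <xsl:template match=".[. instance of xs:string]" mode="header">
        <th>
            <xsl:value-of select="."/>
        </th>
    </xsl:template>
</xsl:stylesheet>
```

The output should be the same header row that the <xsl:for-each> version produces; the <xsl:for-each> version is shorter, which is one reason it is the more usual choice for atomic values.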

After we write our header row, we apply templates to all of the characters in the cast list. We can use the same template to match both characters and factions because although they have different attributes, with this approach we don’t have to specify the attributes, and the template will just process whatever it finds. As with the header row, we process the @id first and then sort the others alphabetically. By default <xsl:sort> alphabetizes in ascending order by value, so when we sorted the header labels, we could rely on that default. When we sort the attribute values for output, though, we want them in alphabetical order not by value, but by the attribute name, so that they’ll be in the same order as the headers. (Otherwise, although “alignment” precedes “loyalty” alphabetically, if we sorted by value, a loyalty of “daedra” would wind up preceding an alignment of “evil”, which means that it would wind up in the wrong column.)

In Real Life we would tighten this up in a few ways:

  1. We would write a user-defined function to create the tables and call it twice, once for characters and once for factions. In this way we wouldn’t have to repeat the code they have in common, such as the <h2> header or the table superstructure.
  2. We might want to capitalize the first letter of the column headers, and we could create a user-defined function to do that.
  3. The code that retrieves and deduplicates the attribute names is hard to read because it’s a long path expression nested inside distinct-values(). We would use the simple map operator (!) and the arrow operator (=>) to rewrite this expression to make it more legible. You can read about the simple map operator at https://www.w3.org/TR/xpath-31/#id-map-operator and the arrow operator at https://www.w3.org/TR/xpath-31/#id-arrow-operator. These are both recent additions to XPath (the simple map operator arrived in XPath 3.0 and the arrow operator in XPath 3.1), so they aren’t in Michael Kay.
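The three improvements above could be combined along the following lines. This is a minimal sketch, not a finished solution: the djb prefix, its namespace URI, and the function names djb:capitalize and djb:make-table are hypothetical placeholders (user-defined XSLT functions only need to live in some non-null namespace), and it needs an XSLT 3.0 processor such as Saxon.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The djb prefix and namespace URI below are hypothetical placeholders -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:djb="http://www.example.com/hypothetical-functions"
    exclude-result-prefixes="#all" version="3.0" xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>

    <!-- Improvement 2: uppercase the first character of a string -->
    <xsl:function name="djb:capitalize" as="xs:string">
        <xsl:param name="input" as="xs:string"/>
        <xsl:sequence select="upper-case(substring($input, 1, 1)) || substring($input, 2)"/>
    </xsl:function>

    <!-- Improvement 1: one function builds the section, heading, and table -->
    <xsl:function name="djb:make-table" as="element()">
        <xsl:param name="title" as="xs:string"/>
        <xsl:param name="rows" as="element()*"/>
        <section>
            <h2>
                <xsl:value-of select="$title"/>
            </h2>
            <table>
                <tr>
                    <th>Id</th>
                    <!-- Improvement 3: read left to right; map each attribute
                         to its name (!), then pass the names to distinct-values() (=>) -->
                    <xsl:for-each select="$rows/(@* except @id) ! name() => distinct-values()">
                        <xsl:sort/>
                        <th>
                            <xsl:value-of select="djb:capitalize(.)"/>
                        </th>
                    </xsl:for-each>
                </tr>
                <xsl:apply-templates select="$rows"/>
            </table>
        </section>
    </xsl:function>

    <xsl:template match="/">
        <html>
            <head>
                <title>Skyrim</title>
            </head>
            <body>
                <h1>Skyrim</h1>
                <xsl:sequence select="djb:make-table('Characters', //cast/character)"/>
                <xsl:sequence select="djb:make-table('Factions', //cast/faction)"/>
            </body>
        </html>
    </xsl:template>

    <!-- Row and cell templates are the same as in the stylesheet above -->
    <xsl:template match="character | faction">
        <tr>
            <xsl:apply-templates select="@id"/>
            <xsl:apply-templates select="@* except @id">
                <xsl:sort select="name()"/>
            </xsl:apply-templates>
        </tr>
    </xsl:template>
    <xsl:template match="@*">
        <td>
            <xsl:value-of select="."/>
        </td>
    </xsl:template>
</xsl:stylesheet>
```

Because a function has no context item, everything inside djb:make-table starts from its parameters ($title and $rows) rather than from the document. The expression $rows/(@* except @id) ! name() => distinct-values() means the same thing as distinct-values($rows/(@* except @id)/name()); it just reads in the order in which the steps happen.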