obdurodon / dh_course

Digital Humanities course site
GNU General Public License v3.0

Using <xsl:for-each> to keep count #464

Closed EsRessel closed 3 years ago

EsRessel commented 4 years ago

For XSLT 6 I am using the text from the Sherlock Holmes project. I am attempting to create a list of the number of times each character speaks. My desired output is the name of the character, a colon, and then the number of times they speak. In the XML, the dialogue is marked using <q> tags with @speaker attributes. Here is the code that has gotten me closest to my desired output:

<ul>
  <xsl:for-each select="//@speaker">
    <li>
        <xsl:value-of select="."/>
        <xsl:text>: </xsl:text>
        <xsl:value-of select="count(//q[contains(@speaker, current())])"/>
    </li>
  </xsl:for-each>
</ul>

A snippet of my output is:

It seems that the XSLT is looking at each <q>, finding the value of @speaker, and then outputting the correct number of times that character speaks in the entire text. However, it does this for every <q>, not just once per character. I've been stuck here for a while. Does anyone have any pointers for how to fix this?

pickettj commented 4 years ago

@EsRessel, would it work to count the paragraphs within which each person speaks, since (I think) only one person speaks per paragraph? For example: //p[q/@speaker="Sherlock"]

EsRessel commented 4 years ago

@pickettj I tried count(//p[q/current()]) and got the same repeated output, except that the number for every character was 645, which is the total number of <q> elements in the text. I think my issue is in the loop, since my original numbers were correct.

djbpitt commented 4 years ago

@EsRessel When you write <xsl:for-each select="//@speaker">, you ask it to do whatever is inside the <xsl:for-each> element once for every @speaker attribute in the entire document. This means that if, for example, Sherlock speaks 339 times, you’ll get 339 identical outputs for him. You can fix this by looping over not all @speaker attributes, but all distinct @speaker values.
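
As a minimal sketch of the distinct-values approach (assuming XSLT 2.0 or later, and assuming every speech is a <q> with a single name in @speaker), the loop might look like the following. The variable name $root is introduced here for illustration only. It is needed because inside a loop over distinct-values() the context item is a string rather than a node, so a path beginning with // has no document to start from. The sketch also compares with = rather than contains(), on the assumption that an exact match is wanted; contains() would also count any speaker whose name merely includes the current one.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- $root (hypothetical name) remembers the document node for use inside the loop -->
    <xsl:variable name="root" select="/"/>
    <ul>
      <!-- Loop once per distinct @speaker value, not once per attribute -->
      <xsl:for-each select="distinct-values(//q/@speaker)">
        <li>
          <xsl:value-of select="."/>
          <xsl:text>: </xsl:text>
          <!-- current() is the speaker string; count the <q> elements that match it exactly -->
          <xsl:value-of select="count($root//q[@speaker = current()])"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```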

An alternative approach uses <xsl:for-each-group> to group the speeches by @speaker value and then report the grouping key (distinct speaker) and the size of the group (number of speeches by that speaker).
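
A sketch of the grouping approach, under the same assumptions (XSLT 2.0 or later, one name per @speaker), could look like this. Here current-grouping-key() returns the distinct speaker value and current-group() returns all of the <q> elements in that group, so no separate lookup into the document is needed.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <ul>
      <!-- Form one group of <q> elements per distinct @speaker value -->
      <xsl:for-each-group select="//q[@speaker]" group-by="@speaker">
        <li>
          <!-- The grouping key is the speaker name -->
          <xsl:value-of select="current-grouping-key()"/>
          <xsl:text>: </xsl:text>
          <!-- The size of the group is the number of speeches by that speaker -->
          <xsl:value-of select="count(current-group())"/>
        </li>
      </xsl:for-each-group>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```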

Both approaches will run into trouble if there is any speaking in unison. If that happens, the simplest course of action is to treat that as a different speaker, and not try to unify it with solo speech. (That, though, runs into trouble if, for example, there is a speech by Sherlock and Watson [in that order] and one by Watson and Sherlock [in that order], since those aren’t string-equal.) And if you want to attribute the same joint speech to each speaker, and thus double-count it, there are a few approaches, none of which is especially pretty; let us know if that’s necessary and we can explore further.
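
One possible way to double-count joint speech, offered only as a sketch: if (and this is an assumption about the markup, not something confirmed in this thread) a joint speech lists its speakers in @speaker separated by whitespace, and no individual name contains a space, then the grouping key can be the sequence of tokens. When group-by returns more than one value for an item, <xsl:for-each-group> places that item in every corresponding group, so a speech by two characters is counted once for each of them.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <ul>
      <!-- Assumption: @speaker holds one or more whitespace-separated names.
           tokenize() yields one key per name, so a joint speech joins several groups. -->
      <xsl:for-each-group select="//q[@speaker]"
        group-by="tokenize(normalize-space(@speaker), ' ')">
        <li>
          <xsl:value-of select="current-grouping-key()"/>
          <xsl:text>: </xsl:text>
          <!-- Joint speeches are counted in the group of every speaker they name -->
          <xsl:value-of select="count(current-group())"/>
        </li>
      </xsl:for-each-group>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```

With this version the per-speaker counts can add up to more than the total number of <q> elements, which is the expected consequence of double-counting.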