obdurodon / dh_course

Digital Humanities course site
GNU General Public License v3.0

XPath Tips #450

Closed dap167 closed 3 years ago

dap167 commented 4 years ago

One thing that I learned in the homework while using XPath is that you can use the Boolean operators and and or to select elements that fulfill both conditions (and) or at least one of them (or). You can use them inside a predicate, the square-bracketed filter after a step. In XPath they are written in lowercase, as "and" and "or". For example, if you want all speeches by Hamlet or Ophelia, you can write //sp[@who="Hamlet" or @who="Ophelia"], which is functionally equivalent to //sp[@who = ('Hamlet', 'Ophelia')].
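To make the equivalence concrete, here is a minimal Python sketch that imitates the two predicates on a tiny, hypothetical fragment of markup (real Hamlet encodings will differ). Python's `xml.etree.ElementTree` supports only a limited subset of XPath and has no `or` in predicates, so each predicate is spelled out as an ordinary Python function; the function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; a real edition of Hamlet will differ.
SAMPLE = """<play>
  <sp who="Hamlet"><l>To be, or not to be</l></sp>
  <sp who="Ophelia"><l>Good my lord</l></sp>
  <sp who="Polonius"><l>What do you read, my lord?</l></sp>
</play>"""

root = ET.fromstring(SAMPLE)
speeches = list(root.iter("sp"))  # roughly //sp


def who_or(sp):
    # Imitates the predicate [@who="Hamlet" or @who="Ophelia"]
    return sp.get("who") == "Hamlet" or sp.get("who") == "Ophelia"


def who_in(sp, names):
    # Imitates the predicate [@who = ('Hamlet', 'Ophelia')]:
    # true if @who equals ANY item in the sequence of names
    return any(sp.get("who") == name for name in names)


by_or = [sp.get("who") for sp in speeches if who_or(sp)]
by_in = [sp.get("who") for sp in speeches if who_in(sp, ("Hamlet", "Ophelia"))]
print(by_or)  # ['Hamlet', 'Ophelia']
print(by_in)  # ['Hamlet', 'Ophelia']
```

Both filters keep the same two speeches. With a dozen names, the `or` version needs a dozen comparisons, while the sequence version only needs a longer tuple.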

pickettj commented 4 years ago

Great tip, @dap167! There's a third way as well, which we will cover at the beginning of class tomorrow: //sp[@who="Hamlet"] | //sp[@who="Ophelia"]. (The pipe operator in XPath is not the same thing as the or operator; rather, it combines two sets of nodes into their union.) However, your ways are more succinct.
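The difference between | and or is easier to see if you imitate the union step yourself. The Python sketch below (hypothetical markup, illustrative function names) runs two separate selections and then merges them. In XPath the union keeps each node only once and returns the nodes in document order, and the sketch reproduces that by remembering each node's position in the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with Hamlet speaking twice
SAMPLE = """<play>
  <sp who="Hamlet"><l>...</l></sp>
  <sp who="Ophelia"><l>...</l></sp>
  <sp who="Hamlet"><l>...</l></sp>
</play>"""

root = ET.fromstring(SAMPLE)
# Document-order position of every node, keyed by object identity
POSITION = {id(node): i for i, node in enumerate(root.iter())}


def select_sp(who):
    # Roughly //sp[@who=WHO]
    return [sp for sp in root.iter("sp") if sp.get("who") == who]


def union(left, right):
    # Sketch of | : combine two node sequences, keep each node
    # only once, and return the result in document order
    unique = {id(node): node for node in left + right}
    return sorted(unique.values(), key=lambda node: POSITION[id(node)])


result = union(select_sp("Ophelia"), select_sp("Hamlet"))
print([sp.get("who") for sp in result])  # ['Hamlet', 'Ophelia', 'Hamlet']
```

Even though the Ophelia selection is written first, the merged result follows the order of the document, and selecting the same node on both sides of | does not duplicate it.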

djbpitt commented 4 years ago

Thanks, Danny! These expressions are, indeed, equivalent, and while the one using or is logical, the one using what XPath calls general equality is idiomatic, and therefore merits attention.

If you have just two possible values, the expressions are about the same in length and complexity, but if you have, say, a dozen values, the general equality operator (that is, testing whether an item on one side of the equal sign is equal to any item in a sequence on the other side) is easier to read and to type without error. General equality also helps when the sequence consists not of strings that you spell out item by item, but of values that you select with an XPath path expression, along the lines of:

//sp[@who = //role/@xml:id]

This finds all <sp> elements whose @who value is equal to the @xml:id value of some <role> element. There’s no economical way to do this with an or expression in the predicate.
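A rough Python imitation of //sp[@who = //role/@xml:id] may help show what "equal to any item" means when the right-hand sequence comes from a path expression. The markup is hypothetical (the <castList> wrapper and the Ghost speech are invented for the example), and the helpers `attr` and `general_eq` are illustrative names, not part of any library. A missing attribute is modeled as an empty sequence, which can never satisfy the comparison.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:id under the reserved XML namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical markup: two declared roles, one undeclared speaker,
# and one speech with no @who at all
SAMPLE = """<play>
  <castList>
    <role xml:id="Hamlet"/>
    <role xml:id="Ophelia"/>
  </castList>
  <sp who="Hamlet"><l>...</l></sp>
  <sp who="Ghost"><l>...</l></sp>
  <sp who="Ophelia"><l>...</l></sp>
  <sp><l>...</l></sp>
</play>"""

root = ET.fromstring(SAMPLE)


def attr(element, name):
    # An attribute as an XPath-style sequence: empty if it is missing
    value = element.get(name)
    return [] if value is None else [value]


def general_eq(left, right):
    # General equality: true if ANY item on the left equals
    # ANY item on the right; false if either side is empty
    return any(a == b for a in left for b in right)


# //role/@xml:id, computed once
role_ids = [rid for role in root.iter("role") for rid in attr(role, XML_ID)]

# //sp[@who = //role/@xml:id]
matches = [sp for sp in root.iter("sp") if general_eq(attr(sp, "who"), role_ids)]
print([sp.get("who") for sp in matches])  # ['Hamlet', 'Ophelia']
```

Adding another <role> to the markup changes the result without touching the query, which is exactly what a hand-written chain of or tests cannot do.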

djbpitt commented 4 years ago

The union operator (|) that James mentions has an alias in the form of the reserved word union. That is, the following are synonymous:

//sp[@who="Hamlet"] | //sp[@who="Ophelia"]

and

//sp[@who="Hamlet"] union //sp[@who="Ophelia"]

Rober-Igtm commented 4 years ago

I'm starting to understand these XPath functions, but I don't really understand Boolean logic. What exactly does it do, and what purpose does it serve in XPath? I've never taken any logic course before, so I'm probably a bit behind on this.

djbpitt commented 4 years ago

@Rober-Igtm Predicates filter items to keep the ones that satisfy a certain condition and discard those that don’t. Leaving aside numerical predicates for the moment (those like //body/div[1], which select an item according to its position in a sequence—in this case, the first act of Hamlet), predicates evaluate to True or False, and keep only the items where the value of the predicate is True. For example:

//sp[@who="Hamlet"]

collects all of the <sp> elements and tests each one to see whether it has a @who attribute whose value is equal to the string Hamlet. That test must be either True or False for every <sp> (although it can be False in different ways: specifically, the @who attribute could be equal to something other than Hamlet, or the <sp> could be missing its @who attribute). If the predicate evaluates to True, the element is kept; otherwise it is discarded. Boolean values are the values True and False, so these predicates can be considered to return a Boolean value.
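Here is a small Python sketch of that filtering step, assuming hypothetical markup with one matching speech, one speech by another character, and one speech without a @who attribute. It prints the Boolean verdict for each <sp> and then keeps only the ones whose verdict is True; `who_is_hamlet` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one match, one other speaker, one missing @who
SAMPLE = """<play>
  <sp who="Hamlet"><l>...</l></sp>
  <sp who="Horatio"><l>...</l></sp>
  <sp><l>...</l></sp>
</play>"""

root = ET.fromstring(SAMPLE)


def who_is_hamlet(sp):
    # The predicate [@who="Hamlet"] as a Boolean test. sp.get() returns
    # None when @who is missing, and None == "Hamlet" is False, so both
    # ways of failing produce the same False.
    return sp.get("who") == "Hamlet"


verdicts = [(sp.get("who"), who_is_hamlet(sp)) for sp in root.iter("sp")]
print(verdicts)  # [('Hamlet', True), ('Horatio', False), (None, False)]

# Filtering keeps only the items whose predicate is True
kept = [sp for sp in root.iter("sp") if who_is_hamlet(sp)]
print(len(kept))  # 1
```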

Boolean logic describes ways of combining Boolean values according to what is called a truth table. For example, X and Y evaluates to True only when both X and Y independently evaluate to True. In human language, “It is noon and it is raining” is true only if “it is noon” is true and “it is raining” is true. If either of those is false, the combination of the two statements cannot be true. On the other hand, “I’m tired or I’m coming down with a cold” is true if either or both of the parts is true. (There is, in Boolean logic, also an exclusive or operator, which means something like “I’m tired or I’m coming down with a cold, but not both at the same time”.)
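If it helps to see the whole truth table at once, this short Python sketch generates every combination of X and Y and computes and, or, and exclusive or for each. Python has no keyword for exclusive or, so the sketch uses `!=`, which for two Booleans is True exactly when one is True and the other is False.

```python
from itertools import product


def truth_table():
    # One row per combination of X and Y: (X, Y, X and Y, X or Y, X xor Y)
    rows = []
    for x, y in product([True, False], repeat=2):
        rows.append((x, y, x and y, x or y, x != y))
    return rows


print(f"{'X':<6}{'Y':<6}{'and':<6}{'or':<6}{'xor':<6}")
for row in truth_table():
    # str() keeps Python from formatting the Booleans as 1 and 0
    print("".join(f"{str(v):<6}" for v in row))
```

Reading down the columns: and is True in only one row, or is False in only one row, and exclusive or is True only in the two rows where X and Y differ.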

The Boolean operators and and or in XPath predicates are ways of expressing compound truth tests. For example, to find all of the speeches by either Hamlet or Ophelia we can perform a Boolean or operation: //sp[@who="Hamlet" or @who="Ophelia"]. Remembering that a predicate retains the items for which it evaluates to True and discards the ones for which it evaluates to False, this keeps the <sp> elements whose @who attribute is equal either to the string Hamlet or to the string Ophelia.