obdurodon / dh_course

Digital Humanities course site
GNU General Public License v3.0

Regex in XPath Functions #454

Closed pickettj closed 3 years ago

pickettj commented 4 years ago

In the last XPath class, we asked: what if Hamlet had been written on Kashyyyk?

//l[contains(., " man ")
or contains(., " madam")]
/replace (., "man", "wookie")
! replace (., "madam", "my wookie")

We've mostly been using contains(), but matches() appears to be the more robust choice: it does essentially the same thing, and it can also interpret a regular expression as its pattern.

However, XPath uses a slightly different flavor of regex from the one we are used to in the Oxygen search box. For instance,

//l[matches(., "\bman\b")
or matches(., "\bmadam\b")]
/replace (., "man", "wookie")
! replace (., "madam", "my wookie")

raises an error, because XPath regex does not support the \b (word boundary) escape.

Is there an explanation of XPath-compatible regex out there? @djbpitt

djbpitt commented 4 years ago

@pickettj There’s an entire chapter about XPath regex in Michael Kay’s book.
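
In the meantime, here is a minimal sketch of one possible workaround for the missing \b. It is not taken from Kay's chapter, and the pattern is an assumption about what counts as a word edge. XPath regex does support \W (a non-word character) and the ^ and $ anchors, so a boundary can be approximated as "start of string or non-word character" on the left and "non-word character or end of string" on the right. The captured neighbors are written back with $1 and $2 so that spaces and punctuation survive the replacement.

```xpath
(: \b is unavailable in XPath regex, so approximate word edges with
   (^|\W) on the left and (\W|$) on the right :)
//l[matches(., "(^|\W)(man|madam)(\W|$)")]
  ! replace(., "(^|\W)man(\W|$)", "$1wookie$2")
  ! replace(., "(^|\W)madam(\W|$)", "$1my wookie$2")
```

Unlike the plain "man" replacement, this leaves words such as "woman" alone. There are two caveats. The match is case-sensitive, so "Madam" at the start of a line is not caught. And because each match consumes its neighboring characters, the second of two adjacent hits separated by a single space (for example "man man") would be skipped.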