obdurodon / dh_course

Digital Humanities course site

Creating unique id and href elements #469

charlietaylor98 commented 4 years ago

In one of my assignments, I wanted to link titles to list items. To give each one a unique @id value, I set the @id equal to the title itself, using the translate() function to remove spaces. The expression was as follows: `

            </a>`

So the title "Christy Carlson Romano, dans le rôle de Ren Stevens" became the attribute value "ChristyCarlsonRomano,danslerôledeRenStevens". Is it efficient to have @id and @href values that are this long? And is there an XPath expression that could return, say, just the first word of the title, maybe using the tokenize() function?

djbpitt commented 4 years ago

@charlietaylor98 Good question!

The length of the attribute value shouldn’t matter in terms of processing, although the longer it gets, the harder it is to read, which erodes the advantage of having a human-readable value. More importantly, if there is a risk that two things may end up with the same value, you could wind up with a validation problem, because @id values have to be unique in the document.
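On the tokenize() question, here is a minimal sketch of one way to take just the first word. It assumes XSLT 2.0 or later (tokenize() is not available in 1.0), and the element names `item` and `title` are hypothetical stand-ins for whatever your input uses:

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <!-- "item" and "title" are hypothetical element names, not from the assignment -->
    <xsl:template match="item">
        <!-- normalize-space() trims and collapses whitespace,
             so the first space-delimited token is the first word -->
        <xsl:variable name="first-word"
            select="tokenize(normalize-space(title), ' ')[1]"/>
        <li id="{$first-word}">
            <xsl:value-of select="title"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

For the title above this would give "Christy". Note that a shorter value makes collisions more likely: two titles that begin with the same word would get the same @id, which is exactly the uniqueness problem described above.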

XPath provides a method for working around this: there is a function called generate-id() that creates an @id value that is guaranteed to satisfy the following requirements:

  1. If you run the function against any two different nodes in the input XML, the function outputs are guaranteed to be different. This means that you can be confident that every node for which you want to create an @id will have a value that is unique in the document.
  2. If you run it against the same node from the input XML in different places in your XSLT, it will always generate the same result within a single run of the transformation. This means that, for example, you could use it for the sort of table-of-contents linking we did with the Shakespeare sonnets. The values are not guaranteed to stay the same from one run to the next, though, so you shouldn’t hard-code them anywhere else.
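As a rough sketch of that table-of-contents pattern (the input element names `role` and `title` are hypothetical, and the HTML structure is only one possible choice), the same generate-id() call can supply both the link target and the link itself:

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xhtml" indent="yes"/>

    <xsl:template match="/">
        <html>
            <head><title>Roles</title></head>
            <body>
                <!-- First pass: build the list of links -->
                <ul>
                    <xsl:apply-templates select="//role" mode="toc"/>
                </ul>
                <!-- Second pass: build the targets of those links -->
                <xsl:apply-templates select="//role"/>
            </body>
        </html>
    </xsl:template>

    <!-- "role" and "title" are hypothetical element names -->
    <xsl:template match="role" mode="toc">
        <!-- generate-id() with no argument uses the context node -->
        <li><a href="#{generate-id()}"><xsl:value-of select="title"/></a></li>
    </xsl:template>

    <xsl:template match="role">
        <!-- Same node, same run, so the value matches the @href above -->
        <h2 id="{generate-id()}"><xsl:value-of select="title"/></h2>
    </xsl:template>
</xsl:stylesheet>
```

Because both templates match the same `role` node, each @href and its corresponding @id agree without your having to construct the value yourself.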

The downside of generate-id() is that the value is not human-readable, so you can’t glance at it and look for obvious mistakes the way you can with your approach.

I’ve used both of these methods in my own work. I prefer to construct my own values when I can be confident that they will be unique and easy to read. I use generate-id() in situations where I can’t come up with a good alternative, or where there’s a risk of creating duplicates.