I've noticed some problems with the DCLP metadata mask and either the xml it produces or the xslt that processes it for PN. My particular area of concern is the element <term> which is the great-grandchild of <profileDesc>:
<profileDesc>
<textClass>
<keywords>
<term>
After poring over LDAB documentation regarding its keywords and classification system, I now better understand why DCLP keyword metadata is structured the way it is. I'm also now in a position to diagnose what's unsatisfactory and to make concrete recommendations.
<term> is often qualified by @type (e.g., description, religion, culture), but the present handling of these values is suboptimal. As I explain below, the following image will be a helpful reference point -- this is the current SoSOL metadata mask for DCLP:
I note the following issues:
1. A string added via the Genre field is tagged <term type="description">.
1a. PN xslt at present does not process items tagged <term type="description">, so this field is deceptive. I've been moving terms like epic or philosophy to the Genre field, only to see them vanish from PN as a result.
2. A string added via the Keyword field is tagged <term> (with no attribute).
2a. PN xslt at present processes <term/> (with no attribute) so that it appears in PN under Genre. This also strikes me as somewhat deceptive, or at least not as well formed in xml as it could be.
3. When set next to literary genres such as history, epic, comedy, philosophy (etc.), poetry and prose seem to be making a broader distinction. But the xml at present for all of these is <term> (with no attribute), except for cases where dutiful editors like myself have moved generic descriptions such as epic or philosophy to the Genre field, only to see them vanish from PN as a result.
3a. I therefore suggest implementing @type="format", whose values could be poetry or prose, with undetermined as a third option. These values should form an authority list, accessed via dropdown menu in SoSOL. (NB: poetry and prose only appear as <term> for files whose culture is literature, i.e., <term type="culture">literature</term>.)
3b. I would also suggest implementing @type="genre" across DCLP to tag the wide range of literary genres and other descriptive terminology. Doing so would allow us to dispense with <term type="description">, which is what the Genre field of the metadata mask currently outputs (but which current xslt does not process). It would also free up <term> (with no attribute) for non-generic descriptive terminology, e.g. calendar, tachygraphy, exercise, drawing, title.
3c. IMO, it is ok if the xslt prints @type="genre" and @type="format" together in PN under Genre, so long as the xml disambiguates. This is basically what the current system displays (where both are tagged <term>, without attribute).
For illustrations of 1, 1a, 2, and 2a, see the following two images of 171900:
For illustrations of 3, see the following two images of 60408:
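To make suggestions 3a to 3c concrete, here is a minimal sketch of how a keywords block might look under the proposed scheme. The term values are illustrative only and are not copied from either record mentioned above; `type="format"` and `type="genre"` are the proposed attribute values, which neither SoSOL nor the PN xslt supports at present.

```xml
<profileDesc>
  <textClass>
    <keywords>
      <!-- existing markup: culture of the text -->
      <term type="culture">literature</term>
      <!-- proposed (3a): one of poetry | prose | undetermined -->
      <term type="format">poetry</term>
      <!-- proposed (3b): would replace <term type="description"> -->
      <term type="genre">epic</term>
      <!-- no attribute: non-generic descriptive terminology -->
      <term>exercise</term>
    </keywords>
  </textClass>
</profileDesc>
```

With this encoding the xslt could still print format and genre together under Genre, while the xml itself keeps the two apart.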
4. Further improvements are possible, too.
5. The options for <term type="culture"> should also be governed by an authority list, accessed via dropdown menu in SoSOL: the four options are literature, science, religion, and art.
5a. Sometimes two options are listed in a single element (e.g., <term type="culture">science or religion</term>), which an authority list would require us to split up. We will therefore also want to allow for more than one item to be tagged <term type="culture">, in which case xslt will have to add Or between them for display in PN.
6. All keyword fields should have the same tickbox for 'unclear' (adding @cert="low" to the xml) that the metadata mask section for Provenance currently has. This change would also require xslt to print (?) after the keyword in question.
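The two display changes just described (joining multiple culture terms and flagging uncertain keywords) could be sketched in xslt roughly as follows. This is a hypothetical standalone stylesheet, not an excerpt from the PN code: the template structure, the `t:` prefix, the plain-text output, and the exact separator string are all assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:t="http://www.tei-c.org/ns/1.0">

  <xsl:output method="text"/>

  <!-- Assumed input (in the TEI namespace):
       <keywords>
         <term type="culture">science</term>
         <term type="culture" cert="low">religion</term>
       </keywords>
  -->

  <!-- Hypothetical template: print every culture term in document order -->
  <xsl:template match="t:keywords">
    <xsl:for-each select="t:term[@type='culture']">
      <!-- separator before every term except the first -->
      <xsl:if test="position() &gt; 1">
        <xsl:text> or </xsl:text>
      </xsl:if>
      <xsl:value-of select="normalize-space(.)"/>
      <!-- mark keywords that were ticked as 'unclear' -->
      <xsl:if test="@cert = 'low'">
        <xsl:text> (?)</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

  <!-- suppress all other text so only the keyword line is printed -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

For the assumed input this should yield `science or religion (?)`. The same `@cert` test could be reused for any other keyword type that gets the 'unclear' tickbox.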
There are potentially further steps we could take to improve the handling of DCLP keyword metadata, but since what I've suggested already will require changes to DCLP metadata xslt and SoSOL, it seemed wise to at least start the conversation. I'm happy to discuss when you get the chance: I'm expecting another dump of metadata from TM in the near future that will allow me to Xwalk xml for literary papyri published since the dawn of DCLP, and it would be good to have a sense of how I want to wrangle it all. So long as I understand how SoSOL will work moving forward, I can wrangle the existing data on my own.