Normalize encoding of titles

srophe / britishLibrary-data

GNU General Public License v3.0

Normalize encoding of titles #118

Open wlpotter opened 2 years ago

wlpotter commented 2 years ago

There are several different formats of titles and a lot of variance regarding which elements and attributes are allowed and required on title elements. To sort that out we need a sheet with title elements and their structures.

It would also be worth collecting together the issues related to titles and their data format(s). This will help us formulate and implement rules regarding unspecified contents, unnamed sub-parts of named works, etc.

wlpotter commented 2 years ago

The first step is to get the list of titles and their structures. (Note that normalization is probably a prerequisite for #89.)

wlpotter commented 2 years ago

Sheet with titles, their structures, and some data about attribute and child element usage: https://docs.google.com/spreadsheets/d/1e8AWvx-2drh9o7dmYG6eyfw18S95_40--kjiFN71CSA/edit?usp=sharing
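For anyone wanting to regenerate a tally like this, here is a minimal sketch (not necessarily how the sheet above was produced). It assumes the records are TEI XML in the standard TEI namespace and reduces each tei:title to a "signature" of its attribute names and child element names. The function names are hypothetical, and namespace prefixes are dropped, so xml:lang shows up as @lang.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = "http://www.tei-c.org/ns/1.0"


def local(name):
    """Strip a '{namespace}' prefix from an ElementTree tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def title_signature(title):
    """Describe a title by its sorted attribute names and distinct child element names."""
    attrs = ",".join(sorted("@" + local(a) for a in title.attrib))
    children = ",".join(sorted({local(c.tag) for c in title}))
    return f"title[{attrs}]({children})"


def count_title_structures(xml_text):
    """Count how often each title signature occurs in one XML document."""
    root = ET.fromstring(xml_text)
    titles = root.iter(f"{{{TEI_NS}}}title")
    return Counter(title_signature(t) for t in titles)
```

Summing the counters across all manuscript files would give one row per distinct structure, which is roughly the shape a normalization sheet needs.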

wlpotter commented 2 years ago

- strip out @type="supplied" as redundant (it is all assumed to be from Wright)
- add in @resp for ones we've named
  - check where we have a @resp already
  - Unspecified contents (add a @resp)
- make an issue for ones like "Part 1" or "Part of x work" (not for initial release)
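A rough sketch of the first two rules, assuming TEI-namespaced title elements. How to decide which titles "we've named" is not settled here, so the caller supplies that test as a function. The `EDITOR_RESP` value is a placeholder, not a real identifier from the data.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
EDITOR_RESP = "#editor-placeholder"  # hypothetical value; replace with the real @resp pointer


def normalize_titles(root, is_editor_named, resp=EDITOR_RESP):
    """Drop @type="supplied" and add @resp to editor-named titles that lack one.

    is_editor_named is a caller-supplied function taking a title element and
    returning True when the title was named by the project rather than Wright.
    Returns the number of title elements whose attributes changed.
    """
    changed = 0
    for title in root.iter(f"{{{TEI_NS}}}title"):
        before = dict(title.attrib)
        if title.get("type") == "supplied":
            del title.attrib["type"]
        # Never overwrite an existing @resp; those need a manual check.
        if is_editor_named(title) and title.get("resp") is None:
            title.set("resp", resp)
        if dict(title.attrib) != before:
            changed += 1
    return changed
```

For example, passing `lambda t: (t.text or "").strip() == "Unspecified contents"` would cover the "Unspecified contents" case only.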

wlpotter commented 2 years ago

persName and placeName elements should remain inside titles. If the URI on a persName matches that on an author element (wait on #91), mark it as 'attributed author'.
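A sketch of the matching step only, assuming the URI lives in @ref on both tei:persName and tei:author and that both sit under the same msItem. How the match gets marked as 'attributed author' is still to be decided, so this just returns the matching persName elements.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def attributed_author_names(ms_item):
    """Return persName elements inside an msItem's titles whose @ref matches an author's @ref."""
    author_uris = {
        a.get("ref")
        for a in ms_item.findall(f"{{{TEI_NS}}}author")
        if a.get("ref")
    }
    matches = []
    for title in ms_item.findall(f"{{{TEI_NS}}}title"):
        for pers in title.iter(f"{{{TEI_NS}}}persName"):
            # persName elements without a @ref never match.
            if pers.get("ref") in author_uris:
                matches.append(pers)
    return matches
```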

wlpotter commented 2 years ago

For titles within titles (tagged either as tei:title or with a tei:ref, and maybe in other ways), pull these out once the msItem ids are stable (wait on #58). We will have a list with: ms and ms part URI; msItem title; msItem xml:id; XPath to the item; the item title's text node; text node of the child title; URI of the overall work; URI of the child work; author info and other useful identifying info like rubric, incipit, etc. Store this in a CSV and reference it when creating and updating work authority records from the ms data.

Also want to normalize the element used to tag these child titles.
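A minimal sketch of the extraction, covering only a subset of the columns listed above (msItem xml:id, item title text, child tag, child title text, child URI). It assumes child titles are tei:title or tei:ref elements nested inside an msItem's direct tei:title, with the URI in @ref or @target respectively. The column names are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
CHILD_TAGS = (f"{{{TEI_NS}}}title", f"{{{TEI_NS}}}ref")
FIELDS = ["msItem_id", "item_title", "child_tag", "child_title", "child_uri"]


def flat_text(elem):
    """Join all text inside an element and collapse runs of whitespace."""
    return " ".join("".join(elem.itertext()).split())


def child_title_rows(root):
    """Build one row per title or ref nested inside an msItem's title."""
    rows = []
    for item in root.iter(f"{{{TEI_NS}}}msItem"):
        for title in item.findall(f"{{{TEI_NS}}}title"):
            for child in title.iter():
                if child is title or child.tag not in CHILD_TAGS:
                    continue
                rows.append({
                    "msItem_id": item.get(XML_ID, ""),
                    "item_title": flat_text(title),
                    "child_tag": child.tag.rsplit("}", 1)[-1],
                    "child_title": flat_text(child),
                    "child_uri": child.get("ref") or child.get("target") or "",
                })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The `child_tag` column also gives a quick view of which element is used for child titles, which should help with normalizing the tagging.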

wlpotter commented 1 year ago

Note to self: leaving this in the backlog as it is low priority right now. But it should be split into several subtasks once we're ready to work on it. For example, #54 is one such subtask.