lukehollis / iip-word-lists

Python utility for creating word lists from EpiDoc files

xml:id attribute for <w> element and word count #6

Open emylonas opened 3 years ago

emylonas commented 3 years ago

Each <w> element should have an @xml:id attribute in this form:

`inscription_id-line_num-word_num`, where `inscription_id` is the file name without its extension

For example, akko0007-1-4 would be the 4th word on the 1st line of the inscription akko0007.
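A minimal sketch of building (and splitting) such an id in Python, the repository's language. The names `make_word_id` and `parse_word_id` are hypothetical and not taken from the repository's code; the inscription id is assumed to be the file's stem, as described above.

```python
from pathlib import Path


def make_word_id(xml_path: str, line_num: int, word_num: int) -> str:
    """Build an id of the form <inscription_id>-<line_num>-<word_num>.

    Hypothetical helper: the inscription id is the file name without
    its extension (Path.stem).
    """
    inscription_id = Path(xml_path).stem
    return f"{inscription_id}-{line_num}-{word_num}"


def parse_word_id(word_id: str) -> tuple[str, int, int]:
    """Split an id back into (inscription_id, line_num, word_num).

    rsplit with maxsplit=2 keeps any hyphen inside the inscription id
    attached to the inscription id.
    """
    inscription_id, line_num, word_num = word_id.rsplit("-", 2)
    return inscription_id, int(line_num), int(word_num)


print(make_word_id("texts/akko0007.xml", 1, 4))  # akko0007-1-4
```

Splitting from the right means the scheme still round-trips if an inscription id ever contains a hyphen.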

emylonas commented 3 years ago

Addendum and clarification:

All words in a segmented inscription should be enclosed in one of the following elements:

  1. <w>: these are being added now. They should have an xml:id that incorporates the word number.
  2. <g>: when encountered, these should be copied over as is (deep copy). They are most often empty elements. Sometimes they have text content, but it will only be characters, with no markup. We may want to consider numbering them, but at the moment most of them are decorations or symbols.
  3. <orig>, when it is a child of <p> and not a child of <choice>: this should be copied over as is. It might be useful to give it a word number at some point.
  4. <persName>: we have few, if any, of these. However, they will be added later by converting some <w> elements to <persName> elements.

Each of these elements should have an id and a number (let's check what Christian does in his code).
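A rough sketch of how the rules above might be applied to one already-segmented `<p>`, using Python's standard `xml.etree.ElementTree`. Several details are assumptions, not decisions from this thread: the function name `number_segments` is hypothetical, lines are counted from `<lb/>` elements rather than read from `@n`, the number is stored in `@n`, and "child of `<p>`" is read as a direct child. It works on a deep copy, so the source tree is left untouched.

```python
import copy
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id
# <w>, <g> and <persName> count as segments wherever they occur.
SEGMENT_TAGS = {TEI + "w", TEI + "g", TEI + "persName"}


def number_segments(p, inscription_id):
    """Return a deep copy of <p> with xml:id and @n set on every segment.

    Segments are <w>, <g> and <persName> anywhere in <p>, plus <orig> only
    when it is a direct child of <p> (an <orig> inside <choice> is skipped).
    Assumption: each <lb/> starts a new line; lines are counted, not read
    from @n. Word numbers restart at 1 on each line.
    """
    out = copy.deepcopy(p)  # copied elements keep their content as is
    line, word = 0, 0

    def walk(parent):
        nonlocal line, word
        for child in parent:
            if child.tag == TEI + "lb":
                line, word = line + 1, 0
            elif child.tag in SEGMENT_TAGS or (
                child.tag == TEI + "orig" and parent is out
            ):
                word += 1
                child.set(XML_ID, f"{inscription_id}-{line}-{word}")
                child.set("n", str(word))  # assumption: number kept in @n
                # A word broken across lines (an <lb/> inside it) pushes
                # the following words onto the next line.
                nested = sum(1 for _ in child.iter(TEI + "lb"))
                if nested:
                    line, word = line + nested, 0
            else:
                walk(child)  # e.g. <choice>: look inside, but skip its <orig>

    walk(out)
    return out


src = ET.fromstring(
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    '<lb n="1"/><w>a</w><g type="leaf"/><w>b</w>'
    '<lb n="2"/><choice><orig>c</orig><reg>C</reg></choice><w>d</w>'
    "</p>"
)
for el in number_segments(src, "akko0007").iter():
    if el.get(XML_ID):
        print(el.tag.split("}")[1], el.get(XML_ID))
```

With the sample paragraph this prints `w akko0007-1-1`, `g akko0007-1-2`, `w akko0007-1-3` and `w akko0007-2-1`; the `<orig>` inside `<choice>` gets no id. Whether `<g>` and `<orig>` should share the word counter with `<w>` is still open in this thread, so that part is only one possible choice.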