lukehollis / iip-word-lists

Python utility for creating word lists from epidoc files

<num> and <orig> #3

Open emylonas opened 3 years ago

emylonas commented 3 years ago

<num> elements should be copied as is (deep copy). <orig>, when it is a child of <p>, should also be copied as is (deep copy).

Note that <orig> can also appear as a child of <choice>. That is handled differently.
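A minimal sketch of the rule above, assuming the utility works on `xml.etree.ElementTree` elements (the issue does not say which XML library is used). The helper names `local_name`, `should_copy_as_is`, and `copy_verbatim` are hypothetical and only illustrate the decision; they are not taken from the repository.

```python
import copy
import xml.etree.ElementTree as ET


def local_name(elem):
    """Return the tag without any '{namespace}' prefix."""
    return elem.tag.rsplit("}", 1)[-1]


def should_copy_as_is(elem, parent):
    """Decide whether elem is deep-copied instead of being wrapped in <w>.

    <num> is always copied as is; <orig> only when its parent is <p>.
    An <orig> inside <choice> returns False and is left to other logic.
    """
    name = local_name(elem)
    if name == "num":
        return True
    if name == "orig":
        return local_name(parent) == "p"
    return False


def copy_verbatim(elem):
    """Deep copy: attributes, text, children, and tail are all preserved."""
    return copy.deepcopy(elem)


# Usage with a made-up fragment (not from the corpus)
p = ET.fromstring(
    '<p><num value="6">VI</num> <orig>ABC</orig>'
    "<choice><orig>x</orig><reg>y</reg></choice></p>"
)
for child in p:
    print(local_name(child), should_copy_as_is(child, p))
```

Passing the parent explicitly keeps the check independent of the XML library, since ElementTree elements do not keep a reference to their parent.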

emylonas commented 3 years ago

<num> and <orig> should NOT have a <w> element around them. When you see a <num> or <orig>, it should just be copied. Example from gada0001, original:

<p><lb/>Vexilla<lb break="no"/>tio <expan><abbr>Leg</abbr><ex>ionis</ex></expan>
                    <lb/><num value="6">VI</num> <expan><abbr>Ferr</abbr><ex>atae</ex></expan></p>

should be

<p><lb/><w>Vexilla<lb break="no"/>tio</w> <w><expan><abbr>Leg</abbr><ex>ionis</ex></expan></w>
                    <lb/><num value="6">VI</num> <w><expan><abbr>Ferr</abbr><ex>atae</ex></expan></w></p>

where the word Vexillatio (split across a line break) is surrounded by a <w>, each <expan> element is surrounded by a <w> element, but the <num> element is copied as is.

Ideally, the <num> element will also have an xml:id of the form "gada0001-xx", so that the KWIC view can be re-created.
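One possible way to attach such ids, sketched with `xml.etree.ElementTree`. The issue leaves the "xx" part unspecified, so the numbering here is an assumption: a plain counter shared by <w> and <num> in document order. The function name `assign_ids` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:id under the reserved XML namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def assign_ids(root, inscription_id, tags=("w", "num")):
    """Give every <w> and <num> under root an xml:id in document order.

    ASSUMPTION: ids look like '<inscription_id>-<counter>'; the real
    scheme behind 'gada0001-xx' is not stated in the issue.
    Returns the number of ids assigned.
    """
    counter = 0
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] in tags:
            counter += 1
            elem.set(XML_ID, f"{inscription_id}-{counter}")
    return counter


# Usage on a simplified, already-wrapped paragraph
p = ET.fromstring(
    '<p><w>Vexillatio</w> <w>Legionis</w> '
    '<num value="6">VI</num> <w>Ferratae</w></p>'
)
assign_ids(p, "gada0001")
print(ET.tostring(p, encoding="unicode"))
```

Numbering <num> in the same sequence as <w> keeps the tokens in reading order, which is what a keyword-in-context view needs in order to find a token's neighbours.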