lukehollis / iip-word-lists

Python utility for creating word lists from epidoc files

<foreign> #13

Open lukehollis opened 3 years ago

lukehollis commented 3 years ago

A `<foreign>` element may contain one or more words. Each word inside it should take the element's language tag instead of the inscription's main language.
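A minimal sketch of the language override, assuming the file is parsed with Python's `xml.etree.ElementTree` and that the `<foreign>` element holds only plain text. The function name `foreign_words` and the `main_lang` fallback parameter are hypothetical, not part of this repo.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def foreign_words(foreign, main_lang):
    """Return (word, lang) pairs for the plain text inside a <foreign> element.

    Each word takes the element's xml:lang; main_lang is only a fallback
    for a <foreign> element that has no xml:lang attribute.
    """
    lang = foreign.get(XML_LANG, main_lang)
    text = foreign.text or ""
    return [(word, lang) for word in text.split()]


# A Greek word inside an inscription whose mainLang is Hebrew.
el = ET.fromstring('<foreign xml:lang="grc">θα</foreign>')
print(foreign_words(el, main_lang="he"))  # → [('θα', 'grc')]
```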

emylonas commented 3 years ago

Examples: jaff0052's mainLang is Hebrew, but this is a Greek word: `<foreign xml:lang="grc">θα</foreign>`. It should appear as `<w xml:id="jaff0052-n" xml:lang="grc">θα</w>`.

Here is one with two words, from the same inscription: `<foreign xml:lang="grc">Ῥαβὶ Ἰοδα</foreign>`. It should appear as `<w xml:id="jaff0052-n" xml:lang="grc">Ῥαβὶ</w> <w xml:id="jaff0052-n+1" xml:lang="grc">Ἰοδα</w>`.
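The numbering in these examples (`n`, `n+1`) could be produced roughly like this. This is a sketch assuming each word of a `<foreign>` element becomes its own `<w>` and that the caller already knows the running word number `n` for the first foreign word; `make_w_elements` and `start_n` are hypothetical names, and `start_n=7` below is an arbitrary placeholder.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = f"{{{XML_NS}}}id"
XML_LANG = f"{{{XML_NS}}}lang"


def make_w_elements(words, inscription_id, lang, start_n):
    """Build one <w> element per word, numbering xml:id values from start_n.

    start_n stands in for the "n" in the examples: the position of the
    first foreign word in the inscription's running word count.
    """
    elements = []
    for offset, word in enumerate(words):
        w = ET.Element("w")
        w.set(XML_ID, f"{inscription_id}-{start_n + offset}")
        w.set(XML_LANG, lang)
        w.text = word
        elements.append(w)
    return elements


# Two words from a single <foreign xml:lang="grc"> element in jaff0052.
for w in make_w_elements(["Ῥαβὶ", "Ἰοδα"], "jaff0052", "grc", start_n=7):
    print(ET.tostring(w, encoding="unicode"))
```

ElementTree serializes the XML-namespace attributes with the reserved `xml:` prefix, so the output keeps the `xml:id` / `xml:lang` form used above.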

geze0008 has a `<foreign>` with another tag inside: `<foreign xml:lang="grc">ΑΛ<unclear>ΚΙΟ</unclear>Υ</foreign>`. It should appear as the single word `ΑΛΚΙΟΥ`, because there are no spaces.
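One way to get `ΑΛΚΙΟΥ` is to join all the text inside the `<foreign>` element before splitting on whitespace. This sketch assumes ElementTree; `foreign_tokens` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def foreign_tokens(foreign):
    """Split a <foreign> element into words, ignoring nested markup.

    itertext() yields the text of the element and all its descendants
    (including the tail text after each child) in document order, so
    ΑΛ<unclear>ΚΙΟ</unclear>Υ is joined back into one string before
    splitting; only whitespace separates words.
    """
    return "".join(foreign.itertext()).split()


el = ET.fromstring('<foreign xml:lang="grc">ΑΛ<unclear>ΚΙΟ</unclear>Υ</foreign>')
print(foreign_tokens(el))  # → ['ΑΛΚΙΟΥ']
```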

zeichman commented 3 years ago

I think I've noticed a significant issue with the `<foreign>` tag. I don't see any words tagged `<foreign xml:lang="heb">` in the wordlist, and the same seems to be true of words tagged `<foreign xml:lang="lat">`. This might be caused by a discrepancy between the language code used at the beginning of the file (he, la) and the one used in the `<foreign>` tag (heb, lat).
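If the mismatch is the cause, one fix is to normalize codes before grouping words into wordlists. This is a sketch under that assumption: the alias table is hypothetical and contains only the two pairs reported here. Normalizing toward the three-letter codes seems the safer direction, since Ancient Greek (`grc`) has no two-letter ISO 639-1 code.

```python
# Hypothetical alias table: only the two pairs reported in this thread
# (he -> heb, la -> lat) are included; extend it as other mismatches appear.
LANG_ALIASES = {
    "he": "heb",
    "la": "lat",
}


def normalize_lang(code):
    """Map a language code to the form used in <foreign xml:lang="...">."""
    return LANG_ALIASES.get(code, code)


def group_by_lang(pairs):
    """Group (word, lang) pairs into wordlists keyed by normalized language."""
    wordlists = {}
    for word, lang in pairs:
        wordlists.setdefault(normalize_lang(lang), []).append(word)
    return wordlists


print(group_by_lang([("pax", "la"), ("vita", "lat"), ("θα", "grc")]))
# → {'lat': ['pax', 'vita'], 'grc': ['θα']}
```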

Edit: it doesn't look like any words tagged `<foreign>` are in the wordlists at all. For example, jeru0556 includes `<foreign xml:lang="grc">Ἐλισαβη</foreign>`, but this word does not appear in the output.
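A quick regression check could list every word inside `<foreign>` that is missing from a file's wordlist. This sketch assumes the EpiDoc file is available as a string or bytes and that the generated wordlist can be loaded as a collection of words; `missing_foreign_words` is a hypothetical name. It searches in the TEI namespace because EpiDoc files are TEI documents.

```python
import xml.etree.ElementTree as ET

# EpiDoc files are TEI documents, so elements live in the TEI namespace.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def missing_foreign_words(epidoc_xml, wordlist):
    """Return words found inside <foreign> elements that are absent from wordlist.

    epidoc_xml is the file's contents as a string or bytes; wordlist is any
    collection of words produced for that file.
    """
    root = ET.fromstring(epidoc_xml)
    found = []
    for foreign in root.iterfind(".//tei:foreign", TEI):
        # Join nested text so markup inside <foreign> does not split words.
        found.extend("".join(foreign.itertext()).split())
    known = set(wordlist)
    return [word for word in found if word not in known]


# A trimmed-down stand-in for jeru0556, not the real file.
doc = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div type="edition">'
    '<ab><foreign xml:lang="grc">Ἐλισαβη</foreign></ab>'
    '</div></body></text></TEI>'
)
print(missing_foreign_words(doc, wordlist=[]))  # → ['Ἐλισαβη']
```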