lukehollis / iip-word-lists

Python utility for creating word lists from epidoc files

<unclear> #8

Open emylonas opened 3 years ago

emylonas commented 3 years ago

Coming soon.

emylonas commented 3 years ago

<unclear> is usually part of a word. It may appear at either end of a word or it may appear within a word.

zoor0003, simple case: ἐτῶ<unclear>ν</unclear> should be: <w>ἐτῶ<unclear>ν</unclear></w>

zoor0228: <unclear>ἀ</unclear>ποθα<lb break="no"/>νόντος should be: ἀποθανόντος
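A minimal sketch of the behaviour described above, not the utility's actual code: it flattens a fragment to plain text so that <unclear> never splits a word and <lb break="no"/> does not introduce a break. The function names `flatten` and `words_from_fragment` and the wrapper element `<ab>` are assumptions made for illustration, and the fragments are assumed to be namespace-free (real EpiDoc files use the TEI namespace).

```python
import xml.etree.ElementTree as ET


def flatten(elem):
    """Yield text pieces in document order.

    A plain <lb/> is treated as a word break; <lb break="no"/> is not.
    """
    if elem.tag == "lb" and elem.get("break") != "no":
        yield " "
    if elem.text:
        yield elem.text
    for child in elem:
        yield from flatten(child)
        if child.tail:
            yield child.tail


def words_from_fragment(fragment):
    """Return the whitespace-delimited words of an XML fragment."""
    # Wrap the fragment so it parses as a single well-formed element.
    root = ET.fromstring(f"<ab>{fragment}</ab>")
    return "".join(flatten(root)).split()


if __name__ == "__main__":
    print(words_from_fragment('ἐτῶ<unclear>ν</unclear>'))
    print(words_from_fragment('<unclear>ἀ</unclear>ποθα<lb break="no"/>νόντος'))
```

With this approach both zoor0003 and zoor0228 come out as one word each, because word boundaries are decided only by whitespace and by line breaks that are not marked break="no".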

zeichman commented 3 years ago

<unclear> can be adjacent to <supplied> without spaces, in which case they form a single word.

For example (jeru0546): <unclear>τ</unclear><supplied reason="lost">ῶν</supplied> should be rendered as

<w><unclear>τ</unclear><supplied reason="lost">ῶν</supplied></w> or τῶν

Jeru0554 and idum0190 are other examples of this.
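One possible way to produce the <w> form for the jeru0546 case, offered as a hedged sketch rather than the project's implementation: adjacent inline elements and text with no whitespace between them are collected into the same <w>. The name `wrap_words` and the `<ab>` wrapper are hypothetical. The sketch assumes namespace-free fragments and assumes that no inline element contains whitespace itself; a <supplied> spanning several words would stay inside one <w>, and a plain <lb/> is not treated as a break here.

```python
import copy
import re
import xml.etree.ElementTree as ET


def wrap_words(fragment):
    """Wrap each whitespace-delimited run of text and inline tags in <w>."""
    src = ET.fromstring(f"<ab>{fragment}</ab>")
    out = ET.Element("ab")
    current = None  # the <w> being built, or None between words

    def add_text(text):
        nonlocal current
        # Walk alternating runs of whitespace and non-whitespace.
        for run in re.findall(r"\s+|\S+", text or ""):
            if run.isspace():
                if current is not None:
                    current.tail = " "  # keep one space between words
                current = None          # whitespace ends the word
                continue
            if current is None:
                current = ET.SubElement(out, "w")
            if len(current):            # text following an inline child
                last = current[-1]
                last.tail = (last.tail or "") + run
            else:                       # text at the start of the word
                current.text = (current.text or "") + run

    add_text(src.text)
    for child in src:
        if current is None:
            current = ET.SubElement(out, "w")
        inline = copy.deepcopy(child)
        inline.tail = None              # the tail is handled by add_text
        current.append(inline)
        add_text(child.tail)
    return ET.tostring(out, encoding="unicode")


if __name__ == "__main__":
    print(wrap_words('<unclear>τ</unclear><supplied reason="lost">ῶν</supplied>'))
```

Because only whitespace closes a word, <unclear> directly followed by <supplied> ends up in a single <w>, while the same two elements separated by a space would give two.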

Edit: I think this is related to the larger issue of <supplied> tags noted in #12