papyri / sosol

The Son of Suda On Line
GNU General Public License v3.0

Handling the unclear tag in Demotic and Arabic texts #338

Open jcowey opened 1 year ago

jcowey commented 1 year ago

The use of the <unclear> tag in Demotic and Arabic texts is unproblematic in XML. Problems arise when it is round-tripped to Leiden+ with xSugar.

Instead of using underdots (which are obviously unacceptable in both language representations), we propose to use the Unicode code points U+2E22 (⸢, TOP LEFT HALF BRACKET) and U+2E23 (⸣, TOP RIGHT HALF BRACKET).

Since going from XML to Leiden+ is the challenge, the idea was floated of some sort of preprocessing based on the parent xml:lang value being ar or egy-Egyd.
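To make the preprocessing idea concrete, here is a minimal sketch in Python. It is not how xSugar or the SoSOL code base works; it only illustrates choosing the notation for `<unclear>` from the nearest inherited `xml:lang`. The function names `render_unclear` and `flatten` are hypothetical, the TEI namespace is assumed for the input, and the underdot fallback is assumed to be a combining dot below (U+0323) after each character.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # assumption: input is TEI/EpiDoc
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Languages that should get half brackets instead of underdots (from this issue).
HALF_BRACKET_LANGS = {"ar", "egy-Egyd"}


def render_unclear(text, lang):
    """Hypothetical helper: pick the notation for unclear text by language."""
    if lang in HALF_BRACKET_LANGS:
        # Wrap the whole unclear span in U+2E22 ... U+2E23.
        return "\u2e22" + text + "\u2e23"
    # Assumed fallback: combining dot below after every non-space character.
    return "".join(ch if ch.isspace() else ch + "\u0323" for ch in text)


def flatten(elem, lang=None):
    """Hypothetical walker: return the element's text with <unclear> rendered.

    The language is inherited from the nearest ancestor that sets xml:lang.
    """
    lang = elem.get(XML_LANG, lang)
    inner = elem.text or ""
    for child in elem:
        inner += flatten(child, lang) + (child.tail or "")
    if elem.tag == "{%s}unclear" % TEI_NS:
        return render_unclear(inner, lang)
    return inner


if __name__ == "__main__":
    sample = (
        '<div xmlns="http://www.tei-c.org/ns/1.0" xml:lang="egy-Egyd">'
        "<ab>abc <unclear>def</unclear> ghi</ab></div>"
    )
    print(flatten(ET.fromstring(sample)))
```

The sketch matches `xml:lang` values exactly, so a value such as `ar-Arab` would fall through to underdots; whether prefix matching is wanted is an open question.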

To make entry easier, there was also a proposal to add a tab in the editor next to the UnderDot tag. It might be called "Unclear Arabic Demotic", or something less clumsy that takes up less space.

[Image: screenshot taken 2023-01-10 at 11:16:30]