rosettatype / hyperglot

Hyperglot: a database and tools for detecting language support in fonts

http://hyperglot.rosettatype.com

GNU General Public License v3.0

162 stars, 22 forks

Added Ottoman Turkish #31

Status: Closed (skurzinz closed 3 years ago)

skurzinz commented 3 years ago

I did not do any tests as I don't currently have a working python environment at hand. Please check before merging.

alerque commented 3 years ago

I was just looking at the regular Turkish the other day, it appears to have a couple issues, notably some historical glyphs listed that aren't currently part of the orthography.

I'm not much of an expert on Ottoman, but the gist of this looks right.

skurzinz commented 3 years ago

I guess one option would be to import/inherit the standard Arabic (arb) orthography instead of listing the glyphs literally. I did not find an example of this among the entries with multiple orthographies in the database file.

NB: historically the Armenian alphabet was also sometimes used to write ota, but as I am completely lost there I won't even try to include it.

There are other transcription alphabets for ota available as well, but the glyph coverage should not differ much. I just went with IJMES as I am using it for an edition project through https://github.com/QHOD/ota-keyboard.

kontur commented 3 years ago

Thanks for the contribution, it is added for the next release. I've tweaked the data so the Arabic is inherited from arb for cleaner representation. Feel free to submit an addition to CONTRIBUTORS.txt if you want to get listed (persons, not organizations) 👍

kontur commented 3 years ago

@skurzinz Sorry for backtracking after the merge; we're just reviewing marks in orthographies and this one popped up. I see you've listed combining macron below (U+0331) and combining minus sign below (U+0320), but do not use them in any of the base characters.

I see that the transliteration on Wikipedia, for example, has s with macron below.

Would it make sense to add those macron below characters? These being dropped might also have been a result of applying hyperglot-save and "pruning" the data (which we are changing). And is the inclusion of U+0320 a mistake or what is it used for?
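For reference, a check for marks that are listed but never used in base could look roughly like this. It is only a sketch using Python's standard `unicodedata` module; the `unused_marks` helper and the sample `base`/`marks` lists are made up for illustration and are neither Hyperglot's API nor the actual ota data.

```python
import unicodedata


def unused_marks(base, marks):
    """Return the marks that no base character uses after NFD decomposition."""
    used = set()
    for char in base:
        # NFD splits precomposed characters into base letter + combining marks.
        for cp in unicodedata.normalize("NFD", char):
            # General categories Mn, Mc and Me are the combining marks.
            if unicodedata.category(cp).startswith("M"):
                used.add(cp)
    return [mark for mark in marks if mark not in used]


# Hypothetical sample data, not the real orthography entry.
base = ["a", "\u0101", "\u1e95"]          # a, a with macron, z with line below
marks = ["\u0304", "\u0331", "\u0320"]    # macron, macron below, minus sign below

for mark in unused_marks(base, marks):
    print(f"U+{ord(mark):04X}", unicodedata.name(mark))
```

With this sample data only U+0320 is reported, since the macron and the macron below both show up when the base characters are decomposed.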

skurzinz commented 3 years ago

@kontur thanks for notifying me of this error. U+0320 is a mistake on my side. I likely copy-pasted it from somewhere without noticing, writing directly in the GitHub editor and not in some hex-aware environment :) Another error is not including the SsZs with macron below in the base character list. Both errors were already present in my original PR.

The official transliteration table of IJMES is available as a PDF only. It would also be applicable to Arabic, (Modern) Turkish (if written in arb), and Persian/Farsi transcription.

S/s with macron below is not available as a precomposed (single code point) character in Unicode, so it has to be written as the base letter plus U+0331. I'll see if I can fix my errors and submit a new PR.
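Whether a letter plus combining macron below has a single-code-point form can be checked with NFC normalization. The snippet below is a small sketch using only Python's standard `unicodedata` module; `has_precomposed` is a helper name chosen here for illustration.

```python
import unicodedata

MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW


def has_precomposed(letter, mark):
    """True if NFC composes letter + mark into a single code point."""
    composed = unicodedata.normalize("NFC", letter + mark)
    return len(composed) == 1


for letter in "SsZz":
    print(letter, has_precomposed(letter, MACRON_BELOW))
```

S and s stay as two code points, while Z and z compose to U+1E94 and U+1E95 (Z/z with line below).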

skurzinz commented 3 years ago

For the record: to find the culprit, I had success with VSCode and the https://github.com/medo64/code-point/ extension. Atom with https://atom.io/packages/character-table did not work for this purpose.
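Without an editor extension, the same kind of inspection can be done in a Python shell. This is just a sketch with the standard `unicodedata` module; `describe` is a made-up helper name and the sample string is only an illustration.

```python
import unicodedata


def describe(text):
    """List (code point, Unicode name) pairs for every character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# The two marks look nearly identical when rendered but differ in code point.
for code_point, name in describe("s\u0331 s\u0320"):
    print(code_point, name)
```

Pasting a suspicious string into `describe` makes a stray U+0320 visible next to the intended U+0331.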