Transliteration

TitusNemeth commented 4 years ago

Transliteration of Arabic-script text should be made consistent and unambiguous.

Currently, several different Arabic transliteration schemes are in use; these should be unified. Ambiguous transliteration of style names has, for example, led to misattribution of styles in the past. We have 'Kufi' and 'Taʻlīq', where both 'i' and 'ī' transliterate yeh, and we have 'Korʼan' and 'Ruqʻa', which use different transliterations for qaf ('K' vs 'q') and damma ('o' vs 'u'). This should be made consistent. There may well also be differences in how hamza and ayn are transliterated.

The group ought to decide on a transliteration scheme and use it throughout. A scheme with as few diacritical signs as possible seems preferable, because few fonts support the most common Arabic transliteration schemes. One option could be the Library of Congress (LOC) scheme, another the Brill scheme, though the latter may require more poorly supported characters. Neither is supported by the Open Sans fonts, but good old Georgia (and other system fonts) works with the LOC scheme.

r12a commented 4 years ago

The group discussed this in https://github.com/w3c/alreq/issues/16 and concluded that we would use the LOC transcription for Arabic and the UNGEN transcription for Persian. However, I can readily believe that we need to tidy up some of the transcriptions in the document.

The Arabic and Persian character apps are equipped with converters, and we used them to improve consistency and accuracy. For Arabic, go to https://r12a.github.io/pickers/arabic/ and select Transcribe to LOC from the drop-down menu next to Character Markup (top right). For Persian, go to https://r12a.github.io/pickers/persian/ and select Persian to UN from the same menu. For best results, the text being converted (in the large box) should be vowelled, especially for Arabic.
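Why vowelling matters can be shown with a minimal Python sketch of a character-by-character transcriber. It is NOT the converter used by the pickers above: the tables are a small, hypothetical subset with LOC-style values chosen for illustration, and context rules (shadda, sun letters, tāʼ marbūṭa, capitalisation) are ignored. Long vowels are only produced when a short-vowel mark is followed by its matching letter of prolongation, which is why unvowelled input gives poor results.

```python
# Hypothetical sketch: character-by-character transcription of *vowelled*
# Arabic using a small, illustrative subset of LOC-style values.
# Not the r12a picker converter; context rules are deliberately ignored.

LOC_SUBSET = {
    "\u062A": "t",       # teh
    "\u0634": "sh",      # sheen
    "\u0639": "\u02BB",  # ain -> ʻ (MODIFIER LETTER TURNED COMMA)
    "\u0641": "f",       # feh
    "\u0642": "q",       # qaf
    "\u0643": "k",       # kaf
    "\u0644": "l",       # lam
    "\u0631": "r",       # reh
    "\u0648": "w",       # waw as a consonant
    "\u064A": "y",       # yeh as a consonant
    "\u064E": "a",       # fatha
    "\u064F": "u",       # damma
    "\u0650": "i",       # kasra
    "\u0652": "",        # sukun: no vowel follows
}

# Short-vowel mark + matching letter of prolongation -> long vowel.
LONG_VOWELS = {
    ("\u064E", "\u0627"): "\u0101",  # fatha + alif -> ā
    ("\u064F", "\u0648"): "\u016B",  # damma + waw  -> ū
    ("\u0650", "\u064A"): "\u012B",  # kasra + yeh  -> ī
}


def transcribe(text: str) -> str:
    """Transcribe vowelled Arabic with the illustrative tables above.
    Characters missing from the tables are passed through unchanged."""
    out = []
    i = 0
    while i < len(text):
        pair = tuple(text[i:i + 2])
        if pair in LONG_VOWELS:
            out.append(LONG_VOWELS[pair])
            i += 2
        else:
            out.append(LOC_SUBSET.get(text[i], text[i]))
            i += 1
    return "".join(out)


# Fully vowelled Kufi and Taʻlīq now get the same treatment of yeh.
print(transcribe("\u0643\u064F\u0648\u0641\u0650\u064A"))              # kūfī
print(transcribe("\u062A\u064E\u0639\u0652\u0644\u0650\u064A\u0642"))  # taʻlīq
```

With a single table applied everywhere, the 'Kufi' vs 'Taʻlīq' inconsistency for yeh cannot arise; without the kasra, the yeh would come out as a bare 'y'.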

(Btw, note that we maintain a distinction between 'transliteration' and 'transcription'. The former is a 1-to-1, round-trippable correspondence between Arabic and Latin script letters; the latter is often closer to the phonetics but typically doesn't allow reversible conversion. Both the LOC and UNGEN schemes are used as 'transcriptions'.)
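The difference can be made concrete by checking a letter-to-Latin table for the two properties a round trip needs: no two letters may share a Latin value, and no value may be a prefix of another (otherwise 'sh' could be one letter or two). The Python sketch below is illustrative only; the function names and the four-letter sample table are hypothetical, and passing both checks is a sufficient, not a necessary, condition for unambiguous reversal.

```python
# Sketch: does a letter-to-Latin table allow a lossless round trip?
# The sample table is illustrative, not a complete romanization scheme.


def collisions(table: dict[str, str]) -> dict[str, list[str]]:
    """Latin values produced by more than one source letter."""
    by_value: dict[str, list[str]] = {}
    for letter, latin in table.items():
        by_value.setdefault(latin, []).append(letter)
    return {v: letters for v, letters in by_value.items() if len(letters) > 1}


def prefix_clashes(table: dict[str, str]) -> list[tuple[str, str]]:
    """Pairs of distinct values where the first is a prefix of the second."""
    values = sorted(set(table.values()))
    return [(a, b) for a in values for b in values if a != b and b.startswith(a)]


def passes_roundtrip_checks(table: dict[str, str]) -> bool:
    """True if the table is injective and prefix-free, which guarantees that
    any Latin output splits back into source letters in exactly one way."""
    return not collisions(table) and not prefix_clashes(table)


SAMPLE = {
    "\u0633": "s",   # seen
    "\u0634": "sh",  # sheen
    "\u0647": "h",   # heh
    "\u0629": "h",   # teh marbuta, usually romanized 'h' in LOC
}

print(collisions(SAMPLE))               # 'h' comes from both heh and teh marbuta
print(prefix_clashes(SAMPLE))           # [('s', 'sh')]
print(passes_roundtrip_checks(SAMPLE))  # False
```

A transcription scheme can fail both checks and still be perfectly usable for readers; it just cannot be converted back to Arabic mechanically.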

SIL and Noto provide Latin fonts that easily cover all the non-ASCII characters we need. I suggest that we include one of those as a .woff font in the stylesheet and map it to class="trans" and class="lettername", which we should use for all transcription text.
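Before picking a font, it may help to list exactly which non-ASCII characters the transcriptions use. The Python sketch below collects them from sample strings (after NFC normalization, so a decomposed 'i' plus combining macron counts as 'ī') and formats them for the CSS @font-face `unicode-range` descriptor, in case we want the web font applied only to those characters. The function names and sample words are illustrative assumptions, not part of any agreed setup.

```python
import unicodedata


def required_characters(samples: list[str]) -> list[str]:
    """Non-ASCII characters (after NFC) that a transcription font must cover,
    sorted by code point."""
    chars = {
        ch
        for text in samples
        for ch in unicodedata.normalize("NFC", text)
        if ord(ch) > 0x7F
    }
    return sorted(chars)


def unicode_range(chars: list[str]) -> str:
    """Format characters as a CSS unicode-range value, e.g. 'U+012B, U+02BB'."""
    return ", ".join(f"U+{ord(ch):04X}" for ch in chars)


# Illustrative sample: style names quoted earlier in this issue.
samples = ["Ta\u02BBl\u012Bq", "Kor\u02BCan", "Ruq\u02BBa", "tashk\u012Bl"]
needed = required_characters(samples)
for ch in needed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
print(unicode_range(needed))  # U+012B, U+02BB, U+02BC
```

Running it over the full set of transcriptions in the document would give a concrete checklist against which candidate SIL or Noto fonts (or a subset of one) can be compared.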

r12a commented 4 years ago

Another aspect of this discussion that we perhaps need to agree on is whether or not to always use transcriptions for Arabic and Persian words. For example, should we always use tashkīl (with a diacritic and a special font) in the body of the text, or is it appropriate to 'anglicise' it to tashkil (as the Unicode Standard does) in general discussions that refer to the concept?
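If we do anglicise in running text, the plain form can be derived mechanically from the transcription, so the two never drift apart. A minimal Python sketch, assuming that anglicising means removing combining diacritics (macrons, underdots): decompose to NFD, drop nonspacing marks, recompose. Note that ʻ (ayn) and ʼ (hamza) are modifier letters rather than combining marks, so they survive unless dropped explicitly; the `drop_ayn_hamza` option is a hypothetical choice, not an agreed convention.

```python
import unicodedata

# ʻ (U+02BB) and ʼ (U+02BC) are letters (category Lm), not combining marks.
AYN_HAMZA = {"\u02BB", "\u02BC"}


def anglicise(word: str, drop_ayn_hamza: bool = False) -> str:
    """Strip combining diacritics (macrons, underdots, ...) from a transcription,
    optionally also removing the ayn and hamza signs."""
    decomposed = unicodedata.normalize("NFD", word)
    kept = [
        ch
        for ch in decomposed
        if unicodedata.category(ch) != "Mn"
        and not (drop_ayn_hamza and ch in AYN_HAMZA)
    ]
    return unicodedata.normalize("NFC", "".join(kept))


print(anglicise("tashk\u012Bl"))                             # tashkil
print(anglicise("Ta\u02BBl\u012Bq"))                         # Taʻliq
print(anglicise("Ta\u02BBl\u012Bq", drop_ayn_hamza=True))    # Taliq
```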

w3c / alreq

Transliteration #204