virtualvinodh / aksharamukha

Aksharamukha

Tibetan - adjust to standard romanization (Wylie) #31

Closed: yakcz closed this issue 3 years ago

yakcz commented 5 years ago

Although the Tibetan letters derive from South Asian scripts, they cannot be treated like South Asian letters. Please refer to the rules of Wylie transliteration of Tibetan. If including Tibetan in Aksharamukha is to be of any use, Tibetan has to be transcribable from Wylie romanization. As a test, I tried two random words; both were rendered incorrectly from Roman into Tibetan:

"phyi rgyal" gave ཕྱི་རྒྱལ྄ (the virama under the final letter should not be there!)
"bsgrubs" gave བྶྒྲུབྶ྄ (here the error is the S subscribed under the B, whereas it should follow the B)

At this stage, Aksharamukha can correctly render only Sanskrit dharanis written in Tibetan script, not Tibetan-language text.

Both of the above errors are easy to remove:

1) There is no use of virama (halant) in Tibetan, so when a syllable ends with a consonant in romanization, it ends with the plain letter in Tibetan, followed by a tsheg (the dot that you render with SPACE in romanization).
2) S is never subscribed in Tibetan (it can only be superscribed), and whenever more than one consonant follows a vowel in romanized Tibetan, they are written as separate single consonant letters following each other in Tibetan.
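The first fix can be sketched as a small post-processing step. This is a minimal Python sketch under stated assumptions, not Aksharamukha's actual code: the function names `drop_halanta` and `spaces_to_tsheg` are hypothetical, the code points for the reported "phyi rgyal" output are read off the rendering above, and the input is assumed to be Tibetan-language text (Sanskrit written in Tibetan script may legitimately need the halanta).

```python
HALANTA = "\u0f84"  # TIBETAN MARK HALANTA (the virama)
TSHEG = "\u0f0b"    # TIBETAN MARK INTERSYLLABIC TSHEG (the syllable dot)


def drop_halanta(tibetan: str) -> str:
    """Remove every halanta so a final consonant is left as a plain letter."""
    return tibetan.replace(HALANTA, "")


def spaces_to_tsheg(text: str) -> str:
    """Render each space of the romanization as a tsheg."""
    return text.replace(" ", TSHEG)


# "phyi rgyal" as reported above, with the unwanted halanta at the end
# (code points assumed from the rendering in this thread).
rendered = "\u0f55\u0fb1\u0f72\u0f0b\u0f62\u0f92\u0fb1\u0f63\u0f84"
print(drop_halanta(rendered))
```

The second fix (stacking rules for prefixes, superscripts and suffixes) needs a real Wylie parser and is not covered by this sketch.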

More errors will appear, however, if we try other letter combinations. I noticed that Aksharamukha transforms Roman "c" into the Tibetan letter that is romanized as "ts". (This may have been chosen intentionally, because Tibetan tends to spell Sanskrit words containing "ca"/"cha" with Tibetan "tsa"/"tsha", but Tibetan does have the sounds equivalent to Sanskrit "ca"/"cha", and we do romanize them as "ca"/"cha".)

On the whole, Tibetan is NOT TO BE ROMANIZED AS INDIC, and one has to look into the different rules of Tibetan romanization (Wylie). I guess you needed to adjust the procedure for Thai as well.

(By the way, the "Lao (Pali)" option is a very good decision, but why not also "Thai (Pali)"? In Thai, too, certain letters are transcribed as kh or th when transcribing the Thai language, while the same letters stand for g or d in Pali!)

There should be an option "Roman (Wylie for Tibetan)" among the romanization scripts in the source/target menus.

While it will take some work to develop the Wylie-Tibetan transliteration, what would be absolutely easy and yet is missing is the option to transliterate between the different Tibetan "scripts" (typefaces), notably "dbu can" (which you include as "Tibetan") and "dbu med", which is the common script of manuscripts as well as of official posters by the Tibetan Administration, and which most foreigners who have learned Tibetan cannot read.

Another suggestion, a wanted feature, would be the ability to narrow down the scroll menu of source/target options to only those scripts one wants to appear there. Since you have included so many curious scripts, it takes a long time to scroll through the list each time one wants to choose a new target or source. For example, when one wants to produce an Indic word in several South Asian scripts, one needs to switch the target each time, and scrolling through a few dozen scripts becomes almost as deterring as the complicated procedure to sign up for GitHub, which will make most visiting users give up writing a comment :)

yakcz commented 5 years ago

https://www.thlib.org/reference/transliteration/wyconverter.php

virtualvinodh commented 5 years ago

Thanks for the comments.

The original idea of the converter was basically to convert Sanskrit/Prakrit between Indic-derived scripts. But eventually it was expanded to include a whole lot of other scripts.

I do understand, it doesn't make much sense to transliterate from HK/ISO/IAST/Itrans to Tibetan if you want to write Tibetan texts.

I'll try supporting Wylie as well. Apparently, I can use THL's code as such (since it's free software). So, it should be easy enough to do that.

Thanks for the suggestion.

"Thai (Pali)" - also in Thai certain letter will be transcribed as kh or th when transcribing Thai language, while it is a letter for g or d in Pali!

I assume Thai in Aksharamukha is pretty much Thai (Pali/Sanskrit). I don't support additional Thai characters or the tone marks. Transliterating Thai-language texts would pretty much give you gibberish.

Unless you have anything particular in mind?

While it will take some work to develop the Wylie-Tibetan transliteration, what would be absolutely easy and yet is missing is the option to transliterate between the different Tibetan "scripts" (typefaces), notably "dbu can" (which you include as "Tibetan") and "dbu med", which is the common script of manuscripts as well as of official posters by the Tibetan Administration, and which most foreigners who have learned Tibetan cannot read.

This should be totally doable. I just need a 'dbu med' font (preferably open source) that I can include. Do you have any suggestions?

Another suggestion, a wanted feature, would now be to narrow down the scroll menu of source/target options to only those scripts one would like to appear there.

That's a good idea. I'll add that.

Since you included so many curious scripts, it takes a long time to scroll through the list each time one wants to choose a new target or source (e.g. when one wants to produce an Indic word in several South Asian scripts, and the like, one needs to switch the target each time and then the scrolling through few

You can always type the name of the script that you want. You don't necessarily have to scroll through the list. It has an inbuilt search feature :)

dozen scripts becomes almost as deterring as the complicated procedure to sign up for GitHub which will make most visiting users give up writing a comment

Ah! I also respond to emails :) In fact, most people just shoot me an email :)

V

yakcz commented 5 years ago

http://digitaltibetan.org/index.php/Tibetan_Fonts


virtualvinodh commented 5 years ago

I have added support for Dbu-Med now.

I will add support for Wylie as well shortly.

V

yakcz commented 5 years ago

I cannot see any option to choose between dbu can and dbu med in the transcriber. Perhaps you mean that you only added the software's ability to read dbu med?

Now I looked at this page of yours: http://aksharamukha.appspot.com/#/describe/Tibetan

and that is where the problem I was trying to describe shows clearly. We cannot transcribe a Tibetan text with an algorithm similar to the one we need for Indic texts. Just a few characteristics to outline the situation, partially:

1) The inherent "a" principles differ considerably from Indic. Roughly: there is not an inherent "a" under every consonant. There is at most one inherent "a" between two tshegs (meaning this dot: ་).
2) In the Tibetan language, the letters ཅ་ཆ་ཇ་ represent what we transcribe as "c, ch, j", while the letters ཙ་ཚ་ཛ་ represent the sounds transcribed as "ts, tsh, dz". However, in Sanskrit words and mantras, the Tibetans from early on assigned (erroneously) the letters ཙ་ཚ་ཛ་ to the Sanskrit "c, ch, j". (As a result, they write and pronounce the Sanskrit ca as tsa, although they have the sounds, and letters, for ca.)

Your current algorithm transcribes Sanskrit into Tibetan letters as Tibetans would write it, and can only transcribe Sanskrit words from a Tibetan text. Any Tibetan text is rendered totally wrong. Therefore, I would suggest renaming "Tibetan" to "Sanskrit in Tibetan script".
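The two readings of the ཙ་ཚ་ཛ་ series described in point 2 can be made explicit with two small tables. This is only an illustrative Python sketch: the names `WYLIE_VALUE`, `SANSKRIT_TO_TIBETAN` and `read_letter` are hypothetical and do not come from Aksharamukha, and only the six letters discussed here are covered.

```python
# Values of the two consonant series in Tibetan-language text (Wylie).
WYLIE_VALUE = {
    "\u0f45": "ca", "\u0f46": "cha", "\u0f47": "ja",    # the c-series
    "\u0f59": "tsa", "\u0f5a": "tsha", "\u0f5b": "dza",  # the ts-series
}

# Traditional spelling of Sanskrit palatals in Tibetan script:
# Sanskrit ca/cha/ja are written with the ts-series letters.
SANSKRIT_TO_TIBETAN = {"ca": "\u0f59", "cha": "\u0f5a", "ja": "\u0f5b"}


def read_letter(letter: str, sanskrit_context: bool) -> str:
    """Romanize one letter, depending on whether the text is Sanskrit or Tibetan."""
    if sanskrit_context:
        for sanskrit, tibetan in SANSKRIT_TO_TIBETAN.items():
            if tibetan == letter:
                return sanskrit
    # Tibetan-language text, or a letter the Sanskrit convention does not remap.
    return WYLIE_VALUE[letter]


print(read_letter("\u0f59", sanskrit_context=True))
print(read_letter("\u0f59", sanskrit_context=False))
```

The same letter thus needs a different romanization depending on the language of the text, which is why a single Indic-style table cannot serve both cases.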

And on the description page for "Tibetan", the transcription given for the sample Tibetan text is completely dissimilar to any kind of transliteration or transcription of Tibetan. So, given Aksharamukha's capacity, it would make more sense to give a sample of Sanskrit in Tibetan letters there.

I am not sure if I managed to make it clear.

As an example, let's take the first line of the sample text given at http://aksharamukha.appspot.com/#/describe/Tibetan

མཆོག་གླིང་ཐུགས་སྒྲུབ་ལོངས་སྐུ་ངན་སོང་དོང་སྤྲུག་གཙོ་འཁོར་ཡོངས་རྫོགས་སྔགས་བྱང་ནི།

machoga gliṅa thugasa sgruba loṅasa sku ṅana soṅa doṅa spruga gaco akhora yoṅasa rjogasa sṅagasa byaṅa ni

mchog gliṅ thugs sgrub loṅs sku ṅan soṅ doṅ sprug gtso 'khor yoṅs rjogs sṅags byaṅ ni

(The Wylie transliteration would have "ng" in place of your "ṅ".) Apart from the redundant "a"s, which render it unintelligible to anyone, there is a serious problem with transcribing འ as "a". When it does not bear a vowel sign ("matra"), it does not represent a vowel! (The exception is when it is subscribed to consonants in Sanskrit words, but then it stands not for "a" but for the macron over the given vowel.) This letter is transliterated by an apostrophe in Wylie, and by a different consonant letter in alternative transliteration systems (used by those who would transcribe "ṅ" and not "ng"): usually an underlined "h", or an "h" with a dot under it, or, in the case of Beyer in his Classical Tibetan, "N".
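A rough way to spot the redundant "a"s, and to replace "ṅ" with Wylie "ng", is sketched below in Python. It is a heuristic under stated assumptions, not a transliterator: the function names are hypothetical, syllables are assumed to be separated by spaces, and a syllable is flagged when it has more than one vowel nucleus. Genuine Tibetan syllables that carry a second vowel on འ (for example the genitive ending in "pa'i") would also be flagged, so the result is only a hint.

```python
import re
import unicodedata

VOWEL_RUN = re.compile(r"[aeiou]+")  # one match per vowel nucleus


def nasal_to_wylie(romanized: str) -> str:
    """Replace the ISO-style velar nasal (n with dot above) by Wylie "ng"."""
    composed = unicodedata.normalize("NFC", romanized)
    return composed.replace("\u1e45", "ng")


def suspicious_syllables(romanized: str) -> list:
    """List space-separated syllables that contain more than one vowel nucleus."""
    return [s for s in romanized.split() if len(VOWEL_RUN.findall(s)) > 1]


# Opening syllables of the two romanizations quoted above.
indic_style = "machoga gli\u1e45a thugasa sgruba"
tibetan_style = "mchog gli\u1e45 thugs sgrub"

print(suspicious_syllables(indic_style))
print(suspicious_syllables(tibetan_style))
print(nasal_to_wylie(tibetan_style))
```

Run on the quoted sample, every syllable of the Indic-style line is flagged and none of the corrected line is, which matches the point that Tibetan has at most one inherent "a" between two tshegs.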

All summed up, I'd suggest to put in big letters "Aksharamukha cannot transliterate the Tibetan language, it only renders Sanskrit words written with the Tibetan letters."

What I was suggesting about dbu med was that Aksharamukha could easily offer conversion between Tibetan in dbu can and Tibetan in dbu med, as no algorithm is needed whatsoever.

Best. Yak
