w3c / sealreq

Southeast Asian layout task force

Goal for requirements for Indonesian scripts #10

Open NorbertLindenberg opened 6 years ago

NorbertLindenberg commented 6 years ago

Indonesian scripts differ from the Thai, Khmer, Lao, and Myanmar scripts in that they’re not much used in daily life anymore. Since roughly the 1930s, the Indonesian languages have generally been written in the Latin script. Several of the traditional scripts are still taught in school, occasionally used on building or street signs, and obviously used to transcribe old documents. I don’t know to what extent they’re used for new writing.

However, conventions in the use of the traditional scripts seem to be shifting somewhat. At least on building signs, it seems common to insert spaces between words, and to break lines only at word boundaries. I don’t know whether these changes also occur in long-form writing, or what standards or school textbooks say about them. Does anybody have more information? In particular, what is common usage when transcribing old documents?

If it turns out that there are divergent modern and traditional conventions, which ones should become the basis for the W3C requirements, taking into consideration the prevalence of new and old writing in the individual scripts?

adtbayuperdana commented 6 years ago

It is actually quite common for building and road signs to be done by a third party who doesn't understand the script at all. Given the widespread misunderstanding that QWERTY keyboards are a perfect transliteration scheme for Indonesian scripts, most of these public samples are definitely wrong and should not be used as a basis for transcribing. Cultural and educational figures know that these texts are wrong, local governments are usually too busy with bureaucracy to fix them, and third parties and (unfortunately) many regular people are either too apathetic to care or simply not well-versed enough to know that they are wrong. In my opinion, we should be very careful in looking at the vernacular usage of Indonesian scripts, since the education system has not been very successful in educating people about the traditional scripts.

Standards are of course different from script to script. Javanese has broadly two standards: Mardi Kawi (MK), used for transcribing the Kawi/Old Javanese language, which is heavily influenced by Sanskrit, and Sriwedari (SW), which is based on a standardized and relatively recent or modern form of Javanese. Contemporary teaching (based on a standard issued by the regional governments of Java in 2002) seems to be largely based on SW conventions, but online communities that are active in using Javanese prefer MK because it lets them transcribe older Javanese literature more accurately. Much contemporary vernacular usage, however, disregards both of these standards and simply views Javanese through the inaccurate lens of a Latin alphabet transliteration. Most people don't even understand the concept of Javanese as an abugida, because they only see very simplistic letter tables that don't show the nuance of the script.

As far as I can tell, Balinese writing also has a standard that is closer to that of Javanese MK. Since lontar documents are still being rewritten, and most lontar contain Old Javanese and Sanskrit terms, it makes sense for Balinese standards to gravitate towards a more conservative form of spelling. Transcribing old documents seems to be more prevalent in Balinese than in Javanese, which is often used to write only short phrases or sentences, at least in the popular, non-academic realm.

Sundanese... is a bit weird in my opinion. The current Sundanese script is an attempt by the West Java government to revitalize ancient Sundanese (which nobody has used for centuries, as Sundanese has been written with a succession of scripts over time, from ancient Sundanese to Javanese, Arabic, and Latin) and in a way to reclaim their "identity" or such. But the committee did a really weird thing by arbitrarily simplifying many glyph shapes and introducing letters that nobody uses, like Q, X, V, and Z, which indicates that the "Latin misunderstanding" is also present in the Sundanese standard. As far as I know, there is no long text or book written in modern Sundanese (otherwise known as Aksara Sunda Baku, or Standardized Sundanese). It is the version that is taught at schools, but I think Ilham Nurwansyah mentioned to me that the simplification is really unfortunate. For transcribing ancient Sundanese documents, it is possible to use the existing code points (plus Sundanese Supplement) and just change all of the Sunda Baku glyph shapes into more historically accurate Ancient Sundanese (Sunda Kuno) ones.

In short, samples of "new" writing should be taken with a lot of salt, since many of them are fueled by misconceptions about the nature of Indonesian writing systems. In my opinion, examples should be taken from media that were made at the time the script in question was used functionally and not just as a cultural token. For Javanese, colonial-era publications might be appropriate. For Balinese, lontar manuscripts; and so on.

adtbayuperdana commented 6 years ago

I think I have the PDF file for Mardi Kawi. I'm not sure about the Sriwedari standard, but I have a 2002 official Javanese script standard (of sorts) issued by the government. Do I need to post it here? If I'm not mistaken, Setya Amrih made an overall review document of early to modern Javanese orthography, but the document is in Bahasa Indonesia.

NorbertLindenberg commented 6 years ago

If copyright law or licenses allow you to post such documents here, it would be great to have them. Otherwise, references to them and information on where to acquire them will have to suffice.

We will by necessity have to rely on documents in Bahasa Indonesia, other Indonesian languages, Dutch, or others, and on speakers of these languages to translate or interpret relevant sections.

adtbayuperdana commented 6 years ago

A general overview of Javanese script orthography from era to era: Setya Amrih (-) Kelengkapan Aksara Jawa.pdf

The Mardi Kawi standard, used by Dutch Indies scholars to transcribe Old Javanese literature. Some language purists and the Javanese script online community tend to prefer this standard: Mardi Kawi vol 1.pdf, Mardi Kawi vol 2.pdf, Mardi Kawi vol 3.pdf

A somewhat contemporary standard issued by the local governments of SR Yogyakarta, Central Java, and East Java. It is most likely the standard to which students are introduced in Javanese language classes throughout the island: Pemda (2002) Pedoman Penulisan Aksara Jawa.pdf

Ilhamkang commented 6 years ago

As I have said before to @mangajapa, the simplification in the standard Sundanese script is unfortunate. As far as I know, there is no published book written entirely in the standard Sundanese script, though there are several "how to write/read" books with rather long texts.

Unicode provides Sundanese code points that can serve the standard and/or the historical Sundanese script; which of the two is shown is a choice made when designing the glyph shapes. In practice, however, the historical glyphs appear in the Noto Sans Sundanese font, which causes confusion when they are used alongside standard characters. In formal schooling, meanwhile, the historical glyphs are not taught to students.

There are characteristic differences between the old Sundanese and standard Sundanese scripts that do not seem open to compromise. Ideally, I would recommend separating the use of standard and historical glyphs into different typing systems, which would affect how text appears on each platform.

I would note that what we are facing now is the use of the standard Sundanese script, not the historical Sundanese script. The USE still has a small problem with repha in standard Sundanese, which I've discussed here: https://github.com/googlei18n/noto-fonts/issues/1121. It has to be corrected as soon as possible before we can work on the next steps.