Open thesnarky1 opened 10 years ago
How does this relate to line breaking? I suppose that contextually-replaced characters only apply to the case when two (or more?) related glyphs are not broken-within, right?
I can see two answers. The basic answer is "yes, it only replaces characters within strings of the same glyphs". So if you were to call this with a newline inserted in the middle of a word, the last letter before the newline would be in FINAL form and the first letter after the newline would be in INITIAL form.
The other way to answer this is that I hadn't considered what happened if drawText was used with a max width after contextually replacing the characters. That would end up in a situation where you (potentially) have a MEDIAL form letter at the end of the line and a MEDIAL form letter starting the line. It would be ugly. However, that would also make it good and clear that a linebreak occurred, instead of having a completely new (really short) word appearing later.
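A rough sketch of the two cases, for illustration only. The helper positionalForms is hypothetical (it is not part of ROT or of the gist), and it assumes every letter involved connects on both sides, which is true of the three Arabic letters used here but not of the script in general.

```javascript
// Assumption for illustration: beh, lam and meem connect on both sides.
const CONNECTING = new Set(["\u0628", "\u0644", "\u0645"]);

// Hypothetical helper: label each character with the positional form it would take.
function positionalForms(text) {
  const chars = Array.from(text);
  return chars.map((ch, i) => {
    if (!CONNECTING.has(ch)) return "none"; // newline, space, Latin, ...
    const joinsPrev = CONNECTING.has(chars[i - 1]);
    const joinsNext = CONNECTING.has(chars[i + 1]);
    if (joinsPrev && joinsNext) return "medial";
    if (joinsPrev) return "final";
    if (joinsNext) return "initial";
    return "isolated";
  });
}

const word = "\u0628\u0644\u0645\u0628"; // four connecting letters

// Case 1: the newline is already in the string when the forms are chosen.
const breakFirst = positionalForms(word.slice(0, 2) + "\n" + word.slice(2));
// initial, final, none, initial, final

// Case 2: forms are chosen for the whole word and the line is wrapped afterwards.
const shaped = positionalForms(word);
const wrapAfter = [shaped.slice(0, 2), shaped.slice(2)];
// [initial, medial] and [medial, final]
```

In the first case each half looks like a complete short word; in the second the line ends and restarts on a MEDIAL form, which is the "ugly but clearly a line break" situation described above.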
ROT works nicely for languages that are written left-to-right in a non-connecting script. It handles Unicode just fine (as long as the browser does), so that covers a lot of ground. However, two areas that I believe would open this up to a much broader global audience are native right-to-left rendering and the ability to nicely display characters from languages that use cursive scripts (such as Arabic, Farsi, Hindi, etc.). This ticket covers cursive scripts only, so that the two issues can be handled separately.
Arabic is my example below, but it holds for any cursive script in which the letters change based on position (for instance, an 'a' at the start of a word appears differently than an 'a' at the end).
Issue
The crux of this issue is that while a browser correctly knows how to display a string of Unicode characters while in the context of that string, pulling each character out to stand on its own breaks the connectivity. Currently ROT.Display.drawText prints each character on its own, leading to only the isolated form of that character being printed, regardless of the context it was pulled from.
To illustrate the point, in the code below you can see how the characters join together
However, after running the code, we see this:
The middle row is how they'd look connected; the bottom row shows the issue, because it used drawText. This occurs because the Unicode characters being used and displayed nicely in a browser are from the "Arabic" Unicode range (0600–06FF) and the browser is correctly interpreting them in context. Because ROT.Display.drawText grabs each character separately, one needs to replace each one with the proper contextual character from the Arabic Presentation Forms-B range (FE70–FEFF).
As an example of how to fix this, I wrote up a quick Arabic lookup table [https://gist.github.com/thesnarky1/10012004] that takes standard Unicode and checks the context to replace the character. In this case getRealCharCodes checks the string for anything it can translate, then replaces those characters based on the context it sees them in.
Even without understanding the Arabic alphabet, you can see that the characters on the bottom line are now changed to connect to something (even if right-to-left is still not working).
Proposed Solution
One hack to bypass this is to call ROT.Display.draw() and provide a string instead of a character. This prints the entire string as one unit, which correctly connects the letters. Unfortunately this also negates the ability to do any sort of intelligent padding, because spacing is not considered. This is really not a solution, just a work-around.
This is something that could be left up to each developer to figure out; however, I disagree with that approach because I believe one's native language should have a low barrier to entry for programming.
I would propose a community-built approach wherein ROT provides a basic framework to do this character substitution, as long as the community supplies the appropriate translations. Essentially it would amount to adding two new variables in ROT.Text (for instance ROT.Text.cursiveCharactersToTranslate and ROT.Text.cursiveCharactersWhichDontConnect), adding a method to ROT.Text that would perform the lookup, and then allowing anyone to put language packs into the addon directory.
Each language pack would consist of additions to the first variable, indexed by the base Unicode letter (so there should be no collisions), as well as additions to the list of characters that don't connect. For example, Arabic could be a small JavaScript file containing:
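A minimal sketch of what such a pack and lookup might look like. Everything here is hypothetical: CursiveText is a stand-in for ROT.Text (nothing like it exists in ROT today), the translate method and the per-letter entry layout (isolated, final, initial, medial) are assumptions, and only three Arabic letters are listed. Ligatures such as lam-alef are ignored.

```javascript
// Stand-in for the proposed additions to ROT.Text; not an existing ROT API.
const CursiveText = {
  // base character -> { isolated, final, initial, medial }
  cursiveCharactersToTranslate: {},
  // characters that never connect to the letter that follows them
  cursiveCharactersWhichDontConnect: [],

  // Hypothetical lookup: swap each known base character for its contextual form.
  translate(text) {
    const table = this.cursiveCharactersToTranslate;
    const noForward = this.cursiveCharactersWhichDontConnect;
    const chars = Array.from(text);
    return chars.map((ch, i) => {
      const forms = table[ch];
      if (!forms) return ch; // not a known cursive letter: leave untouched
      const prev = chars[i - 1];
      const next = chars[i + 1];
      // Joined to the previous letter only if that letter connects forward.
      const joinsPrev = table[prev] !== undefined && !noForward.includes(prev);
      // Joined to the next letter only if this letter connects forward.
      const joinsNext = table[next] !== undefined && !noForward.includes(ch);
      if (joinsPrev && joinsNext) return forms.medial;
      if (joinsPrev) return forms.final;
      if (joinsNext) return forms.initial;
      return forms.isolated;
    }).join("");
  }
};

// Hypothetical Arabic language pack (three letters only).
Object.assign(CursiveText.cursiveCharactersToTranslate, {
  "\u0627": { isolated: "\uFE8D", final: "\uFE8E" },                                      // alef
  "\u0628": { isolated: "\uFE8F", final: "\uFE90", initial: "\uFE91", medial: "\uFE92" }, // beh
  "\u0644": { isolated: "\uFEDD", final: "\uFEDE", initial: "\uFEDF", medial: "\uFEE0" }  // lam
});
CursiveText.cursiveCharactersWhichDontConnect.push("\u0627"); // alef never joins forward

// beh + alef + beh: initial beh, final alef, isolated beh (U+FE91 U+FE8E U+FE8F)
const shapedWord = CursiveText.translate("\u0628\u0627\u0628");
```

Keeping the tables as plain data means a pack for another script would only append entries, without touching the lookup itself.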
This would keep the burden of maintenance of specific languages on those who actually know the language, while providing a very key improvement to ROT overall. It would also stay small because if someone did not use any special languages, the only additional size to their ROT would be the function in ROT.Text.
I'm more than happy to draft up a pull request for this, but wanted to check interest before I did. I also wanted to debate whether this is better as part of ROT.Text, or as part of ROT.Display by adding it into ROT.Display.drawText as an additional boolean variable.
Impact
As for why I believe this is important: two of the top five most spoken languages in the world (Arabic and Hindi) are cursive, with an estimated 700 million speakers between them (http://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers). Supporting them would enable much closer to native development (apart from JavaScript itself using English syntax) for a swath of the world stretching from Morocco to India, along with expat communities worldwide that want to hold on to their heritage.