sspanak / tt9

A T9 keyboard for Android devices with a hardware keypad.
Apache License 2.0

Thai language support #628

Closed: mmmmmob closed this 2 months ago

mmmmmob commented 2 months ago

I have just finished adding a Thai keypad layout and a 5,000-word Thai dictionary. However, when typing in Thai, we don't usually need an automatically added space, since our language doesn't use one. I'm not sure if you can find a way to change that behavior specifically for Thai?

sspanak commented 2 months ago

Hi and thank you very much for your contribution.

For the time being, you can turn off Automatic Space from Settings -> Keypad. In the future, I am planning to add support for other Asian languages, so I will make the necessary changes to add spaces only for the languages that need them.
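One way to express "add spaces only for the languages that need them" is a per-language flag that is checked together with the user setting. This is a minimal Python sketch under that assumption. The `Language` class, the `has_space_between_words` field and `text_to_commit()` are hypothetical names for illustration, not TT9's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    name: str
    # Hypothetical flag: False for languages such as Thai, which do not
    # put spaces between words.
    has_space_between_words: bool = True


def text_to_commit(word: str, language: Language, auto_space: bool) -> str:
    """Return the text to insert after the user accepts `word`.

    The automatic space is added only when the user setting is on AND the
    language separates words with spaces.
    """
    if auto_space and language.has_space_between_words:
        return word + " "
    return word


ENGLISH = Language("English")
THAI = Language("Thai", has_space_between_words=False)

print(repr(text_to_commit("hello", ENGLISH, auto_space=True)))  # 'hello '
print(text_to_commit("น้ำ", THAI, auto_space=True))              # no trailing space
```

Keeping the user setting as a separate argument means the Settings -> Keypad switch still works for every language. The flag only narrows where it applies.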

I reviewed the pull request. It looks OK from a technical perspective, but I am not sure a dictionary this small will be useful. For example, AnySoftKeyboard has a dictionary of 17k words, another word list on GitHub contains 19k words, and the LibreOffice dictionary has 53k base words, not counting the possible combinations with prefixes and suffixes. Could you please check whether these word lists can be used to expand yours?

If commonly used Thai words number about 50,000, or even about 20,000, then a 5,000-word dictionary will result in a bad typing experience: you will frequently find that many words are missing. However, I may be wrong, because I don't know how the language works. If it is similar to Vietnamese, where a small number of base words are combined to form longer ones, then it would work.

I have no way of testing how well typing works, because I don't understand any Thai. It is up to you to confirm the best choice of dictionary.

mmmmmob commented 2 months ago

Hi! Thank you for the review. I understand your concerns about the number of words compared to the other sources you mentioned.

So far, the only word list I have found is these 5,000 words from the Thai National Corpus (a project of the Faculty of Arts, Chulalongkorn University), which includes word frequencies. However, I'll look for other sources that could help expand the dictionary.

In the meantime, apart from your final release build, is there any way I can test this CSV list on my device or in the Android Studio emulator? When I built it for testing, I couldn't find a way to try typing with it.

Thank you in advance, and I hope to keep contributing to this project!

sspanak commented 2 months ago

> So far, the only word list I have found is these 5,000 words from the Thai National Corpus (a project of the Faculty of Arts, Chulalongkorn University), which includes word frequencies. However, I'll look for other sources that could help expand the dictionary.

This sounds like a very good start. It would be great if you could find a bigger word list that is spell-checked and does not contain nonsense words. I generally do not recommend lists generated from subtitles or news. I would also be grateful if you checked the word lists I posted above. No matter what, I really don't think 5k words are enough.

Also, don't worry about word frequencies; they are not that important. Even if only 15% of the words have frequencies assigned, that is more than enough for quite accurate suggestions, and it is fine if frequencies are missing entirely. Besides, your current list already has them.
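Partial frequencies are enough because words that share a key sequence only need a sensible order, and unknown words can fall back to a default. Below is a minimal Python sketch of that idea, with these assumptions:

* Candidates are sorted by frequency.
* A missing frequency counts as 0.
* Ties keep their dictionary order.

`rank_suggestions()` and the sample data are hypothetical. This is not TT9's actual ranking code.

```python
def rank_suggestions(candidates, frequencies):
    """Order words that share a key sequence, most frequent first.

    Words with no known frequency count as 0. Python's sort is stable, so
    words with equal frequency keep their original dictionary order.
    """
    return sorted(candidates, key=lambda word: -frequencies.get(word, 0))


# On a standard Latin keypad, all four words are typed as 4663.
words = ["gone", "good", "home", "hood"]
# Only half of the words have a frequency; the rest fall back to 0.
freq = {"good": 900, "home": 500}

print(rank_suggestions(words, freq))  # ['good', 'home', 'gone', 'hood']
```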

> In the meantime, apart from your final release build, is there any way I can test this CSV list on my device or in the Android Studio emulator? When I built it for testing, I couldn't find a way to try typing with it.

Oh, I was under the impression you had already tested it. Sure, you can test it both in the Android Studio emulator and on your own device. Actually, you ought to. No code makes it into the final release until it is properly tested.

What do you mean you can't find a way to try typing? Have you followed the initial setup? Also, keep in mind that every time you run TT9 from Android Studio, you must switch from Gboard to TT9, because Android resets the default keyboard when TT9 is reinstalled. Let me know if you still have trouble.

Now, back to testing: I have tried out Thai, and I believe it needs some more work before release. See the screenshot from the emulator (in touchscreen mode): [Screenshot: thai]

  1. There are too many letters per key. Naturally, it is not possible to display them all. What should we do in this case? Is it better to display the first 3-4 (currently, the maximum is 5, but we can change it as necessary)? Or would displaying some specific letters per key make it clearer to the user?
  2. The combining characters (the vowels?) on the 8-key are rendered stacked on top of each other. This looks really weird and needs to be fixed. Shall we display a single character or icon, or maybe a label saying "vowels"? Or is there a better solution?
  3. Do the characters on the 9-key look correct? Do we need to do something about them too?

> Thank you in advance, and I hope to keep contributing to this project!

It is exactly the help I need. I must thank you!

mmmmmob commented 2 months ago

> What do you mean you can't find a way to try typing? Have you followed the initial setup?

I just found Thai on the Languages download page in the app, lol. Sorry, I just didn't check properly. It works fine on my Android Studio emulator now. As far as I have tested, typing and word suggestions work fine! I just need to expand the word pool, as you suggested. I'll have a look at the links you provided in your first comment and add them in my next commit.

As per your question and suggestion, let me share some of my thoughts:

> 1. There are too many letters per key. Naturally, it is not possible to display them all. What should we do in this case? Is it better to display the first 3-4 (currently, the maximum is 5, but we can change it as necessary)? Or would displaying some specific letters per key make it clearer to the user?

Yes! Actually, the original Thai T9 keypad layout shows only the first and last character on each key.

For example, key 2 contains 'ก, ข, ฃ, ค, ฅ, ฆ, ง, จ, ฉ', so we can show 'ก - ฉ' on the key, because Thai users know the order of the alphabet from first to last, just like A-Z.

In this case, does this have to be coded on my end, or could you help adjust it?

[Image: 170305_06]
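The labeling rule described above fits in a few lines. Below is a minimal Python sketch, with these assumptions:

* A key shows all of its letters when there are at most 5, the current maximum mentioned earlier.
* Otherwise, the key shows "first - last".

`key_label()` and `max_shown` are hypothetical names. The rule only works if the letters are listed in alphabetical order. For key 2 they also happen to be consecutive Unicode code points, U+0E01 to U+0E09.

```python
def key_label(letters, max_shown=5):
    """Build the text shown on a key.

    Short lists are shown in full; longer ones are collapsed to
    "first - last", which assumes `letters` is in alphabetical order.
    """
    if len(letters) <= max_shown:
        return "".join(letters)
    return f"{letters[0]} - {letters[-1]}"


# Key 2 in the Thai layout: ก ข ฃ ค ฅ ฆ ง จ ฉ (U+0E01..U+0E09).
thai_key_2 = [chr(cp) for cp in range(0x0E01, 0x0E0A)]

print(key_label(["a", "b", "c"]))  # abc
print(key_label(thai_key_2))       # ก - ฉ
```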

> 2. The combining characters (the vowels?) on the 8-key are rendered stacked on top of each other. This looks really weird and needs to be fixed. Shall we display a single character or icon, or maybe a label saying "vowels"? Or is there a better solution?

> 3. Do the characters on the 9-key look correct? Do we need to do something about them too?

There are some hurdles in placing the whole alphabet according to the original key layout, which uses the 1 and 0 buttons. I decided to put the tone marks and vowel characters on the 8 and 9 keys, so they may be jam-packed and displayed incorrectly. I think two approaches might suit this particular problem:

  1. As with the consonants, we can show only the first and last vowel of each key. For example, key 8: ' ิ , ี, ึ, ื, ุ, ู, ั, ่, ้, ๊, ๋, ็, ์' becomes ' ิ - ์'. This way, users can infer that the key contains the tone marks and vowels that attach above (or, for ุ and ู, below) a regular character. Key 9 works the same way (tone and vowel characters that go before or after a regular character).

  2. Following your suggestion, we could use a label like 'สระบน' (upper vowels) on key 8 and 'สระหน้า' (front vowels) on key 9.

But from my personal experience using Thai T9 in the past, it would be nice if we could go with the first solution. Let me know what you think would be best for both UX and UI.

sspanak commented 2 months ago

> I just found Thai on the Languages download page in the app, lol. Sorry, I just didn't check properly.

There is one more thing I may have forgotten to mention. I've set up the language cache to be a bit aggressive, because I don't want to wait a minute to validate all languages every time. If you don't see your changes to the .yml file or the dictionary, make sure to run Clean Project.

> Yes! Actually, the original Thai T9 keypad layout shows only the first and last character on each key.

> For example, key 2 contains 'ก, ข, ฃ, ค, ฅ, ฆ, ง, จ, ฉ', so we can show 'ก - ฉ' on the key, because Thai users know the order of the alphabet from first to last, just like A-Z.

> In this case, does this have to be coded on my end, or could you help adjust it?

The same problem is going to occur when I get to Hindi. I would like to think it over and structure the code in a more universal way, suitable for any language with many letters per key. But it should be easy to do, I think.

> There are some hurdles in placing the whole alphabet according to the original key layout, which uses the 1 and 0 buttons.

Unfortunately, it is not possible to put letters on the 0-key. If they are on the 1-key, it will work most of the time, but there may be weird side effects. I have yet to explore how to handle this; I will do that when I get to Korean.

> I decided to put the tone marks and vowel characters on the 8 and 9 keys, so they may be jam-packed and displayed incorrectly. I think two approaches might suit this particular problem: ...

Displaying the first and last characters should be easier from a technical perspective, and if it is also the more intuitive choice from the user's perspective, even better. Since I am going to do it for keys 2 to 7, it should automatically work for 8 and 9, too. I suppose we can avoid stacking by simply adding a space between each vowel/tone character and the hyphen. There is enough room on the keys, so everything should fit nicely.

And if option 1 doesn't look good or it doesn't work well, we will use option 2. 'สระบน' and 'สระหน้า' are short enough and will look good on the keys.
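The stacking comes from how combining marks work. A nonspacing mark is drawn on the character before it, so marks placed next to each other with no letter between them pile up in one spot. Giving each mark a space to sit on keeps them apart.

Below is a minimal Python sketch of that spacing trick. It assumes combining characters are detected by their Unicode general category `Mn` (nonspacing mark), which covers every character listed for key 8. `with_base()` and `range_label()` are hypothetical helpers, not TT9 functions.

```python
import unicodedata


def with_base(ch):
    """Prefix a nonspacing mark with a space so it has its own base."""
    return " " + ch if unicodedata.category(ch) == "Mn" else ch


def range_label(chars):
    """Collapse a key's characters to "first - last" without stacking.

    The space before `last` in " - " already acts as its base, so only
    `first` may need an extra leading space.
    """
    return with_base(chars[0]) + " - " + chars[-1]


# Key 8: ิ ี ึ ื ุ ู ั ่ ้ ๊ ๋ ็ ์
thai_key_8 = [chr(cp) for cp in
              (0x0E34, 0x0E35, 0x0E36, 0x0E37, 0x0E38, 0x0E39, 0x0E31,
               0x0E48, 0x0E49, 0x0E4A, 0x0E4B, 0x0E47, 0x0E4C)]

# Every character on key 8 is a nonspacing mark, which is why joining
# them directly makes them pile up on a single base.
assert all(unicodedata.category(c) == "Mn" for c in thai_key_8)

label_8 = range_label(thai_key_8)         # space + SARA I, " - ", THANTHAKHAT
print(range_label(["\u0e01", "\u0e09"]))  # ก - ฉ
```

Consonant keys are unaffected, because `with_base()` only changes marks. That way, the same helper could serve keys 2 to 9.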

mmmmmob commented 2 months ago

That's some great news!

In that case, while you figure out how to adapt the code for languages with many characters per key, like Thai and Hindi (and others in the future), I'll try to finish the larger dictionary and commit it later.

If you run into any problems regarding Thai, please do not hesitate to comment on this issue, so I can help with whatever I can. Thanks a lot! 🙏🏻

sspanak commented 2 months ago

@mmmmmob, I have found a problem with some of the characters on the 9-key, namely , ฦๅ, ฤๅ. They are individual letters, but in practice they consist of two separate Unicode characters. Currently, it is not possible to produce more than one character per key press. This means that typing a word like น้ำ will require "489" instead of the expected "49", and ฤๅทัย would be "99486" instead of "9486".

To avoid total confusion, I suggest that we remove them from the list. This way, I hope, people will figure out that they have to combine the available letters to get the missing ones.

Is the above acceptable? What do you think?
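The key-press counts above follow from mapping every Unicode code point to exactly one key. Below is a minimal Python sketch of that rule. `THAI_KEYS` is a partial, hypothetical map. It contains only the characters whose keys can be read off the examples in this thread, and it is not TT9's real layout file. `word_to_digits()` is also a hypothetical helper.

```python
# Partial Thai key map, inferred only from the examples in this thread.
THAI_KEYS = {
    "\u0e19": "4",  # น NO NU
    "\u0e17": "4",  # ท THO THAHAN
    "\u0e22": "6",  # ย YO YAK
    "\u0e31": "8",  # ั MAI HAN-AKAT
    "\u0e49": "8",  # ้ MAI THO
    "\u0e24": "9",  # ฤ RU
    "\u0e45": "9",  # ๅ LAKKHANGYAO
    "\u0e33": "9",  # ำ SARA AM
}


def word_to_digits(word, keys=THAI_KEYS):
    """Return the key sequence for `word`, one key press per code point.

    A letter written with two code points, such as ฤๅ, therefore costs
    two key presses. Characters outside the partial map raise KeyError.
    """
    return "".join(keys[ch] for ch in word)


print(word_to_digits("\u0e19\u0e49\u0e33"))              # น้ำ   -> 489
print(word_to_digits("\u0e24\u0e45\u0e17\u0e31\u0e22"))  # ฤๅทัย -> 99486
```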

mmmmmob commented 2 months ago

@sspanak I understand the issue. Based on the frequency of these three characters, I suggest keeping in the list. This vowel character is quite important and frequently used. For example, we expect to type the word น้ำ as '489', because it is spelled (4) + (8) + (9). I have also tested other words that use, and they work as expected.

As for the other two characters, I think we can remove them. They seem to be special characters that aren’t used much lately. If users can’t get a word suggestion for these characters, they can switch to normal typing, which shouldn’t be a big hassle.

sspanak commented 2 months ago

Alright, I got it wrong. I just removed ฦๅ and ฤๅ. Typing shouldn't be affected.

By the way, I am done and have pushed to your branch. Could you please test one final time that everything is OK? I want to make sure the key labels look OK and typing still feels fine. Just make sure to reload the dictionary before testing.

mmmmmob commented 2 months ago

Everything looks great! I tested it and haven't seen any problems at all. All keys also display the correct layout. Thank you for your hard work, @sspanak! :D

[Screenshot: Screenshot_2024-09-16_23-05-50]

sspanak commented 2 months ago

I forgot to take care of the automatic space, but I did it today. Now I am completely done.

Thank you for helping make TT9 better! I could not have done it myself. It was a really enjoyable collaboration. Stay awesome and enjoy using Thai in TT9.

mmmmmob commented 2 months ago

Likewise! Thanks a lot to you too. ขอบคุณครับ ("thank you" in Thai) 😄🙏🏻