olikraus / U8g2_for_Adafruit_GFX

Add U8g2 fonts to any Adafruit GFX based graphics library.

Problem with "u8g2_font_unifont_t_bengali" #33

Open muhit313 opened 2 years ago

muhit313 commented 2 years ago

Thanks for your great library. I have a problem with the font "u8g2_font_unifont_t_bengali". I want to show the Bengali word "আমার" on my ST7735 display, but the output is "আম ার". Likewise, for the Bengali word "মুহিত" the output is "ম ুহ িত". I can't figure out what is wrong. How can I solve it?

#include <Adafruit_GFX.h>
#include <Adafruit_ST7735.h>
#include <SPI.h>
#include <U8g2_for_Adafruit_GFX.h>

#define TFT_CS   8
#define TFT_RST  7
#define TFT_DC   6

Adafruit_ST7735 tft = Adafruit_ST7735(TFT_CS, TFT_DC, TFT_RST);
U8G2_FOR_ADAFRUIT_GFX u8g2_for_adafruit_gfx;

void setup(void) {
  tft.initR(INITR_BLACKTAB);      // Init ST7735S chip, black tab
  tft.setRotation(0);
  tft.fillScreen(ST77XX_BLACK);
  u8g2_for_adafruit_gfx.begin(tft);
}

void loop() {
  u8g2_for_adafruit_gfx.setFontMode(0);                 
  u8g2_for_adafruit_gfx.setFontDirection(0);
  u8g2_for_adafruit_gfx.setForegroundColor(ST77XX_YELLOW);
  u8g2_for_adafruit_gfx.setFont(u8g2_font_unifont_t_bengali);
  u8g2_for_adafruit_gfx.setCursor(20, 20);
  u8g2_for_adafruit_gfx.print("আমার"); 
}

ZinggJM commented 2 years ago

I have just discovered this article. Seems to be related to our issue. I have not yet read it in detail, nor understood it. https://docs.microsoft.com/en-us/typography/script-development/bengali

I think only someone fluent in written Bengali (Bangla) can judge whether implementing a subset of all these features would be useful at all.

Yet another link: https://www.mediawiki.org/wiki/Language_tools/Requirements/Indic_language_support#Language_support_status And some fonts: https://www.omicronlab.com/bangla-fonts.html

olikraus commented 2 years ago

@ZinggJM nice links, I think this in fact shows the main problem:

Let's take this example: ক+্+ক=ক্ক. I think it was also mentioned in the Microsoft document. The hex codes are 0x995 0x9CD 0x995: the conjunct ক্ক is encoded as exactly the same sequence as its three parts.

The combined char ক্ক seems to be called a conjunct glyph. To me it seems impossible to draw this glyph by using ক twice (the same code point 0x995 appears twice above). The resulting sign ক্ক looks very different, but this "conjunct sign" has no code point of its own in Unicode. At least I have not found a code point for Bengali conjunct signs. As a result, we cannot search for 0x995 0x9CD 0x995 and replace this sequence with a new bitmap (simply because the bitmap does not exist).

@muhit313 in other words, I cannot proceed here, because I have no idea how to get ক্ক. Let me know if you find the code point for the "kaka" conjunct ক্ক. I tried for 20 min without success.
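To make the code point sequence above concrete, here is a minimal sketch of a UTF-8 decoder in plain C++ (no Arduino dependency, so it can be tried on a PC). The function names `next_code_point` and `decode_utf8` are made up for this illustration and are not part of u8g2 or U8g2_for_Adafruit_GFX; the sketch also skips some validity checks that a production decoder would need.

```cpp
#include <stddef.h>
#include <stdint.h>

// Decode the UTF-8 sequence starting at s[*pos] and advance *pos past it.
// Returns the code point, or U+FFFD for malformed or truncated input.
// Sketch only: overlong encodings and surrogates are not rejected.
uint32_t next_code_point(const char *s, size_t len, size_t *pos) {
  uint8_t lead = (uint8_t)s[*pos];
  uint32_t cp;
  size_t extra;  // number of continuation bytes expected

  if (lead < 0x80)                { cp = lead;        extra = 0; }
  else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
  else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
  else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
  else { (*pos)++; return 0xFFFD; }  // not a valid lead byte

  if (*pos + extra >= len) { *pos = len; return 0xFFFD; }  // truncated

  for (size_t i = 1; i <= extra; i++) {
    uint8_t b = (uint8_t)s[*pos + i];
    if ((b & 0xC0) != 0x80) { (*pos)++; return 0xFFFD; }  // bad continuation
    cp = (cp << 6) | (b & 0x3F);
  }
  *pos += extra + 1;
  return cp;
}

// Decode a whole string into out[]; returns the number of code points written.
size_t decode_utf8(const char *s, size_t len, uint32_t *out, size_t max_out) {
  size_t pos = 0, n = 0;
  while (pos < len && n < max_out) {
    out[n++] = next_code_point(s, len, &pos);
  }
  return n;
}
```

Decoding the UTF-8 bytes of ক্ক this way yields three code points, 0x995 0x9CD 0x995, and none of them stands for the fused shape.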

ZinggJM commented 2 years ago

@muhit313

How many people live in India? How many of them know Arduino and Arduino displays?

I would expect there are forums in your language, and some of them dealing with presenting bangla fonts on cheap displays.

These forums are not accessible for us, as we can't read them. But I expect you can.

olikraus commented 2 years ago

Here is another article (actually an FAQ): https://unicode.org/faq/ligature_digraph.html I think it explains why this problem can not be solved in u8g2. I am assuming that ক্ক is a ligature according to the definition from that FAQ: "In a ligature, the glyphs are fused into a single glyph."

The FAQ continues: "Ligaturing is a behavior encoded in fonts: if a modern font is asked to display “h” followed by “r”, and the font has an “hr” ligature in it, it can display the ligature. Some fonts have no ligatures, while others (especially fonts for non-Latin scripts) have hundreds of ligatures. It does not make sense to assign Unicode code points to all these font-specific possibilities."

So it looks like ক্ক is not included in Unicode (and not in unifont). Hence it cannot be displayed by u8g2.

muhit313 commented 2 years ago

@ZinggJM

> How many people live in India? How many of them know Arduino and Arduino displays?

First of all, I am from Bangladesh, not India. In Bangladesh only a small number of people know Arduino and displays, and the problem we face here is very difficult for them. Besides, as far as I know there is no Bengali forum, community, or expert for this topic. If they exist, they do not have the expertise to solve this issue.

muhit313 commented 2 years ago

> I think it explains why this problem can not be solved in u8g2.

@olikraus Oh! This is bad news for me, because I have already done more than half of my project using your library. I don't know how I can overcome this. All Bengali Conjunct font is here https://maasarada.blogspot.com/2015/12/bengali-conjunct.html .

They https://blog.adafruit.com/2019/04/08/the-adafruit-gfx-library-now-supports-unicode-adafruit-adafruit-josecastillo/ claimed that all Unicode are supported by them. But is it true? Is it working perfectly?

Maybe there is no way and I need to close this issue without any hope!

ZinggJM commented 2 years ago

@muhit313

Thank you for the clarification, and please forgive my ignorance. This partly explains my confusion with these - for me strange - glyphs and symbols. But I still would expect there are knowledgeable people in this field of your language.

muhit313 commented 2 years ago

@ZinggJM

> Thank you for the clarification, and please forgive my ignorance. This partly explains my confusion with these - for me strange - glyphs and symbols. But I still would expect there are knowledgeable people in this field of your language.

It's OK. I know it is very difficult for both you and olikraus to find a solution, because Bengali glyphs and symbols are unfamiliar to both of you.

But I think olikraus tried his best to solve this issue. I am satisfied with his help and support.

Thanks to both you and olikraus again.

olikraus commented 2 years ago

> They https://blog.adafruit.com/2019/04/08/the-adafruit-gfx-library-now-supports-unicode-adafruit-adafruit-josecastillo/ claimed that all Unicode are supported by them. But is it true? Is it working perfectly?

U8g2 also supports all of Unicode. The issue is that there are no Unicode code points for the conjunct chars, so the Adafruit lib will (probably) also fail with your language.

> All Bengali Conjunct font is here https://maasarada.blogspot.com/2015/12/bengali-conjunct.html .

Indeed, there would be some hope if all conjunct glyphs were part of a font. The above page, for example, does not name any code points (because they do not exist), but it would be possible to create a font with those conjunct chars and display them with u8g2.

What I mean is this: create your own bitmap font with all Bengali chars and the above conjunct glyphs. You could use Fony.exe, but this will be a huge amount of work. On the other hand, you would probably become a well-known embedded developer, because such a font with all Bengali single and conjunct glyphs does not exist.
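If such a custom font existed, the text would still have to be rewritten before drawing, so that each conjunct sequence selects the extra glyph. The following is only a sketch of that idea under stated assumptions: the custom font is assumed to store its conjunct glyphs at Private Use Area code points (0xE000 and 0xE001 here are arbitrary choices, not an existing font layout), and the names `ConjunctRule`, `kConjunctRules` and `apply_conjunct_rules` are hypothetical. On an AVR the table would normally be placed in flash (PROGMEM), which is left out to keep the sketch portable.

```cpp
#include <stddef.h>
#include <stdint.h>

// One rule: a three code point sequence and the slot of its replacement glyph.
struct ConjunctRule {
  uint32_t seq[3];  // consonant, hasanta (U+09CD), consonant
  uint32_t glyph;   // ASSUMED Private Use Area slot in a custom font
};

// Two example rules only; a real table would need one entry per conjunct.
static const ConjunctRule kConjunctRules[] = {
  {{0x0995, 0x09CD, 0x0995}, 0xE000},  // ক + ্ + ক -> ক্ক
  {{0x0995, 0x09CD, 0x09A4}, 0xE001},  // ক + ্ + ত -> ক্ত
};

// Replace every matching sequence in cps[0..n) in place.
// Returns the new length, which is never larger than n.
size_t apply_conjunct_rules(uint32_t *cps, size_t n) {
  const size_t rule_count = sizeof(kConjunctRules) / sizeof(kConjunctRules[0]);
  size_t in = 0, out = 0;
  while (in < n) {
    bool matched = false;
    if (in + 3 <= n) {
      for (size_t r = 0; r < rule_count; r++) {
        const ConjunctRule &rule = kConjunctRules[r];
        if (cps[in] == rule.seq[0] && cps[in + 1] == rule.seq[1] &&
            cps[in + 2] == rule.seq[2]) {
          cps[out++] = rule.glyph;  // one glyph replaces three code points
          in += 3;
          matched = true;
          break;
        }
      }
    }
    if (!matched) cps[out++] = cps[in++];  // copy unchanged
  }
  return out;
}
```

The rewritten code points would then be drawn with the custom font, one glyph per entry. The sketch covers only three-part conjuncts; longer conjuncts and the reordering of vowel signs would need further rules.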

ZinggJM commented 2 years ago

Today, Bengali is spoken mainly in the states of West Bengal and Tripura, in parts of Assam, and on the Andaman and Nicobar Islands. In India alone, more than 80 million people speak Bengali.

from https://www.superprof.de/blog/sprache-diaekt-bengali-indien/

80 million Bengali speakers in India. Maybe there are some Arduino users among them.

muhit313 commented 2 years ago

> 80 million Bengali speakers in India. Maybe there are some Arduino users among them.

Yes, there are some Arduino users among Indian people. But they do not have the expertise for this kind of work... It's a complicated task.

muhit313 commented 2 years ago

@olikraus @ZinggJM Finally I decided to close this issue. I also decided to complete my task using bitmaps: I write each needed string in a white text editor, take a screenshot, convert it into a bitmap image, and then show it on the display. I know this is bad practice, but I have no other way and not enough time to research the conjunct font, because this is not my main project; it is only one part of it. In closing, thanks again to both of you, especially @olikraus.

olikraus commented 2 years ago

I have learned a lot about Bengali language. I also understand, that "u8g2_font_unifont_t_bengali" doesn't make sense without conjunct glyphs. I personally do not understand why the unifont.org didn't include Bengali conjunct chars into unicode map.

muhit313 commented 2 years ago

> I have learned a lot about Bengali language.

Really? That's good news!

> I also understand, that "u8g2_font_unifont_t_bengali" doesn't make sense without conjunct glyphs.

Exactly true. But it is really difficult for you to add conjunct glyphs to "u8g2_font_unifont_t_bengali", as you are not a Bengali speaker.

> I personally do not understand why the unifont.org didn't include Bengali conjunct chars into unicode map.

unicode.org does not include Bengali conjunct chars because all of them are composed from normal chars, so the conjuncts don't have code points of their own.

http://unicode.org/L2/L2003/03247-bengali.pdf

researchgate

ZinggJM commented 2 years ago

I don't know if TouchGFX from STMicroelectronics supports Bengali, and if so, how they do it. But as this is for embedded systems, although for powerful processors, it might be interesting to take a look at it. I keep getting e-mails about updates, but I had no time to take a closer look. And not for now. https://www.st.com/en/embedded-software/x-cube-touchgfx?ecmp=tt24243_gl_enews_nov2021#get-software

Supports transparency, alpha-blending, anti-aliased fonts and kerning

muhit313 commented 2 years ago

> I don't know if TouchGFX from STMicroelectronics supports Bengali, and if so, how they do it.

@ZinggJM If it works, that is good news. But the problem is that I use a microcontroller with 128kb of flash and 4kb of RAM, which is not expensive (not more than $4). TouchGFX may be much more expensive than that. It is very important that I complete my project at the lowest possible cost, so I selected the ATmega128, which is a low-cost microcontroller.

muhit313 commented 2 years ago

20211205_223247

@olikraus This is an old button phone. All Bengali conjunct chars are displayed without problems when I select the Bengali language. How did they do this? Since they did it, there may be some way to do it, but I don't know the way and it may be a bit difficult too.

Finally, I found a document which may be useful for us, because it discusses mobile messaging using Bangla.

Link: http://dspace.bracu.ac.bd/xmlui/bitstream/handle/10361/431/Mobile%20messaging%20using%20Bangla.pdf

I will read the whole document as soon as possible and give feedback about it.

olikraus commented 2 years ago

Even your old button phone is much, much more powerful than an AVR system. The above document refers to Symbian OS, which is more than 100MB in size and includes a full-featured TrueType font rendering engine. As we already discussed, displaying all Bengali chars wouldn't be a problem if we could use a TrueType font rendering engine. But this is not possible due to the size of that engine and the limited flash memory of your AVR.

I just checked the size of libfreetype (an open source TrueType rendering engine) on my Linux system. libfreetype.a is more than 1MB in size. This is 8 times more than the flash memory (128kb) of your controller.

olikraus commented 2 years ago

The only option I see is this: you create your own bitmap font, either with Fony.exe or by extracting the raw glyph information from an existing font. Let's assume the kalpurush font from https://www.omicronlab.com/bangla-fonts.html. If you load this font into a raw glyph viewer (https://opentype.js.org/glyph-inspector.html), you can see the ক্ক discussed above at position 152 (remember: ক+্+ক=ক্ক). In fact, kalpurush seems to provide a total of almost 600 glyphs for all kinds of possible combinations of the base Bengali chars.

> If yes, then what is the minimum size can be done?

This would depend on many factors like the target pixel size of the glyphs and also the question whether you need all the glyph combinations.

olikraus commented 2 years ago

Here is a picture of the raw glyphs extracted from the kalpurush font and converted to u8g2 format (it took me 2h to extract the raw glyph data!). They were extracted with -p 16. As you can see, the generated font size is 20kb. Obviously, larger -p values would produce larger fonts.

kalpurush
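The sizes mentioned in this thread can be put side by side with a little compile-time arithmetic. This is only a rough sketch: it assumes "kb" means 1024 bytes, takes "more than 1MB" as exactly 1 MiB (a lower bound), and rounds "almost 600" glyphs up to 600. The constant names are made up for this illustration.

```cpp
#include <stdint.h>

constexpr uint32_t kKiB           = 1024;
constexpr uint32_t kFlashBytes    = 128 * kKiB;   // ATmega128 flash
constexpr uint32_t kFreetypeBytes = 1024 * kKiB;  // libfreetype.a, lower bound
constexpr uint32_t kFontBytes     = 20 * kKiB;    // converted font at -p 16
constexpr uint32_t kGlyphCount    = 600;          // "almost 600", rounded up

// The rendering engine alone is 8 times the available flash.
constexpr uint32_t kFreetypeVsFlash = kFreetypeBytes / kFlashBytes;
// The converted font takes about 15 percent of the flash (15.6, truncated).
constexpr uint32_t kFontPercentOfFlash = kFontBytes * 100 / kFlashBytes;
// Average size of one glyph in the converted font, in bytes (truncated).
constexpr uint32_t kBytesPerGlyph = kFontBytes / kGlyphCount;

static_assert(kFreetypeVsFlash == 8, "engine is 8x the flash size");
static_assert(kFontPercentOfFlash == 15, "font uses about 15% of flash");
static_assert(kBytesPerGlyph == 34, "about 34 bytes per glyph on average");
```

Under these assumptions the prerendered font leaves most of the 128kb flash free for the application, whereas the rendering engine alone would not fit.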

ZinggJM commented 2 years ago

@olikraus

I appreciate your work!

I get e-mails on additions to this issue, as I have been mentioned, and have replied to this issue. Any contribution from me doesn't help, as I only can point to problems ahead. But I want to learn. So I take the occasion to ask some questions.

I have taken a look at many of the Bengali fonts, by double clicking on the .ttf files. In some cases I saw Bengali glyphs presented by the Windows TTF Viewer. In other cases I only saw Latin or Roman glyphs. After I found an old TTF viewer named UnicodeView.exe (25/3/1999), my suspicion that the Windows TTF Viewer presents only the first code range was confirmed. Unfortunately UnicodeView.exe and some other ttf viewers only present fonts from Windows/Fonts. But it shows that even fonts like Arial have a code range for Bengali, and some extensions for combined glyphs.

Now here comes my question: which tools do you recommend for analyzing font files? Are there font converters that can extract from any code range, or even from extensions, e.g. for Adafruit FreeFont format?

Regards, Jean-Marc

olikraus commented 2 years ago

@ZinggJM

The main purpose of a ttf file is to provide a character bitmap for a given Unicode code point (or, especially in the case of Bengali, for a given sequence of code points).

This means:
Input: One or more code points
Output: A bitmap which represents the single or combined glyph.

Let's look at a simple example with one code point: the capital Latin "A" has the code 65. Usually a ttf file contains exactly one picture for this code point:
Input: 65
Output: Bitmap for character "A"

Internally, TTF has a picture table (glyph table). Each glyph in this table has its own "index" number. So internally the process is like this:
Input: 65
Internally: Calculate the glyph index for code point 65 (which could be, for example, 11)
Output: Bitmap for glyph index 11.

Things are more complicated for combined glyphs:
Input: Code point sequence ক+্+ক
Internally: Calculate the bitmap from ক+্+ক. There might be multiple options here: there could be a small program which renders the target bitmap, but it might also be just a rule which maps the sequence ক+্+ক to a glyph index. This actually depends on the font. I just noticed that the above font seems to map the sequence to a single glyph index. In other words, it looks like all possible combinations lead to a picture which is already prerendered in the ttf file.
Output: Bitmap for the glyph index of picture ক্ক.
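The two lookups described above can be modelled with a toy example. This is a simplified sketch, not how any particular font engine is implemented: index 11 for "A" is the example value from above, 152 is the position reported earlier in this thread for ক্ক in kalpurush, and the indices 40 and 90 are made up. In OpenType fonts the two stages correspond roughly to the `cmap` and `GSUB` tables. All type and function names here are hypothetical.

```cpp
#include <stddef.h>
#include <stdint.h>

struct CmapEntry    { uint32_t code_point; uint16_t glyph; };
struct SequenceRule { uint16_t in[3];      uint16_t out;   };

// Stage 1 table: code point -> glyph index.
static const CmapEntry kCmap[] = {
  {65,     11},  // 'A' (example index from the text)
  {0x0995, 40},  // ক (made-up index)
  {0x09CD, 90},  // ্  (made-up index)
};

// Stage 2 table: glyph index sequence -> single prerendered glyph.
static const SequenceRule kSequenceRules[] = {
  {{40, 90, 40}, 152},  // ক + ্ + ক -> ক্ক
};

// Returns glyph index 0 (the "missing glyph") for unknown code points.
uint16_t glyph_for_code_point(uint32_t cp) {
  for (size_t i = 0; i < sizeof(kCmap) / sizeof(kCmap[0]); i++) {
    if (kCmap[i].code_point == cp) return kCmap[i].glyph;
  }
  return 0;
}

// Map n code points to glyph indices and apply the sequence rules.
// glyphs[] must have room for n entries; returns the number of glyphs.
size_t shape(const uint32_t *cps, size_t n, uint16_t *glyphs) {
  for (size_t i = 0; i < n; i++) glyphs[i] = glyph_for_code_point(cps[i]);

  const size_t rules = sizeof(kSequenceRules) / sizeof(kSequenceRules[0]);
  size_t in = 0, out = 0;
  while (in < n) {
    bool matched = false;
    for (size_t r = 0; r < rules && in + 3 <= n; r++) {
      const SequenceRule &rule = kSequenceRules[r];
      if (glyphs[in] == rule.in[0] && glyphs[in + 1] == rule.in[1] &&
          glyphs[in + 2] == rule.in[2]) {
        glyphs[out++] = rule.out;
        in += 3;
        matched = true;
        break;
      }
    }
    if (!matched) glyphs[out++] = glyphs[in++];
  }
  return out;
}
```

A viewer that only performs the first stage never reaches glyph 152, because no single code point maps to it.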

Looking at existing TTF viewers, I observed that they only show the glyph for a given code point. Most viewers do not support sequences of code points. As a consequence, you would never be able to see the ক্ক picture. It looks like only ftview and the above-mentioned glyph-inspector reveal those internal combined glyphs.

ZinggJM commented 2 years ago

image

I think I downloaded it from https://www.softpedia.com/get/Others/Font-Utils/Unicode-Font-Viewer.shtml

olikraus commented 2 years ago

What I have learned is this: the Bengali block in Unicode does not cover every glyph the script needs. For example, you will not be able to find ক্ক in your viewer: ক্ক does not have its own code point; a sequence of code points is required instead. However, ক্ক is present in the ttf file; it is only reachable through that specific code point sequence.