tomooda / PharoIM

InputMethod support for Pharo on headless VMs
MIT License

Emoji is not translated with Chinese IM #17

Closed: tomooda closed this issue 2 years ago

tomooda commented 2 years ago

With the Pinyin IM, typing "hh" brings up a "haha" emoji in the candidate window. If you select the "haha" emoji, you get "h h" in the text instead.

We need to identify the cause, which could be in Pharo, PharoIM, or SDL2. First reported by @Dieken at https://github.com/pharo-project/pharo/issues/8661#issuecomment-1148814530 .

Dieken commented 2 years ago

https://emojipedia.org/beaming-face-with-smiling-eyes/

tomooda commented 2 years ago

@Dieken Thanks!

tomooda commented 2 years ago

With the Japanese IM on Mac, the same emoji (U+1F601) can be converted from 'emoji', and Pharo successfully receives the character into a text area. Is there a possibility that 'hh' on the Chinese IM triggers a series of events different from the usual conversion?

Dieken commented 2 years ago

I use macOS Monterey 12.3.1 with English as the primary language. I have no idea how to debug this issue further, as I'm not familiar with Pharo 😫

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

Could you try the Pinyin input source? Type "xiao" (笑, meaning "laugh"), "hh", or "haha" (哈哈, also meaning "laugh") and you will see the emoji among the candidates. Select that candidate with the number "2", and Pharo still shows the pre-edit input ("xiao", "hh", or "haha") instead of the emoji.

tomooda commented 2 years ago

@Dieken Thank you for your contributions. I'll set up a Chinese environment on Mac to dig into this issue. 😄

tomooda commented 2 years ago

Now I can input the "hh" emoji and also "xiao". @Dieken Please try installing PharoIM into a plain Pharo image, or use Iceberg to incorporate the latest commit.

Dieken commented 2 years ago

Nice, it works, thank you!

It seems Pharo doesn't support emoji fonts, so it can't render that character. Anyway, the character itself is correct.

Four emoji characters:

'😄😸😈😏' asArray collect: #codePoint.

tinchodias commented 2 years ago

Hi. I didn't know about this issue, and I don't know much about IM either, but I have researched how to render emojis in Pharo. I am working at a low level, using the cairo and freetype bindings to render, and I always got the same result reported here.

I've observed that the Pharo Font Chooser doesn't show my emoji ttf ("Noto Color Emoji" on my Linux machine). I then debugged FreeTypeFontProvider initialize and found an issue in the FFI bindings (to freetype) when loading this font: our bindings don't expect to load a bitmap font and fail, although bitmap fonts are supported at the C library level.

This is an issue because, in the end, when the Pharo text editor has to draw itself, it should tell cairo something like:

I'm trying to find a solution. However, I wanted to drop a question here: does the IM tell the Pharo text editor which font to use when a character is inserted?

Cheers, Martín

tomooda commented 2 years ago

Hi Martín, the IM only provides Unicode strings to Pharo text editors. To handle exotic characters such as emojis, math symbols and so on, I think we need to extend FontFamily so that it picks, from a list of alternatives, the first font that has a glyph for the given character, like font-family in CSS.
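The "first font with a glyph wins" rule can be sketched in a Playground as follows. This is only an illustration, not an existing FontFamily API: the chain `fallbackChain`, the block `familyFor`, the family name 'Noto Sans CJK', and the code point ranges are all assumptions made up for the example. A real implementation would ask each font face whether it has the glyph.

```smalltalk
"Sketch only: each coverage block is a made-up approximation, not real font data."
| fallbackChain familyFor |
fallbackChain := {
    'Source Code Pro' -> [ :cp | cp < 16r0250 ].
    'Noto Sans CJK' -> [ :cp | cp between: 16r4E00 and: 16r9FFF ].
    'Noto Color Emoji' -> [ :cp | cp between: 16r1F300 and: 16r1FAFF ] }.

"Answer the name of the first family in the chain that claims a glyph, or nil."
familyFor := [ :aCharacter |
    (fallbackChain
        detect: [ :each | each value value: aCharacter codePoint ]
        ifNone: [ nil ])
            ifNotNil: [ :each | each key ] ].

familyFor value: $a.                            "--> 'Source Code Pro'"
familyFor value: (Character value: 16r7B11).    "笑 --> 'Noto Sans CJK'"
familyFor value: (Character value: 16r1F601).   "--> 'Noto Color Emoji'"
```

The order of the chain matters, as in CSS: the preferred text font comes first, and the emoji font is consulted only for characters that the earlier families cannot cover.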

tinchodias commented 2 years ago

Thanks. I guess this should be implemented in the Rubric text editor. Meanwhile, I kept debugging an earlier step, the font loading process, trying to spot where my emoji ttf is ignored and so never reaches Pharo's list of font families.

tinchodias commented 2 years ago

Hi again. I want to understand more. The codePoint of 笑 is 31505 in my inspector. Do you know how the text editor should be programmed to detect the correct font face?

I imagine the user is writing in the default code font (Source Code Pro), and the editor receives an insert-string event (I guess) with a WideString that includes the 笑 character. For that, the editor should have, as a setting, a list of font faces that it iterates over until it finds one that includes a glyph for the character.

Do you think this is what should happen? Do you have any reference for how other editors implement it?

Thanks

tomooda commented 2 years ago

Martin, I don't think editors are responsible for that, but rendering backends are. I was thinking of a new subclass of AbstractFont that would do the font selection to obtain glyphs in methods like displayString:on:from:to:at:kern:baselineY:.
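As a rough sketch of what such a font-selecting backend would have to do before drawing, the string can be split into runs of consecutive characters that share a font, and each run can then be drawn with its own font through the from:/to: range. Everything here is hypothetical: `familyFor` is a stand-in selector with a made-up threshold, and no such AbstractFont subclass exists in the thread.

```smalltalk
"Sketch only: familyFor is a made-up stand-in for a real glyph lookup."
| familyFor text runs |
familyFor := [ :aCharacter |
    aCharacter codePoint >= 16r1F300
        ifTrue: [ 'Noto Color Emoji' ]
        ifFalse: [ 'Source Sans Pro' ] ].

text := 'Hola ' , (Character value: 16r1F642) asString , ' Mundo'.

"Collect associations of the form family -> (start to: stop)."
runs := OrderedCollection new.
text withIndexDo: [ :ch :index |
    | family |
    family := familyFor value: ch.
    (runs notEmpty and: [ runs last key = family ])
        ifTrue: [ runs last value: (runs last value first to: index) ]
        ifFalse: [ runs add: family -> (index to: index) ] ].

"Three runs: 1 to 5 and 7 to 12 in 'Source Sans Pro', 6 to 6 in 'Noto Color Emoji'."
runs collect: [ :run |
    run key -> (text copyFrom: run value first to: run value last) ].
```

Keeping index ranges rather than substrings matches the from:to: arguments of displayString:on:from:to:at:kern:baselineY:, so each run could be forwarded to the font chosen for it without copying the text.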

Dieken commented 2 years ago

> Hi again. I want to understand more. The codePoint of 笑 is 31505 in my inspector. Do you know how the text editor should be programmed to detect the correct font face?
>
> I imagine the user is writing in the default code font (Source Code Pro), and the editor receives an insert-string event (I guess) with a WideString that includes the 笑 character. For that, the editor should have, as a setting, a list of font faces that it iterates over until it finds one that includes a glyph for the character.
>
> Do you think this is what should happen? Do you have any reference for how other editors implement it?
>
> Thanks

Font selection is controlled by the text rendering stack and by the application-level font settings for that stack. On Linux they are:

Windows and macOS have different native stacks, but whichever stack is used, font selection is not the responsibility of the input method.

Reference: https://mrandri19.github.io/2019/07/24/modern-text-rendering-linux-overview.html

Text rendering is very complex, so it's better to use an all-in-one graphics toolkit, such as Skia: https://github.com/feenkcom/Sparta, or Cairo: https://www.cairographics.org

tinchodias commented 2 years ago

Thanks for the pointer.

In an image I was able to render emoji via Athens-Cairo + Freetype bindings.

But I had to use an ugly hack to force the FreeTypeFontProvider to keep it, because the provider discards the "Noto Color Emoji" font face when it scans the .ttf file on my disk. The reason is that the Pharo binding for FreeType isn't prepared for a bitmap font face.

Our binding wants to extract a bounding box but, unlike for regular scalable fonts, it isn't available for the emoji font (see bbox in https://freetype.org/freetype2/docs/reference/ft2-base_interface.html#ft_facerec).

But Athens-Cairo doesn't require that bbox information to render the emoji: [image]

tinchodias commented 2 years ago

For the record, the code was:

AthensSurfaceExamples class >> exampleDrawEmoji

    | aSurface |
    aSurface := self newSurface: 300@300.

    aSurface drawDuring: [ :aCanvas |
        | aScaledFont advance |
        aSurface clear: Color white.
        aCanvas setPaint: Color black.

        "Draw the Latin text with a regular scalable font, one line height below the top."
        aScaledFont := aCanvas setFont: (LogicalFont familyName: 'Source Sans Pro' pointSize: 32).
        aCanvas pathTransform translateBy: 0 @ aScaledFont fontHeight.
        advance := aCanvas drawString: 'Hello_'.

        "Shift right by the advance of the text so that the emoji follows it."
        aCanvas pathTransform translateBy: advance x @ 0.

        "Switch to the bitmap emoji font and draw U+1F30E."
        aScaledFont := aCanvas setFont: (LogicalFont familyName: 'Noto Color Emoji' pointSize: 32).
        advance := aCanvas drawString: (Unicode value: 16r1F30E) asString ].

    aSurface asMorph openInWindow

tinchodias commented 2 years ago

Maybe the following step was the only modification required to load the font (I'm not sure now, because I touched several places while debugging):

Append at the end of FT2Face >> loadFields:

    (bbox width = 0 and: [faceRect num_fixed_sizes > 0]) ifTrue: [ 
        height := faceRect available_sizes height.
        bbox := Rectangle center: 0@0 extent: faceRect available_sizes width @ height ]

Then evaluate FreeTypeFontProvider current updateFromSystem to re-scan the font directories on your disk.

After that, you can evaluate FontChooser example and check whether the "Noto Color Emoji" font face is available (I'm assuming the font was previously installed on your disk).

tinchodias commented 2 years ago

For the record, after the Athens-Cairo example worked fine, I tried opening a Rubric editor with the emoji, but it failed:

RubTextAreaExamples class >> emojiEditor
    <example>
    | font1 font2 tMorph t1 t2 t3 |
    font1 := TextFontReference toFont: (LogicalFont familyName: 'Noto Color Emoji' pointSize: 20).
    font2 := TextFontReference toFont: (LogicalFont familyName: 'Source Sans Pro' pointSize: 20).

    tMorph := RubEditingArea new.
    t1 := 'Hola' asText addAttribute: font1.
    t2 := (Unicode value: 16r1f642) asString asText
        addAttribute: font2;
        yourself.
    t3 := 'Mundo' asText
        addAttribute: font1;
        addAttribute: TextEmphasis underlined.
    tMorph updateTextWith: t1 , t2 , t3.
    tMorph openInWindow

tinchodias commented 2 years ago

Additional note: Pharo 11 for Mac comes with:

CairoLibrary uniqueInstance versionString.  "--> 1.15.4"
FT2Library current libraryVersion.  "--> 2.9.1"

And Noto Color Emoji didn't render until I changed to more recent libs:

CairoLibrary uniqueInstance versionString.  "--> 1.16.0"
FT2Library current libraryVersion.  "--> 2.12.1"

(These are cairo and freetype dylibs found in pharo-vm/Pharo.app/Contents/MacOS/Plugins)