trip5 / Matrix-Fonts

Fonts For Use with LED Matrix Clocks
MIT License

[Language Request] English International Phonetic Alphabet (IPA) #10

Closed andrewjswan closed 1 month ago

andrewjswan commented 7 months ago

Information

Is it possible to add English International Phonetic Alphabet (IPA) to Font 8 (with Cyrillic alphabet) (to display the transcription of English words), they obviously partially coincide with the existing ones, but it seems to me that not all of them.

https://icspeech.com/phonetic-symbols.html

trip5 commented 7 months ago

At first I thought "sure, why not?" - I'm a language teacher so I'm pretty familiar with the IPA, though there are A LOT of symbols in use even just by English... according to Wikipedia the Unicode block contains 96 characters total, though perhaps English uses only about 40 or so (I think)... and indeed, a lot of them are based on Latin, Greek, Cyrillic characters anyway so it would be mostly just figuring out where to copy and paste...

I'll consider it. But... why?

trip5 commented 7 months ago

And just sticking some links here for later reference:

English IPA characters: https://www.phon.ucl.ac.uk/home/wells/phoneticsymbolsforenglish.htm

The whole lot: https://westonruter.github.io/ipa-chart/keyboard/

Wikipedia: https://en.wikipedia.org/wiki/IPA_Extensions

andrewjswan commented 7 months ago

I'll consider it. But... why?

It’s simple, I show the word of the day in English and Ukrainian, but I also want a transcription so that it is clear not only how the word is written, but also how it is pronounced.

trip5 commented 1 month ago

It's been a while. I left this open because I figured I would get to it eventually. Well, today I did. I think they turned out well. Let me know when you have time to test it out!

andrewjswan commented 1 month ago

I wanted to check quickly, but I was in a hurry and made people laugh. I saved the fonts and flashed the ESP, but forgot to update the glyphs, so the experiment was unsuccessful. Today I will find all the symbols used, add them to the glyphs and check.

PS: A personal question, my child is writing a term paper, but he needs links to the original books, and we can't find them anywhere, neither in paper form, nor in electronic form, nor in translation. Do you know if they are available at all? Books:

andrewjswan commented 1 month ago

I added this set of symbols æðŋ̩ɑɒɔəɚɛɜɡɪ̠̈ʃʊʌʒˌθ, I'll check it tomorrow

trip5 commented 1 month ago

For the first book, looks like out of print for a while now. Kyobo is Korea's biggest bookstore. Aladin is Korea's biggest used bookstore. Both also sell ebooks and ebook readers so they would list it if they had it in physical or digital...

First book: https://product.kyobobook.co.kr/detail/S000001178084 https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=162809

Second book: https://product.kyobobook.co.kr/detail/S000001188042 https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=191939

These are (unfortunately) probably available in University libraries here (I'm not even near one these days)... or the kind of book you find while browsing an old used-book store in a dark alley. Lol.

I'm not even sure Koreans know well the ancient myths of Korea. Everyone knows Dangun but it looks like your kid is researching something even older... Is this useful? https://www.koreanquarterly.org/front_page_below_fold/creation-myths/ - A new book was recently published... in English? I'll ask my wife tomorrow if she knows any of the Korean Creation myths... and if her online reading apps have any (she's an avid reader of nonfiction).

I don't know if this site is still online: http://www.korea-np.co.jp/ - It's a North Korean mythology site that I collected a few good folk tales from. SK routinely blocks NK sites and it's currently not loading but... maybe?

If your kid hasn't already learned, when it comes to Korean sites, Koreans use Naver and Daum to search. There may be some links to something useful there...

andrewjswan commented 1 month ago

Is this useful?

Of course, expanding your horizons is always useful :)

If your kid hasn't already learned, when it comes to Korean sites, Koreans use Naver and Daum to search. There may be some links to something useful there...

Thanks!

trip5 commented 1 month ago

I asked my wife. She was raised Christian and doesn't know any of the old myths except Dangun. I d

She's a bit concerned because there is a Shaman group in Korea that operates some Korean language classes and even a few universities and she thinks they are a cult... apparently they teach that the old Korean gods are "original" and even are the One God as taught by Christians, Muslims, etc. I didn't know what to say to that.

Anyways, sorry I can't be of more help. I myself have been very inquisitive about Korean folktales and ancient culture while here in Korea but apparently this is something I missed. Will you let me know if your child finds anything interesting? I would almost suggest to change the topic to folktales since that might be easier to research.

andrewjswan commented 1 month ago

you have time to test it out

I output the text that was added to the glyphs; problems arose only with these symbols: -ŋ̩-ɔ-ɪ̠̈-ˌ- (I inserted a - separator for clarity).

        action: esphome.pixel_clock_text_screen
        data:
          default_font: false
          text: "-ŋ̩-ɔ-ɪ̠̈-ˌ-"
          lifetime: 1
          screen_time: 10
          r: 240
          g: 240
          b: 240

andrewjswan commented 1 month ago

She's a bit concerned

Don't worry, the topic of the term paper is "Cosmological myths in the Korean and Ukrainian cultural and colloquial tradition", but we need to compare this from ancient "mythological" times, so we need support in the form of books (we have no problems with the myths of Ukraine, but we have a problem with books on the myths of Korea). So we are looking for primary sources in every possible way.

trip5 commented 1 month ago

I had a little chat with ChatGPT to get the Unicode character ranges of diacritic, diaeresis, and other character modifiers... And as a test (sorry, I had such confidence in my font, I didn't even test the IPA on any of my clocks yet), I added these 2 characters:

̠ (COMBINING MINUS SIGN BELOW): 800
̈ (COMBINING DIAERESIS): 776

And copied this into my glyphs (though I did have to find that other ɪ and remove it):

ɪ̠̈
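For reference, that glyph is really three code points: one base letter plus the two combining marks listed above. A minimal Python sketch to list them follows. The helper name `describe` is made up for illustration, and the order of the two marks (below first, then above) is an assumption about how the string was pasted.

```python
import unicodedata


def describe(text):
    """Return (decimal code point, Unicode name, is_combining) for each code point."""
    return [
        (ord(ch), unicodedata.name(ch, "UNKNOWN"), unicodedata.combining(ch) != 0)
        for ch in text
    ]


# Base letter U+026A followed by U+0320 and U+0308 (assumed order).
sample = "\u026a\u0320\u0308"

for code, name, is_combining in describe(sample):
    print(code, name, "combining" if is_combining else "base")
```

The decimal values it prints for the two marks are the same numbers as in the list above (800 and 776).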

Home Assistant knew how to handle it: 20241011_052731

And the ESPHome logs show the Unicode OK:

[05:52:11][D][main:157]: Message: ɪ̠̈

But the clock did not handle it well: 20241011_052707

I also tried setting these characters with reverse margins (which is, I guess, what you're supposed to do with characters like this?)... to limited success. The clock insisted on entering scroll mode even though it has only 1 character block and 3 Unicode characters... so I added some minus signs and... only one of the modifiers actually printed... and I'm not sure why. 20241011_054704

Tried various combinations of this: ɪ̠--- ɪ̈--- ɪ̠̈. And whichever order I did it in, only the first would display (and if it was ɪ̠̈ then the dots wouldn't display) - so my guess is that by using a reverse margin on the modifier character, it bugs up something... so you'd get one modifier only per message and the rest of the text doesn't display. And I'm pretty sure if there's one modifier, there will be more...

So that was a neat test but... it looks like it can't be done properly. Whatever handles the Unicode in ESPHome firmware probably doesn't 100% know what to do with combining characters. And in any case, real TTF fonts will calculate underlines, etc. based on character width, while a pixel font... if the underline is longer or shorter than the character, it'll look weird (as in the example.. and yes, I tested with various pixel lengths).

So, if you're getting a word of the day, find some way to filter them out...? They're not 100% critical to the IPA anyways, right? I suppose I could also just make all the modifiers null (a pixel length of zero) but you'd still have to add them to your glyphs list (taking up possibly an additional 31 slots when you're limited to 255). I did test that out and it looks fine and no weird scrolling. But surely if you're using Home Assistant, there's a filter you can apply which would be more efficient.

ChatGPT says this is a good Home Assistant filter (try it out in Templates):

{## Imitate available variables: ##}
{% set my_test_json = {
  "text" : "ɪ̠̈ ˈkæt ɔɪ̈"
} %}

{{ my_test_json.text | regex_replace('[\u02B0-\u02FF\u0300-\u036F\u1DC0-\u1DFF\uFE20-\uFE2F]', '') }}

or if setting variables:

{% set text = "ɪ̠̈ ˈkæt ɔɪ̈" %}
{% set cleaned_text = text | regex_replace('[\u02B0-\u02FF\u0300-\u036F\u1DC0-\u1DFF\uFE20-\uFE2F]', '') %}
{{ cleaned_text }}
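To see what that character class actually removes outside of Home Assistant, here is a rough Python equivalent using the standard `re` module. The function name `strip_modifiers` is hypothetical; the ranges are copied from the template above. Note that the first range (U+02B0 to U+02FF) also covers spacing marks such as the stress mark ˈ, so those get removed along with the combining marks.

```python
import re

# Same ranges as the template filter: spacing modifier letters plus
# three blocks of combining marks.
MODIFIERS = re.compile("[\u02B0-\u02FF\u0300-\u036F\u1DC0-\u1DFF\uFE20-\uFE2F]")


def strip_modifiers(text):
    """Remove every character that falls in the ranges above."""
    return MODIFIERS.sub("", text)


# The test string from the template, written with escapes to avoid paste problems.
sample = "\u026a\u0320\u0308 \u02c8k\u00e6t \u0254\u026a\u0308"
print(strip_modifiers(sample))
```

With the sample string, only the base letters and spaces should remain.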

By the way, what are you doing with this? As a hobbyist linguist, I'm more than curious at this point. :)

andrewjswan commented 1 month ago

By the way, what are you doing with this? As a hobbyist linguist, I'm more than curious at this point. :)

If the problem is only with this symbol ɪ̠̈, then I can easily replace it with the letter ї, I don’t think I’ll see much of a difference. :)

But the ɔ symbol is displayed as c for me, I think something can be done about it. :)

What to do with this ŋ̩ symbol? Replace it with some analogue? Or is there a solution?

trip5 commented 1 month ago

You were right about the backwards c. I was probably a bit hasty with some other characters at the time too. They should all be fixed now.

And like I said, it's more than a problem with ɪ̠̈ - the problem also happened with ɪ̠ and ɪ̈ - anytime there's a combining character, the clock stops displaying whatever was after it. So ŋ̩ would make the same problem. I'm 100% certain.

That's why a regex filter will work so well... the above searches entire Unicode ranges for those characters and just removes them from the string, leaving just standard IPA... like taking A-B-C-D and turning it into ABCD.

Sidenote: I use Notepad++ which seems to have a weird way of treating these characters, too:

(image: weird)

And it does a good job of separating them... not sure why it doesn't split that 4th one up... copied and pasted here:

ŋ̩ 
ɪ̠ 
ɪ̈
ɪ̠̈

ŋ̩---
ɪ̠---
ɪ̈---
ɪ̠̈---

Seriously, how are you retrieving this data? There's definitely a way to implement a filter in Home Assistant... the above should work well if it's part of a template in your HA config. I'm curious because I could also help test out the sensor and fix it up in my Home Assistant.

andrewjswan commented 1 month ago

Seriously how are you retrieving this data?

I'm taking it from the site, filtering the symbols is not a problem, so I'll do it that way: I'll replace ɪ̠̈ with ї and ŋ̩ with ... n :)

trip5 commented 1 month ago

I mean, can I see your sensor or automation you use?

And just to clarify: ɪ̠̈ will actually become ɪ since it will strip all combining characters. It's not a great solution. I realize that in many cases, they indicate stress, syllables, tone, etc. etc... but there's not much I can do about that... IPA is such a specialized character-set that it doesn't follow regular character-rules. Every other language - when adding an accent mark - just made it a new Unicode value - check out the 90+ vowels I just added for Vietnamese lol. IPA can't do that without getting more than 100,000 glyphs.
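If losing the stress marks is a concern, one possible alternative (a sketch only, not something tested on the clock) is to drop just the nonspacing marks and keep spacing characters like ˈ and ˌ, which take up their own cell. This assumes the font has glyphs for those spacing marks. The name `strip_combining` is made up for illustration.

```python
import unicodedata


def strip_combining(text):
    """Drop only nonspacing marks (Unicode category Mn); keep everything else."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Stress mark, "kaet", then the two problem glyphs from this thread.
sample = "\u02c8k\u00e6t \u026a\u0320\u0308 \u014b\u0329"
print(strip_combining(sample))
```

The text is deliberately not decomposed (no NFD normalization) first: precomposed letters such as the Cyrillic ї would otherwise split into a base letter plus a combining diaeresis and lose their dots.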

I do wonder if I can set up your automation, if it would look OK on an e-ink screen. Since that uses real TTF fonts, it might work better. Might not. I'm just very curious to try...

andrewjswan commented 1 month ago

I mean, can I see your sensor or automation you use?

Python scripts :)

https://github.com/AlexxIT/PythonScriptsPro

python_scripts.zip

andrewjswan commented 1 month ago

  # Home Assistant English
  - id: hall_pixel_clock_english
    alias: Hall Pixel Clock - English
    trigger:
      - platform: state
        entity_id: binary_sensor.night_mode_awake
        to: 'on'
    action:
      - action: python_script.exec
        data:
          file: /config/python_scripts/random_word.py
          cache: false
        response_variable: result
      - action: esphome.esp_hall_pixel_clock_text_screen
        data:
          default_font: false
          text: "{{ result.word_of_day.word | capitalize }} #A0A0A0{{result.word_of_day.transcription}}#F0F0F0 {{ result.word_of_day.translate | capitalize }}"
          lifetime: 180
          screen_time: 5
          r: 240
          g: 240
          b: 240

trip5 commented 1 month ago

Neat.

Looks like you could just do this:

text: "{{ result.word_of_day.word | capitalize }} #A0A0A0{{result.word_of_day.transcription|regex_replace('[\u02B0-\u02FF\u0300-\u036F\u1DC0-\u1DFF\uFE20-\uFE2F]', '')}}#F0F0F0 {{ result.word_of_day.translate | capitalize }}"

andrewjswan commented 1 month ago

Looks like you could just do this:

It's easier to do this when generating json, that's what I did...
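The actual script is not shown in this thread, so the following is only a guess at what cleaning at JSON-generation time could look like. The key names (`word_of_day`, `word`, `transcription`, `translate`) come from the automation above; the function names and the replacement table are hypothetical, based on the substitutions mentioned earlier (ɪ̠̈ to ї, ŋ̩ to n).

```python
import json
import re

# Substitutions applied before the generic filter (assumed from the thread).
REPLACEMENTS = {
    "\u026a\u0320\u0308": "\u0457",  # small capital I + two marks -> Cyrillic yi
    "\u014b\u0329": "n",             # eng + syllabic mark -> plain n
}

# Same ranges as the regex_replace filter suggested above.
MARKS = re.compile("[\u02B0-\u02FF\u0300-\u036F\u1DC0-\u1DFF\uFE20-\uFE2F]")


def clean_transcription(text):
    """Apply the fixed substitutions, then strip any remaining modifier characters."""
    for source, target in REPLACEMENTS.items():
        text = text.replace(source, target)
    return MARKS.sub("", text)


def build_payload(word, transcription, translate):
    """Build the JSON string that the automation would read (hypothetical shape)."""
    payload = {
        "word_of_day": {
            "word": word,
            "transcription": clean_transcription(transcription),
            "translate": translate,
        }
    }
    # ensure_ascii=False keeps IPA and Cyrillic readable in the output.
    return json.dumps(payload, ensure_ascii=False)


print(build_payload("button", "\u02c8b\u028ct\u014b\u0329", "\u043a\u043d\u043e\u043f\u043a\u0430"))
```

Doing it here keeps the automation template short, since the text arrives at the clock already limited to glyphs the font can draw.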