Wrong char on input when using keyboard on input_text widgets after setting add_font_range

maku2903 commented 2 years ago

Version of Dear PyGui

Version: 0.8.39
Operating System:
Affected: Windows 10
Not affected: Fedora 34
Not tested: Mac

My Issue/Question

I'm using a custom character range for "Latin Extended-A" ([0x0100, 0x017f]) from: https://en.wikipedia.org/wiki/List_of_Unicode_characters#Latin-1_Supplement. Characters typed into the input come out wrong. The issue affects only Windows 10. It doesn't affect Fedora 34 on the same machine with the same keyboard.

My hypothesis, which I don't know how to verify: DPG doesn't take the system default encoding into account. I'm using Windows with the cp1250 encoding (checked with the PowerShell command [System.Text.Encoding]::Default).

Fedora encoding:

[test@localhost ~]$ locale charmap
UTF-8

Question to the author: does DPG take the default system encoding into account? Not every system uses UTF-8. Maybe that's the problem?

When copy-pasting characters from, e.g., notepad.exe, everything is OK. add_char_remap(wrong_char, correct_char) fixes the display, but then every wrong_char is shown incorrectly wherever that character is actually meant...

To Reproduce

Steps to reproduce the behavior:

1. Run the example below on Windows 10 with cp1250 as the default encoding and the polish_programmer keyboard layout.
2. Type the characters shown in the input's label.
3. Watch the wrong characters appear.

Expected behavior

Correct characters are entered on any system.

Standalone, minimal, complete and verifiable example

import dearpygui.dearpygui as dpg  # written against Dear PyGui 0.8.39

path = 'font_path.ttf'  # placeholder: path to a .ttf font with Latin Extended-A glyphs

with dpg.font_registry():
    with dpg.font(path, 20, default_font=True):
        # Latin Extended-A block, U+0100-U+017F (contains Ą ą Ę ę Ź ź Ż ż)
        dpg.add_font_range(0x0100, 0x017f)

def to_hex(s: str) -> str:
    # Show each character's Unicode code point, e.g. 'Ą' -> '0x104'
    return ' '.join(hex(ord(x)) for x in s)

def callback(sender, app_data, user_data):
    # DPG calls callbacks with (sender, app_data, user_data);
    # user_data holds the ids of the two text items to update
    text_id, hex_text_id = user_data
    val = dpg.get_value(item=sender)
    dpg.configure_item(item=text_id, default_value=val)
    dpg.configure_item(item=hex_text_id, default_value=to_hex(val))

with dpg.window(label="Main window", width=500, height=400) as main_window_id:
    dpg.set_primary_window(main_window_id, True)
    text_id = dpg.generate_uuid()
    hex_text_id = dpg.generate_uuid()
    chars = 'ĄąĘęŹźŻż'
    dpg.add_input_text(callback=callback, user_data=(text_id, hex_text_id), label='Input this: ' + chars)
    dpg.add_separator()
    dpg.add_text(default_value='Should be:')
    dpg.add_text(default_value=chars)
    dpg.add_text(default_value=to_hex(chars))
    dpg.add_separator()
    dpg.add_text(default_value='Is:')
    dpg.add_text(id=text_id, default_value='', label='Val from input text')
    dpg.add_text(id=hex_text_id, default_value='', label='Hex val from input text')
    dpg.add_separator()

dpg.start_dearpygui()

maku2903 commented 2 years ago

I'm trying to prove my point. Conclusion: DPG receives each character as its cp1250 byte value. For example, the letter "Ą" is exactly 0xA5 in cp1250 according to https://en.wikipedia.org/wiki/Windows-1250, and the screenshot of the example confirms that.
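The conclusion above can be checked without Dear PyGui. The sketch below uses only Python's standard codecs: it encodes the expected letters with cp1250 and then reinterprets each byte as the code point with the same value. That reinterpretation is what the screenshot suggests the input widget does; it is an assumption drawn from this thread, not documented DPG behaviour.

```python
# Expected letters from the example above
chars = 'ĄąĘęŹźŻż'

# Byte value of each letter in the Windows-1250 (cp1250) code page
cp1250_bytes = chars.encode('cp1250')
print(' '.join(hex(b) for b in cp1250_bytes))  # 0xa5 0xb9 0xca 0xea 0x8f 0x9f 0xaf 0xbf

# Assumed widget behaviour: each byte becomes the code point with the same value,
# which is exactly what decoding the bytes as Latin-1 does
garbled = cp1250_bytes.decode('latin-1')
print(' '.join(hex(ord(c)) for c in garbled))  # the same numbers, now as code points
```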

EDIT: Trying to find a temporary workaround (convert: hex -> bytes -> decode with the system code page):

import codecs
import locale

import dearpygui.dearpygui as dpg

path = 'font_path.ttf'  # placeholder font path
with dpg.font_registry():
    with dpg.font(path, 20, default_font=True):
        dpg.add_font_range(0x0100, 0x017f)
        # dpg.add_char_remap(0x00a5, 0x0104)  # fixes only the drawn glyph, not the value

# System code page, e.g. 'cp1250' on Polish Windows, 'UTF-8' on Fedora
PREFERENCED_ENCODING = locale.getpreferredencoding()

def repair_encoding(s: str) -> str:
    # codecs.lookup() normalizes spellings such as 'UTF-8' to 'utf-8'
    if codecs.lookup(PREFERENCED_ENCODING).name == 'utf-8' or len(s) == 0:
        return s
    # Each code point is really a byte of the system code page:
    # code point -> two-digit hex -> byte -> decode with the code page
    return ''.join(bytes.fromhex(f'{ord(x):02x}').decode(PREFERENCED_ENCODING) for x in s)

def to_hex(s: str) -> str:
    return ' '.join(hex(ord(char)) for char in s)

def repair_callback(sender, app_data, user_data):
    text_id, hex_text_id, text_rep_id, hex_text_rep_id = user_data
    val = dpg.get_value(item=sender)
    dpg.configure_item(item=text_id, default_value=val)
    dpg.configure_item(item=hex_text_id, default_value=to_hex(val))
    repair_val = repair_encoding(val)
    dpg.configure_item(item=text_rep_id, default_value=repair_val)
    dpg.configure_item(item=hex_text_rep_id, default_value=to_hex(repair_val))

with dpg.window(label="Main window", width=500, height=400) as main_window_id:
    dpg.set_primary_window(main_window_id, True)
    dpg.add_text(default_value=f'Code page: {PREFERENCED_ENCODING}')
    text_id = dpg.generate_uuid()
    hex_text_id = dpg.generate_uuid()
    text_rep_id = dpg.generate_uuid()
    hex_text_rep_id = dpg.generate_uuid()
    chars = 'ĄąĘęŹźŻż'
    dpg.add_input_text(
        callback=repair_callback,
        user_data=(text_id, hex_text_id, text_rep_id, hex_text_rep_id),
        label='Input this: ' + chars,
    )
    dpg.add_separator()
    dpg.add_text(default_value='Should be:')
    dpg.add_text(default_value=chars)
    dpg.add_text(default_value=to_hex(chars))
    dpg.add_separator()
    dpg.add_text(default_value='Is:')
    dpg.add_text(id=text_id, default_value='', label='Val from input text')
    dpg.add_text(id=hex_text_id, default_value='', label='Hex val from input text')
    dpg.add_separator()
    dpg.add_text(default_value='Repaired:')
    dpg.add_text(id=text_rep_id, default_value='', label='Val from input text')
    dpg.add_text(id=hex_text_rep_id, default_value='', label='Hex val from input text')
    dpg.add_separator()

dpg.start_dearpygui()
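The per-character hex round trip in repair_encoding can, as a sketch, also be written as a single encode/decode pair: Latin-1 maps code points U+0000-U+00FF one-to-one onto bytes 0x00-0xFF, so encoding the garbled string as Latin-1 recovers the code-page bytes. The helper name repair_with_latin1 is hypothetical, and the code page is passed in explicitly so it can be tried on any OS.

```python
def repair_with_latin1(s: str, codepage: str) -> str:
    # Latin-1 maps U+0000-U+00FF one-to-one onto bytes 0x00-0xFF, so this gives back
    # the code-page bytes (UnicodeEncodeError if s has a code point above 0xFF)
    raw = s.encode('latin-1')
    # Decode the bytes with the code page the keyboard input was really in
    return raw.decode(codepage)

# Simulate what the input widget returns when 'ĄąĘęŹźŻż' is typed on cp1250 Windows
garbled = 'ĄąĘęŹźŻż'.encode('cp1250').decode('latin-1')
print(repair_with_latin1(garbled, 'cp1250') == 'ĄąĘęŹźŻż')  # True
```

Like the hex version, it assumes every character is at most 0xFF.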

thainik commented 2 years ago

Russian has the same problem. When will it be fixed?

hoffstadt commented 2 years ago

@thainik Hi, I'm not a font expert, but for Russian I believe you need to remap the characters, see this. This issue is the exact reason we added add_char_remap(...). We would love for one of the Russian users to provide a complete remapping script so we can add it in, but we have not seen one yet!

Let me know how it goes.
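A complete remapping script of the kind asked for here could, as a sketch, be generated instead of written by hand: decode each byte 0x80-0xFF with the Russian Windows code page cp1251 (assumed to be the code page the keyboard input arrives in) and keep the pairs whose code point changes. The function name cp1251_remap_pairs is hypothetical. Such a table only changes which glyph is drawn; as the later comments in this thread point out, get_value still returns the Latin-1 code points.

```python
def cp1251_remap_pairs():
    """(source, target) code point pairs for cp1251 bytes 0x80-0xFF."""
    pairs = []
    for b in range(0x80, 0x100):
        try:
            target = bytes([b]).decode('cp1251')
        except UnicodeDecodeError:
            continue  # skip bytes that cp1251 leaves undefined
        if ord(target) != b:  # identical code points need no remap
            pairs.append((b, ord(target)))
    return pairs

pairs = cp1251_remap_pairs()
print(pairs[:3])

# Inside a `with dpg.font(...)` block, one could then register them:
#     for source, target in pairs:
#         dpg.add_char_remap(source, target)
```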

means0nothing commented 2 years ago

@hoffstadt All of this works fine, and everything renders according to your code and whichever glyphs you remap. The problem is that when you type into add_input_text from the keyboard and the system keyboard language is not in Basic Latin (U+0020-U+007E), DearPyGui assumes everything is in Latin-1 Supplement (U+00A0-U+00FF) and ignores the keyboard language mapping, which may be in the Cyrillic Unicode block, for example. So get_value returns a string where all characters are in the range U+0020-U+00FF, and every time we need to translate the Latin-1 Supplement characters to other Unicode characters according to the language currently in use. This is really unfortunate. When you paste text into add_input_text from somewhere, or set it with set_value from your code, all code points are exactly as they should be.

Your product is really great, and I believe it will become perfect in time.

MaximKolbin commented 2 years ago

Hello. Do you have an example of how you solved this problem? I've run into it as well. Your product is good.

means0nothing commented 2 years ago

The simplest way: chr(ord(input_text_char) + offset_according_to_unicode), with the offset chosen for the Unicode block of the language you are typing in.
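For Russian, the offset trick works because cp1251 stores А-я (bytes 0xC0-0xFF) in the same order as Unicode U+0410-U+044F, so the offset is 0x410 - 0xC0 = 0x350; Ё and ё (bytes 0xA8 and 0xB8) fall outside that run and need explicit entries. A minimal sketch, assuming cp1251 input; the names OFFSET, SPECIAL and fix_cyrillic_by_offset are hypothetical.

```python
OFFSET = 0x0410 - 0x00C0  # 0x350: distance from cp1251 'А' (0xC0) to U+0410

# Letters outside the contiguous А-я run need explicit entries
SPECIAL = {0xA8: 'Ё', 0xB8: 'ё'}

def fix_cyrillic_by_offset(s: str) -> str:
    out = []
    for ch in s:
        code = ord(ch)
        if 0xC0 <= code <= 0xFF:
            out.append(chr(code + OFFSET))  # А-я
        elif code in SPECIAL:
            out.append(SPECIAL[code])       # Ё, ё
        else:
            out.append(ch)                  # ASCII and everything else left as-is
    return ''.join(out)

# Simulated keyboard input: cp1251 bytes reinterpreted as Latin-1 code points
typed = 'Ёлка и Привет'.encode('cp1251').decode('latin-1')
print(fix_cyrillic_by_offset(typed) == 'Ёлка и Привет')  # True
```

Any character from another layout that lands in 0xC0-0xFF (such as é) is shifted too, so this is only correct while the Russian layout is active.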

kosvitko commented 3 months ago

@maku2903 Thank you for your solution. I've updated it a bit so that it also works when set_value is used to put an already repaired string back into the input_text.

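import codecs
import locale
PREFERENCED_ENCODING = locale.getpreferredencoding()

def repair_encoding(s: str) -> str:
    if codecs.lookup(PREFERENCED_ENCODING).name == 'utf-8' or len(s) == 0:
        return s
    # Only code points 0x00-0xFF can be undecoded bytes; anything above that
    # was already repaired (e.g. put back with set_value), so keep it as-is
    return ''.join(
        bytes.fromhex(f'{ord(x):02x}').decode(PREFERENCED_ENCODING) if ord(x) <= 255 else x
        for x in s
    )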

A usage example would be something like this:

input_id = dpg.add_input_text()
# ...type some text into the input...
input_text = repair_encoding(dpg.get_value(input_id))

# decide to edit the text
dpg.set_value(input_id, input_text)  # displays correctly

# ...edit the input, or do nothing with it...
edited_text = repair_encoding(dpg.get_value(input_id))

Now it no longer breaks here, because characters that are already decoded (code points above 0xFF) are left as they are.
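The effect of the ord(x) <= 255 guard can be checked without a GUI by simulating the round trip: repair once, then repair the already repaired text again, as if it had been put back with set_value and read with get_value. The helper repair_encoding_with is a hypothetical variant that takes the code page as a parameter instead of reading it from locale.

```python
def repair_encoding_with(s: str, codepage: str) -> str:
    # Same guarded logic as above, with the code page passed in explicitly
    return ''.join(
        bytes.fromhex(f'{ord(x):02x}').decode(codepage) if ord(x) <= 0xFF else x
        for x in s
    )

# Simulated widget value after typing 'ĄąĘęŹźŻż' on cp1250 Windows
garbled = 'ĄąĘęŹźŻż'.encode('cp1250').decode('latin-1')
once = repair_encoding_with(garbled, 'cp1250')
twice = repair_encoding_with(once, 'cp1250')  # e.g. after set_value() and get_value()
print(once == twice == 'ĄąĘęŹźŻż')  # True
```

This relies on every already-repaired character either being above 0xFF or decoding to itself; that appears to hold for the Polish letters here (ó, for example, is 0xF3 in both cp1250 and Latin-1), but it is worth checking for other code pages.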
