Can't read Unicode correctly

srounet / Pymem

A python library for windows, providing the needed functions to start working on your own with memory editing.

MIT License


Can't read Unicode correctly #66

Closed wkingnet closed 7 months ago

wkingnet commented 2 years ago

Describe the bug Hello, I am using pymem with a Taiwanese game. Traditional Chinese text is read correctly in Cheat Engine (CE), but pymem does not read it correctly.

If you select UTF16 as the text encoding in the memory browser of CE, you can see Traditional Chinese correctly.

I provide screenshots to illustrate the situation more clearly.

Sorry that the screenshots are in Chinese, but I think the core of the problem is that pymem cannot convert UTF-16 encoding correctly?

So for now I need to convert the encoding myself to display it correctly, right? It would be even better if pymem could be updated to handle this. Anyway, pymem is already very easy to use. I used pywin32 for a few days and found it very confusing.

thanks very much

Your Environment

python version Python 3.9.5 (default, May 18 2021, 14:42:02) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
os version (32/64) win10 x64
pymem version 1.8.4

Expected behavior Display Unicode(UTF16) correctly

wkingnet commented 2 years ago

update: Problem has been solved

I wrote a small piece of code to decode the UTF-16 data; maybe you can use it in the future too.

I think it also works for UTF-8 or other encodings; you only need to replace 'utf-16' in the code.

temp = b''
while pm.read_bytes(address, 1) != b'\x00':
    temp = temp + pm.read_bytes(address, 1)
    address += 1
print(temp.decode('utf-16'))

wkingnet commented 2 years ago

The code above still has a problem.

result = b''
while pymem_instance.read_bytes(address, 1) != b'\x00':
    result = result + pymem_instance.read_bytes(address, 1)
    print(result)
    address += 1

If the string consists entirely of UTF-16 characters, it works. But if the string mixes UTF-16 and ASCII characters, it is still read incorrectly.

I think the reason for the error is that pymem's read_bytes() automatically stores the bytes as ASCII codes, even though what I created in Python is a bytes variable.
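Another possible explanation, offered here as an assumption and not confirmed anywhere in this thread, is that the loop itself stops too early. In little-endian UTF-16, every ASCII character is stored as its ASCII byte followed by a 0x00 byte, so a loop that stops at the first single zero byte ends in the middle of the first ASCII character. Printing a bytes object also shows printable bytes as ASCII letters, which can look as if the data had been converted. The sketch below reproduces this without pymem; `read_until_zero_byte` is a hypothetical stand-in for the byte-by-byte loop above.

```python
# "A" followed by one CJK character, encoded as UTF-16LE plus a 2-byte terminator.
text = "A\u8607"
encoded = text.encode("utf-16-le") + b"\x00\x00"
print(encoded)  # printable bytes are displayed as ASCII letters in the repr


def read_until_zero_byte(buf: bytes) -> bytes:
    """Hypothetical stand-in for the loop above: stop at the first 0x00 byte."""
    out = b""
    i = 0
    while buf[i:i + 1] != b"\x00":
        out += buf[i:i + 1]
        i += 1
    return out


truncated = read_until_zero_byte(encoded)
print(truncated)  # only the low byte of "A" survives
```

If this is what happens, comparing two bytes at a time against b'\x00\x00' avoids the early stop.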

I took two screenshots, the difference is only the CE display text encoding.

srounet commented 2 years ago

Thank you for the report. The value is returned as a c_char.

https://docs.python.org/3/library/ctypes.html#ctypes.create_string_buffer

Maybe there should be an alternative that does not break everything within the read_bytes function?
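For readers following the ctypes link, here is a minimal sketch of how a buffer from `ctypes.create_string_buffer` behaves with embedded zero bytes. It only illustrates standard ctypes behavior; which attribute pymem's read_bytes uses internally is not stated in this thread, so nothing here should be read as a description of pymem's implementation.

```python
import ctypes

# UTF-16LE bytes for "AB": each ASCII character is followed by a zero byte.
buf = ctypes.create_string_buffer(b"A\x00B\x00")

# .value treats the buffer as a C string and stops at the first zero byte.
as_c_string = buf.value

# .raw returns every byte in the buffer, including embedded zeros
# (plus the trailing NUL that create_string_buffer appends).
all_bytes = buf.raw

print(as_c_string)
print(all_bytes)
print(all_bytes[:4].decode("utf-16-le"))
```

For UTF-16 data, the full byte content (`.raw`) is the form that can be decoded afterwards.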

wkingnet commented 2 years ago

If the read_bytes function always returned the raw binary data, like b'\x07\x86\xF8\x66', then everything would be fine, because you could call decode() with 'gbk', 'utf-8', 'utf-16', 'latin-1', and so on to decode it from whichever encoding the string uses.
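As a small illustration of that point, the same raw bytes can be decoded with different codecs once they are available as a plain bytes object. The sketch below uses the four example bytes from the comment and assumes they are little-endian UTF-16, which is the usual layout on Windows.

```python
raw = b"\x07\x86\xf8\x66"  # the example bytes from the comment above

# Interpreted as little-endian UTF-16, these four bytes are two CJK characters.
as_utf16 = raw.decode("utf-16-le")

# latin-1 maps every byte to exactly one character, so it never fails,
# but the result is only meaningful if the data really is latin-1.
as_latin1 = raw.decode("latin-1")

# ascii() keeps the output printable on consoles without CJK support.
print(ascii(as_utf16), [hex(ord(ch)) for ch in as_utf16])
print(ascii(as_latin1))
```

Choosing the wrong codec does not always raise an error, so the encoding has to be known from the target program.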

wkingnet commented 2 years ago

Updated code: it now correctly handles strings that mix UTF-16 and ASCII characters.

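import logging

logger = logging.getLogger(__name__)

result = ""
# pm is an attached pymem.Pymem instance; address is the start of the string
while True:
    _temp = pm.read_bytes(address, 2)  # read one UTF-16 code unit (2 bytes)
    if _temp == b'\x00\x00':  # two zero bytes terminate a UTF-16 string
        break
    try:
        # utf-16-le: little-endian, as used on Windows, with no BOM handling
        result += _temp.decode('utf-16-le')
    except UnicodeDecodeError:
        # e.g. one half of a surrogate pair decoded on its own
        logger.warning('UTF-16 decode error, falling back to the ascii() repr')
        result += ascii(_temp)
    address += 2  # advance to the next code unit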
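A possible variation, sketched here under assumptions and not taken from the thread or from pymem itself: read one larger chunk and look for the terminator on 2-byte boundaries, then decode everything at once. This avoids one read per character and lets surrogate pairs decode together. `decode_utf16_cstring` is a hypothetical helper that works on any bytes object, so it can be tried without attaching to a process.

```python
def decode_utf16_cstring(buf: bytes) -> str:
    """Decode a zero-terminated UTF-16LE string from a raw byte buffer.

    Hypothetical helper: scans in 2-byte steps so that the zero high byte
    of an ASCII character is not mistaken for the terminator.
    """
    for i in range(0, len(buf) - 1, 2):
        if buf[i:i + 2] == b"\x00\x00":
            return buf[:i].decode("utf-16-le", errors="replace")
    # No terminator found: decode the largest even-length prefix.
    usable = len(buf) - (len(buf) % 2)
    return buf[:usable].decode("utf-16-le", errors="replace")


# Stand-in for memory contents: mixed ASCII and CJK text, terminator, then junk.
sample = "HP: \u8607\u66f8".encode("utf-16-le") + b"\x00\x00" + b"junk"
decoded = decode_utf16_cstring(sample)
print(ascii(decoded))
```

With pymem this might be called as decode_utf16_cstring(pm.read_bytes(address, 512)), where the 512-byte chunk size is an arbitrary guess; a read that crosses into unreadable memory can fail, so a smaller chunk or a retry may be needed.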

StarrFox commented 7 months ago

closing this assuming it was fixed by https://github.com/srounet/Pymem/pull/121

please comment if not