libtcod / python-tcod

A high-performance Python port of libtcod. Includes the libtcodpy module for backwards compatibility with older projects.
BSD 2-Clause "Simplified" License

Support encodings when loading/saving REXPaint files. #78

Closed. jsbeckr closed this 3 years ago

jsbeckr commented 5 years ago

I'm using REXPaint and loading the resulting .xp files with tcod.console_from_xp("data/images/title.xp"). When I blit the resulting console, it looks different than the REXPaint image.

Example (left: REXPaint, right: tcod):

[Image: Screenshot 2019-06-13 at 21 45 35]

I'm using a custom font (it's a CP437 one), but it also happens with the standard font.

HexDecimal commented 5 years ago

This is a fairly common encoding issue. python-tcod expects codepoints to be in Unicode, but the REXPaint tool saves them as EASCII (CP437).

You can use tcod.FONT_LAYOUT_ASCII_INROW to force libtcod to use EASCII, but then you'll no longer be able to give Unicode strings such as "░▒▓" to print functions. You can also decode the codepoints from CP437 to Unicode after the REXPaint file is loaded.

jsbeckr commented 5 years ago

Thanks for your quick answer!

tcod.FONT_LAYOUT_ASCII_INROW does work, but as you said, it's unfortunate that I can't print Unicode characters anymore. I tried to figure out how to decode the CP437 codepoints to Unicode.

From my_console.tiles I get tuples like (23, fg, bg), where 23 is the codepoint. I'm not sure how to get from a CP437 codepoint to a Unicode one. Or am I approaching this wrong?

HexDecimal commented 5 years ago

It's actually hard to do, since the codepoints below 32 are special: they are control characters in ASCII, and most codecs, including the ones Python has, will refuse to touch them. "".encode("cp437") should work for the higher codes. "\xNN" also works as expected when working with EASCII codepoints.
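This behavior can be checked with the standard library alone. The snippet below is a minimal sketch that assumes Python's built-in cp437 codec; the variable names are illustrative only.

```python
# Codes 0x80-0xFF: the built-in codec maps these to the expected glyphs.
shades = "░▒▓".encode("cp437")              # Unicode -> CP437 bytes
high_glyph = bytes([0xB0]).decode("cp437")  # CP437 byte -> Unicode

# Codes below 32: the codec returns the ASCII control character,
# not the glyph that CP437 fonts draw at that position (code 3 is a heart).
low_decoded = bytes([3]).decode("cp437")

# The heart glyph has no entry in the codec's table, so encoding it fails.
try:
    "♥".encode("cp437")
except UnicodeEncodeError:
    heart_is_encodable = False
else:
    heart_is_encodable = True

print(shades, repr(high_glyph), repr(low_decoded), heart_is_encodable)
```

The last two results are why a custom table is needed for the low codes.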

I usually use these two pages as references when making a custom codec:

https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT
https://en.wikipedia.org/wiki/Code_page_437

Maybe I should add a way to access the LAYOUT_CP437 codec from Python.

jsbeckr commented 5 years ago

I guess I will go the CSV route for now to stay in unicode land. :)

HexDecimal commented 5 years ago

I can give an example of how to decode REXPaint files into Unicode if you want.

HexDecimal commented 5 years ago

Once the REXPaint file is loaded as a console you can convert it. Here's an example:

import tcod.console
import numpy as np

cp437 = np.array(
    [
        0x0000, 0x263A, 0x263B, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022,
        0x25D8, 0x25CB, 0x25D9, 0x2642, 0x2640, 0x266A, 0x266B, 0x263C,
        0x25BA, 0x25C4, 0x2195, 0x203C, 0x00B6, 0x00A7, 0x25AC, 0x21A8,
        0x2191, 0x2193, 0x2192, 0x2190, 0x221F, 0x2194, 0x25B2, 0x25BC,
        0x0020, 0x0021, 0x0022, 0x0023, 0x0024, 0x0025, 0x0026, 0x0027,
        0x0028, 0x0029, 0x002A, 0x002B, 0x002C, 0x002D, 0x002E, 0x002F,
        0x0030, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, 0x0037,
        0x0038, 0x0039, 0x003A, 0x003B, 0x003C, 0x003D, 0x003E, 0x003F,
        0x0040, 0x0041, 0x0042, 0x0043, 0x0044, 0x0045, 0x0046, 0x0047,
        0x0048, 0x0049, 0x004A, 0x004B, 0x004C, 0x004D, 0x004E, 0x004F,
        0x0050, 0x0051, 0x0052, 0x0053, 0x0054, 0x0055, 0x0056, 0x0057,
        0x0058, 0x0059, 0x005A, 0x005B, 0x005C, 0x005D, 0x005E, 0x005F,
        0x0060, 0x0061, 0x0062, 0x0063, 0x0064, 0x0065, 0x0066, 0x0067,
        0x0068, 0x0069, 0x006A, 0x006B, 0x006C, 0x006D, 0x006E, 0x006F,
        0x0070, 0x0071, 0x0072, 0x0073, 0x0074, 0x0075, 0x0076, 0x0077,
        0x0078, 0x0079, 0x007A, 0x007B, 0x007C, 0x007D, 0x007E, 0x007F,
        0x00C7, 0x00FC, 0x00E9, 0x00E2, 0x00E4, 0x00E0, 0x00E5, 0x00E7,
        0x00EA, 0x00EB, 0x00E8, 0x00EF, 0x00EE, 0x00EC, 0x00C4, 0x00C5,
        0x00C9, 0x00E6, 0x00C6, 0x00F4, 0x00F6, 0x00F2, 0x00FB, 0x00F9,
        0x00FF, 0x00D6, 0x00DC, 0x00A2, 0x00A3, 0x00A5, 0x20A7, 0x0192,
        0x00E1, 0x00ED, 0x00F3, 0x00FA, 0x00F1, 0x00D1, 0x00AA, 0x00BA,
        0x00BF, 0x2310, 0x00AC, 0x00BD, 0x00BC, 0x00A1, 0x00AB, 0x00BB,
        0x2591, 0x2592, 0x2593, 0x2502, 0x2524, 0x2561, 0x2562, 0x2556,
        0x2555, 0x2563, 0x2551, 0x2557, 0x255D, 0x255C, 0x255B, 0x2510,
        0x2514, 0x2534, 0x252C, 0x251C, 0x2500, 0x253C, 0x255E, 0x255F,
        0x255A, 0x2554, 0x2569, 0x2566, 0x2560, 0x2550, 0x256C, 0x2567,
        0x2568, 0x2564, 0x2565, 0x2559, 0x2558, 0x2552, 0x2553, 0x256B,
        0x256A, 0x2518, 0x250C, 0x2588, 0x2584, 0x258C, 0x2590, 0x2580,
        0x03B1, 0x00DF, 0x0393, 0x03C0, 0x03A3, 0x03C3, 0x00B5, 0x03C4,
        0x03A6, 0x0398, 0x03A9, 0x03B4, 0x221E, 0x03C6, 0x03B5, 0x2229,
        0x2261, 0x00B1, 0x2265, 0x2264, 0x2320, 0x2321, 0x00F7, 0x2248,
        0x00B0, 0x2219, 0x00B7, 0x221A, 0x207F, 0x00B2, 0x25A0, 0x00A0,
    ]
)
cp437_encode = {v: i for i, v in enumerate(cp437)}

console = tcod.console.Console(5, 1)
console.tiles["ch"][0, :4] = ord("♥"), ord("♦"), ord("♣"), ord("♠")
print(console.tiles["ch"])

# Encode Unicode -> CP437
console.tiles["ch"] = np.vectorize(cp437_encode.__getitem__)(console.tiles["ch"])
print(console.tiles["ch"])

# Decode CP437 -> Unicode
console.tiles["ch"] = cp437[console.tiles["ch"]]
print(console.tiles["ch"])

Output:

[[9829 9830 9827 9824   32]]
[[ 3  4  5  6 32]]
[[9829 9830 9827 9824   32]]

jsbeckr commented 5 years ago

Oh wow nice! Works perfectly. 👍

Maybe this should be the default behavior for Consoles loaded via tcod.console_from_xp?

HexDecimal commented 5 years ago

The REXPaint spec actually does support Unicode, but a file doesn't track what encoding it was saved with. So after you decode into Unicode, you can save the result back as a Unicode .xp file and then load it without needing to decode it again. However, an .xp file saved as Unicode can't be loaded by the REXPaint program itself, which still expects CP437.

The ideal way to handle this is to allow a character mapping to be passed to the load and save functions, similar to opening a file with a codec in Python. I can't use Python's existing codecs, since they don't convert the characters below 32.
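As a rough illustration of the character-mapping idea, the sketch below builds a full 256-entry table from Python's cp437 codec and then overrides codes 1-31 with the glyph codepoints from the table earlier in this thread. The names LOW_GLYPHS, build_cp437_charmap, decode_cells, and encode_cells are hypothetical helpers, not part of python-tcod, and the functions work on plain lists of codepoints rather than on .xp files.

```python
from typing import List, Sequence

# Unicode codepoints for the glyphs at CP437 codes 1-31
# (copied from the cp437 array posted earlier in this thread).
LOW_GLYPHS = [
    0x263A, 0x263B, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022, 0x25D8,
    0x25CB, 0x25D9, 0x2642, 0x2640, 0x266A, 0x266B, 0x263C, 0x25BA,
    0x25C4, 0x2195, 0x203C, 0x00B6, 0x00A7, 0x25AC, 0x21A8, 0x2191,
    0x2193, 0x2192, 0x2190, 0x221F, 0x2194, 0x25B2, 0x25BC,
]


def build_cp437_charmap() -> List[int]:
    """Return a list where index = CP437 code and value = Unicode codepoint."""
    # Start from Python's codec, which is correct for codes 32-255.
    charmap = [ord(bytes([code]).decode("cp437")) for code in range(256)]
    # Replace the control characters at 1-31 with the glyph codepoints.
    charmap[1:32] = LOW_GLYPHS
    return charmap


def decode_cells(cells: Sequence[int], charmap: Sequence[int]) -> List[int]:
    """Hypothetical helper: CP437 codes -> Unicode codepoints."""
    return [charmap[code] for code in cells]


def encode_cells(cells: Sequence[int], charmap: Sequence[int]) -> List[int]:
    """Hypothetical helper: Unicode codepoints -> CP437 codes."""
    reverse = {codepoint: code for code, codepoint in enumerate(charmap)}
    return [reverse[codepoint] for codepoint in cells]


charmap = build_cp437_charmap()
unicode_cells = decode_cells([3, 4, 5, 6, 32], charmap)
print("".join(chr(c) for c in unicode_cells))
```

A load or save function that accepted such a table could apply decode_cells after reading and encode_cells before writing, which is the same role a codec plays when opening a text file.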

HexDecimal commented 3 years ago

The documentation now includes examples of how to convert the encoding between CP437 and Unicode when loading or saving REXPaint files.