Open heliomass opened 4 years ago
Unicode recently added a block containing characters for legacy computing to support representing most old 8-bit character sets in Unicode (https://en.wikipedia.org/wiki/Symbols_for_Legacy_Computing). This means a function like CHR$ could simply be a lookup table that returns the Unicode character corresponding to the specified PETSCII value. It wouldn't support all control characters (the unsupported ones are mapped to U+FFFD below), but all graphical characters would work fine if a font that contains those characters is used.
Here is the code point mapping array for the Commodore 64:
[0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0x85,0xE,0xFFFD,0xFFFD,0x84,0xFFFD,0xFFFD,0x8,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0x1B,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27,0x28,0x29,0x2A,0x2B,0x2C,0x2D,0x2E,0x2F,0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x3A,0x3B,0x3C,0x3D,0x3E,0x3F,0x40,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x4A,0x4B,0x4C,0x4D,0x4E,0x4F,0x50,0x51,0x52,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5A,0x5B,0xA3,0x5D,0x2191,0x2190,0x1FB79,0x2660,0x1FB72,0x1FB78,0x1FB77,0x1FB76,0x1FB7A,0x1FB71,0x1FB74,0x256E,0x2570,0x256F,0x1FB7C,0x2572,0x2571,0x1FB7D,0x1FB7E,0x25CF,0x1FB7B,0x2665,0x1FB70,0x256D,0x2573,0x25CB,0x2663,0x1FB75,0x2666,0x253C,0x1FB8C,0x2502,0x3C0,0x25E5,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xA,0xF,0xFFFD,0xFFFD,0x8D,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xA0,0x258C,0x2584,0x2594,0x2581,0x258F,0x2592,0x2595,0x1FB8F,0x25E4,0x1FB87,0x251C,0x2597,0x2514,0x2510,0x2582,0x250C,0x2534,0x252C,0x2524,0x258E,0x258D,0x1FB88,0x1FB82,0x1FB83,0x2583,0x1FB7F,0x2596,0x259D,0x2518,0x2598,0x259A,0x1FB79,0x2660,0x1FB72,0x1FB78,0x1FB77,0x1FB76,0x1FB7A,0x1FB71,0x1FB74,0x256E,0x2570,0x256F,0x1FB7C,0x2572,0x2571,0x1FB7D,0x1FB7E,0x25CF,0x1FB7B,0x2665,0x1FB70,0x256D,0x2573,0x25CB,0x2663,0x1FB75,0x2666,0x253C,0x1FB8C,0x2502,0x3C0,0x25E5,0xA0,0x258C,0x2584,0x2594,0x2581,0x258F,0x2592,0x2595,0x1FB8F,0x25E4,0x1FB87,0x251C,0x2597,0x2514,0x2510,0x2582,0x250C,0x2534,0x252C,0x2524,0x258E,0x258D,0x1FB88,0x1FB82,0x1FB83,0x2583,0x1FB7F,0x2596,0x259D,0x2518,0x2598,0x3C0]
CHR$ just needs to pick the value at the requested index from the array and return it as a string according to the encoding of strings in your environment. Beware that many of these code points are outside the Unicode Basic Multilingual Plane and will require a surrogate pair in UTF-16, or a proper multibyte encoding of up to 4 bytes in UTF-8.
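To make the encoding caveat concrete, here is a minimal sketch in C of how a code point taken from the array could be turned into UTF-8 bytes or UTF-16 code units. The names encode_utf8 and encode_utf16 are made up for this illustration and are not part of cbmbasic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
 * Returns the number of bytes written (1..4), or 0 for an invalid code point. */
size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not encodable characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: encode one code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if invalid. */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;
    cp -= 0x10000; /* 20 bits remain, split into two 10-bit halves */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

For example, U+1FB79 (the entry at index 0x60) becomes the four bytes F0 9F AD B9 in UTF-8 and the surrogate pair D83E DF79 in UTF-16.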
Here's what they look like with the unscii font (http://viznut.fi/unscii/):
Other mapping arrays could be used to support different character sets, for example a CHR$MODE global variable or function could be used to select between PET, C64, TRS-80, Apple2, ...
As a follow-up, here is a quick and dirty change to the CHROUT function, so that the mapping is performed on output rather than in the internal string representation.
These two screenshots are cbmbasic running in Windows Terminal with the unscii 8 font.
And running in gnome-terminal: Note this one is still the Windows x64 binary, just running through WSL in a Linux gnome-terminal to show the PETSCII character set to Unicode mapping works.
Here's a more complete fix with an array of strings that contains the Unicode characters as well as most control codes as VT escape sequences. This array uses UTF-16 to take advantage of the Win32 WriteConsoleW function, which makes it independent of the 8-bit code page in use. A Linux build should probably use the equivalent UTF-8 encoding instead.
The array that maps PETSCII to Unicode and VT sequences:
static LPCWSTR apszCharsMap[256] = {
    /* 0x00 */ L"",L"",L"",L"\uFFFD",L"",L"\x1B[38;5;231m",L"",L"",L"\uFFFD",L"\uFFFD",L"",L"",L"",L"\r\n",L"\x0E",L"",
    /* 0x10 */ L"",L"\x1B[B",L"\x1B[7m",L"\x1B[H",L"\b",L"",L"",L"",L"",L"",L"",L"\x1B",L"\x1B[38;5;88m",L"\x1B[C",L"\x1B[38;5;34m",L"\x1B[38;5;20m",
    /* 0x20 */ L" ",L"!",L"\"",L"#",L"$",L"%",L"&",L"'",L"(",L")",L"*",L"+",L",",L"-",L".",L"/",
    /* 0x30 */ L"0",L"1",L"2",L"3",L"4",L"5",L"6",L"7",L"8",L"9",L":",L";",L"<",L"=",L">",L"?",
    /* 0x40 */ L"@",L"A",L"B",L"C",L"D",L"E",L"F",L"G",L"H",L"I",L"J",L"K",L"L",L"M",L"N",L"O",
    /* 0x50 */ L"P",L"Q",L"R",L"S",L"T",L"U",L"V",L"W",L"X",L"Y",L"Z",L"[",L"\xA3",L"]",L"\u2191",L"\u2190",
    /* 0x60 */ L"\U0001FB79",L"\u2660",L"\U0001FB72",L"\U0001FB78",L"\U0001FB77",L"\U0001FB76",L"\U0001FB7A",L"\U0001FB71",L"\U0001FB74",L"\u256E",L"\u2570",L"\u256F",L"\U0001FB7C",L"\u2572",L"\u2571",L"\U0001FB7D",
    /* 0x70 */ L"\U0001FB7E",L"\u25CF",L"\U0001FB7B",L"\u2665",L"\U0001FB70",L"\u256D",L"\u2573",L"\u25CB",L"\u2663",L"\U0001FB75",L"\u2666",L"\u253C",L"\U0001FB8C",L"\u2502",L"\u03C0",L"\u25E5",
    /* 0x80 */ L"",L"\x1B[38;5;173m",L"",L"\uFFFD",L"",L"\uFFFD",L"\uFFFD",L"\uFFFD",L"\uFFFD",L"\uFFFD",L"\uFFFD",L"\uFFFD",L"\uFFFD",L"\x85",L"\x0F",L"",
    /* 0x90 */ L"\x1B[38;5;16m",L"\x1B[A",L"\x1B[27m",L"\x1B[2J\x1B[H",L"\uFFFD",L"\x1B[38;5;94m",L"\x1B[38;5;204m",L"\x1B[38;5;59m",L"\x1B[38;5;102m",L"\x1B[38;5;155m",L"\x1B[38;5;33m",L"\x1B[38;5;145m",L"\x1B[38;5;127m",L"\x1B[D",L"\x1B[38;5;227m",L"\x1B[38;5;123m",
    /* 0xA0 */ L"\xA0",L"\u258C",L"\u2584",L"\u2594",L"\u2581",L"\u258F",L"\u2592",L"\u2595",L"\U0001FB8F",L"\u25E4",L"\U0001FB87",L"\u251C",L"\u2597",L"\u2514",L"\u2510",L"\u2582",
    /* 0xB0 */ L"\u250C",L"\u2534",L"\u252C",L"\u2524",L"\u258E",L"\u258D",L"\U0001FB88",L"\U0001FB82",L"\U0001FB83",L"\u2583",L"\U0001FB7F",L"\u2596",L"\u259D",L"\u2518",L"\u2598",L"\u259A",
    /* 0xC0 */ L"\U0001FB79",L"\u2660",L"\U0001FB72",L"\U0001FB78",L"\U0001FB77",L"\U0001FB76",L"\U0001FB7A",L"\U0001FB71",L"\U0001FB74",L"\u256E",L"\u2570",L"\u256F",L"\U0001FB7C",L"\u2572",L"\u2571",L"\U0001FB7D",
    /* 0xD0 */ L"\U0001FB7E",L"\u25CF",L"\U0001FB7B",L"\u2665",L"\U0001FB70",L"\u256D",L"\u2573",L"\u25CB",L"\u2663",L"\U0001FB75",L"\u2666",L"\u253C",L"\U0001FB8C",L"\u2502",L"\u03C0",L"\u25E5",
    /* 0xE0 */ L"\xA0",L"\u258C",L"\u2584",L"\u2594",L"\u2581",L"\u258F",L"\u2592",L"\u2595",L"\U0001FB8F",L"\u25E4",L"\U0001FB87",L"\u251C",L"\u2597",L"\u2514",L"\u2510",L"\u2582",
    /* 0xF0 */ L"\u250C",L"\u2534",L"\u252C",L"\u2524",L"\u258E",L"\u258D",L"\U0001FB88",L"\U0001FB82",L"\U0001FB83",L"\u2583",L"\U0001FB7F",L"\u2596",L"\u259D",L"\u2518",L"\u2598",L"\u03C0"
};
(You can check out https://github.com/PhMajerus/AXSH.Library/blob/master/Functions/CBMChr.vbs for an easier to read version of this mapping)
In CHROUT, both occurrences of
putchar(A);
can be replaced with
{
    LPCWSTR pszChar = apszCharsMap[A];
    /* wcslen() returns a size_t, while WriteConsoleW() takes the length as a DWORD. */
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), pszChar, (DWORD)wcslen(pszChar), NULL, NULL);
}
and all the cases handling special characters, except '"', can be removed, as this array contains the equivalent VT sequences.
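For a non-Windows build, the same replacement could be sketched with UTF-8 strings and fputs instead of WriteConsoleW. This is only an illustration under assumptions: the names kCharsMapUtf8, petscii_to_utf8 and chrout_utf8 are hypothetical, and the table holds just a handful of entries from the array above, re-encoded as UTF-8 bytes, rather than all 256.

```c
#include <stdio.h>

/* Excerpt only: a few entries of the wide-string table, re-encoded as UTF-8.
 * Entries not listed here are NULL and are treated as empty strings below. */
static const char *const kCharsMapUtf8[256] = {
    [0x11] = "\x1B[B",           /* VT sequence: cursor down */
    [0x12] = "\x1B[7m",          /* VT sequence: reverse video on */
    [0x41] = "A",
    [0x5C] = "\xC2\xA3",         /* U+00A3 */
    [0x60] = "\xF0\x9F\xAD\xB9", /* U+1FB79 */
    [0x61] = "\xE2\x99\xA0",     /* U+2660 */
};

/* Hypothetical lookup: never returns NULL, so callers can print the result. */
const char *petscii_to_utf8(unsigned char a)
{
    const char *s = kCharsMapUtf8[a];
    return s ? s : "";
}

/* Hypothetical CHROUT body: write the mapped string to the given stream.
 * Returns the fputs() result (EOF on error). */
int chrout_utf8(unsigned char a, FILE *out)
{
    return fputs(petscii_to_utf8(a), out);
}
```

Keeping the strings pre-encoded means no conversion is needed at output time, at the cost of a table that only suits UTF-8 terminals.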
The following demo is running in gnome-terminal:
And in Windows Terminal:
Note some characters are printed with an extra space following them. This seems to be a problem with terminals currently assuming the new Unicode Symbols for Legacy Computing to be double-width, not a bug in the output of cbmbasic itself.
BASIC code for the demo:
10 REM SHOW COLORS
20 FOR I = 0 TO 15
30 READ V
40 PRINT CHR$(V);CHR$(113);STR$(V)
50 NEXT
60 DATA 144,5,28,159,156,30,31,158,129,149,150,151,152,153,154,155
70 REM SHOW CHARACTERS
80 FOR I = 1 TO 8
90 READ X
100 PRINT X,
110 FOR Y = 0 TO 15
120 PRINT " ";CHR$(X+Y);
130 NEXT
140 PRINT
150 NEXT
160 DATA 32,48,64,80,96,112,160,176
Extra details:
To work correctly in conhost as well, cbmbasic should call
SetConsoleMode(h, ENABLE_PROCESSED_OUTPUT | ENABLE_VIRTUAL_TERMINAL_PROCESSING);
to enable VT processing, as it is off by default (it just happens that Windows Terminal always processes VT sequences, regardless of this flag).
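A slightly more defensive variant might read the current mode first and only add the two flags, so that other mode bits are preserved. This is a sketch, not what cbmbasic does today: enable_vt_output is a hypothetical helper name, and on non-Windows platforms it simply reports success because there is nothing to switch on.

```c
#include <stdbool.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: turn on VT sequence processing for standard output.
 * Returns true on success, or when no action is needed. */
bool enable_vt_output(void)
{
#ifdef _WIN32
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;

    /* GetConsoleMode fails when output is not a console (e.g. redirected). */
    if (h == INVALID_HANDLE_VALUE || !GetConsoleMode(h, &mode))
        return false;

    /* Keep the existing flags and add the two needed for VT output. */
    mode |= ENABLE_PROCESSED_OUTPUT | ENABLE_VIRTUAL_TERMINAL_PROCESSING;
    return SetConsoleMode(h, mode) != 0;
#else
    return true; /* POSIX terminals interpret escape sequences themselves */
#endif
}
```

Calling it once at startup, before the first CHROUT, should be enough; if it returns false the program could fall back to plain putchar output.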
The opposite conversion from Unicode to PETSCII should be performed on input in CHRIN.
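A minimal sketch of that reverse lookup, assuming a linear search over the code point table is fast enough for interactive input. unicode_to_petscii is a hypothetical name, and only the 0x60 to 0x6F slice of the C64 array from earlier in the thread is reproduced here to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* Excerpt of the C64 code point array: PETSCII 0x60..0x6F only. */
#define EXCERPT_BASE 0x60
static const uint32_t kExcerpt[16] = {
    0x1FB79, 0x2660, 0x1FB72, 0x1FB78, 0x1FB77, 0x1FB76, 0x1FB7A, 0x1FB71,
    0x1FB74, 0x256E, 0x2570, 0x256F, 0x1FB7C, 0x2572, 0x2571, 0x1FB7D
};

/* Hypothetical reverse mapping: return the PETSCII code for a Unicode
 * code point, or -1 if the code point is not in the table. */
int unicode_to_petscii(uint32_t cp)
{
    if (cp == 0xFFFD)
        return -1; /* the replacement character marks unmapped slots */

    for (size_t i = 0; i < sizeof kExcerpt / sizeof kExcerpt[0]; i++) {
        if (kExcerpt[i] == cp)
            return EXCERPT_BASE + (int)i; /* first match wins */
    }
    return -1;
}
```

With the full 256-entry table, several code points appear twice (0x60 to 0x7F repeats at 0xC0 to 0xDF), so searching from index 0 returns the lower PETSCII code. Whether that is the right choice for CHRIN is a design decision.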
Hi,
Is it possible to run the classic
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
program? I'm probably splitting hairs with this one, since I assume it's not really straightforward to emulate the graphical PETSCII characters across all the platforms. I was more curious than anything else.