mist64 / cbmbasic

cbmbasic, a portable version of Commodore's version of Microsoft BASIC 6502 as found on the Commodore 64

Is It Possible to Implement 10 PRINT #7

Open heliomass opened 4 years ago

heliomass commented 4 years ago

Hi,

Is it possible to run the classic 10 PRINT CHR$(205.5+RND(1)); : GOTO 10 program?

I'm probably splitting hairs with this one, since I assume it's not really straightforward to emulate the graphical PETSCII characters across all the platforms. I was more curious than anything else:

LIST

10 FOR I=0 TO 1000:PRINTSTR$(I)+" "+CHR$(I)+"  ";:NEXT
READY.
RUN
 0    1    2    3    4    5    6    7    8   9     10    11
                                                               12
                                                                     13
   14    15    16    17    18    19    20    21    22    23    24    25    26    27 8    29    30    31    32     33 !   34 "   35 #   36 $   37 %   38 &   39 '   40 (   41 )   42 *   43 +   44 ,   45 -   46 .   47 /   48 0   49 1   50 2   51 3   52 4   53 5   54 6   55 7   56 8   57 9   58 :   59 ;   60 <   61 =   62 >   63 ?   64 @   65 A   66 B   67 C   68 D   69 E   70 F   71 G   72 H   73 I   74 J   75 K   76 L   77 M   78 N   79 O   80 P   81 Q   82 R   83 S   84 T   85 U   86 V   87 W   88 X   89 Y   90 Z   91 [   92 \   93 ]   94 ^   95 _   96 `   97 a   98 b   99 c   100 d   101 e   102 f   103 g
104 h   105 i   106 j   107 k   108 l   109 m   110 n   111 o   112 p   113 q   114 r   115 s   116 t   117 u   118 v   119 w   120 x   121 y   122 z   123 {   124 |   125 }   126 ~   127    128    129    130    131    132    133    134    135
 136    137    138    139    140    141    142    143    144    145    146    147    148    149    150    151    152    153    154    155    156    157    158    159    160    161    162    163    164    165    166    167    168    169    170    171    172    173    174    175    176    177    178    179    180    181    182    183    184    185    186    187    188    189    190    191    192    193    194    195    196    197    198    199    200    201    202    203    204    205    206    207    208    209    210    211    212    213    214    215    216    217    218    219    220    221    222    223    224    225    226    227    228    229    230    231    232    233    234    235    236    237    238    239    240    241    242    243    244    245    246    247    248    249    250    251    252    253    254    255
?ILLEGAL QUANTITY  ERROR IN 10
PhMajerus commented 2 years ago

Unicode recently added a block containing characters for legacy computing to support representing most old 8-bit character sets in Unicode (https://en.wikipedia.org/wiki/Symbols_for_Legacy_Computing). This means a function like CHR$ could simply be a lookup table that returns the corresponding Unicode character that matches the specified PETSCII value. It wouldn't support all control characters (they are all mapped to U+FFFD below), but all graphical characters would work fine if a font that contains those characters is used.

Here is the code points mapping array for the Commodore 64: [0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0x85,0xE,0xFFFD,0xFFFD,0x84,0xFFFD,0xFFFD,0x8,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0x1B,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27,0x28,0x29,0x2A,0x2B,0x2C,0x2D,0x2E,0x2F,0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x3A,0x3B,0x3C,0x3D,0x3E,0x3F,0x40,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x4A,0x4B,0x4C,0x4D,0x4E,0x4F,0x50,0x51,0x52,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5A,0x5B,0xA3,0x5D,0x2191,0x2190,0x1FB79,0x2660,0x1FB72,0x1FB78,0x1FB77,0x1FB76,0x1FB7A,0x1FB71,0x1FB74,0x256E,0x2570,0x256F,0x1FB7C,0x2572,0x2571,0x1FB7D,0x1FB7E,0x25CF,0x1FB7B,0x2665,0x1FB70,0x256D,0x2573,0x25CB,0x2663,0x1FB75,0x2666,0x253C,0x1FB8C,0x2502,0x3C0,0x25E5,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xA,0xF,0xFFFD,0xFFFD,0x8D,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xFFFD,0xA0,0x258C,0x2584,0x2594,0x2581,0x258F,0x2592,0x2595,0x1FB8F,0x25E4,0x1FB87,0x251C,0x2597,0x2514,0x2510,0x2582,0x250C,0x2534,0x252C,0x2524,0x258E,0x258D,0x1FB88,0x1FB82,0x1FB83,0x2583,0x1FB7F,0x2596,0x259D,0x2518,0x2598,0x259A,0x1FB79,0x2660,0x1FB72,0x1FB78,0x1FB77,0x1FB76,0x1FB7A,0x1FB71,0x1FB74,0x256E,0x2570,0x256F,0x1FB7C,0x2572,0x2571,0x1FB7D,0x1FB7E,0x25CF,0x1FB7B,0x2665,0x1FB70,0x256D,0x2573,0x25CB,0x2663,0x1FB75,0x2666,0x253C,0x1FB8C,0x2502,0x3C0,0x25E5,0xA0,0x258C,0x2584,0x2594,0x2581,0x258F,0x2592,0x2595,0x1FB8F,0x25E4,0x1FB87,0x251C,0x2597,0x2514,0x2510,0x2582,0x250C,0x2534,0x252C,0x2524,0x258E,0x258D,0x1FB88,0x1FB82,0x1FB83,0x2583,0x1FB7F,0x2596,0x259D,0x2518,0x2598,0x3C0]

CHR$ just needs to pick the value at the requested index from the array and return it as a string according to the encoding of strings in your environment. Beware that many of these code points are outside the Unicode Basic Multilingual Plane, so they require a surrogate pair in UTF-16, or a proper multibyte encoding of up to 4 bytes in UTF-8.
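As a rough illustration of that encoding step, here is a minimal sketch in C. The function names `utf8_encode` and `utf16_encode` are hypothetical and not part of cbmbasic; they only show how one code point from the array could be turned into UTF-8 bytes or UTF-16 code units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes up to 4 bytes plus a terminating NUL into out and returns the
 * number of bytes written (0 for surrogates or out-of-range values). */
size_t utf8_encode(uint32_t cp, char out[5])
{
    assert(out != NULL);
    unsigned char *p = (unsigned char *)out;
    size_t n;

    p[0] = 0;
    if (cp < 0x80) {
        p[0] = (unsigned char)cp;
        n = 1;
    } else if (cp < 0x800) {
        p[0] = (unsigned char)(0xC0 | (cp >> 6));
        p[1] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid code points to encode */
        p[0] = (unsigned char)(0xE0 | (cp >> 12));
        p[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        p[2] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFF) {
        p[0] = (unsigned char)(0xF0 | (cp >> 18));
        p[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        p[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        p[3] = (unsigned char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        return 0;
    }
    p[n] = 0;
    return n;
}

/* Hypothetical helper: encode one code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2, 0 if invalid). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

For example, 0x1FB79 (the first graphical entry at index 96) takes 4 bytes in UTF-8 and two code units in UTF-16, while 0x2571 (index 110) fits in 3 bytes and a single code unit.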

Here's what they look like with the unscii font (http://viznut.fi/unscii/): image

Other mapping arrays could be used to support different character sets, for example a CHR$MODE global variable or function could be used to select between PET, C64, TRS-80, Apple2, ...

PhMajerus commented 2 years ago

As a follow-up, here is a quick and dirty change to the CHROUT function, so that the mapping is performed on output rather than in the internal string representation.

image

image

These two screenshots are cbmbasic running in Windows Terminal with the unscii 8 font.

And running in gnome-terminal: image

Note that this one is still the Windows x64 binary, just running through WSL in a Linux gnome-terminal, to show that the PETSCII-to-Unicode mapping works.

PhMajerus commented 2 years ago

Here's a more complete fix with an array of strings that contains the Unicode characters as well as most control codes as VT escape sequences. This array uses UTF-16 to take advantage of the Win32 WriteConsoleW function, which makes it independent of the 8-bit code page in use. A Linux build should probably use the equivalent UTF-8 encoding instead.

The array that maps PETSCII to Unicode and VT sequences:

static LPCWSTR apszCharsMap[256] = {
    L"",L"",L"",L"\uFFFD",L"",L"\x1B[38;5;231m",L"",L"",L"\uFFFD",L"\uFFFD",L"",L"",L"",L"\r\n",L"\x0E",L"",
    L"",L"\x1B[B",L"\x1B[7m",L"\x1B[H",L"\b",L"",L"",L"",L"",L"",L"",L"\x1B",L"\x1B[38;5;88m",L"\x1B[C",L"\x1B[38;5;34m",L"\x1B[38;5;20m",
    L" ",L"!",L"\"",L"#",L"$",L"%",L"&",L"'",L"(",L")",L"*",L"+",L",",L"-",L".",L"/",
    L"0",L"1",L"2",L"3",L"4",L"5",L"6",L"7",L"8",L"9",L":",L";",L"<",L"=",L">",L"?",
    L"@",L"A",L"B",L"C",L"D",L"E",L"F",L"G",L"H",L"I",L"J",L"K",L"L",L"M",L"N",L"O",
    L"P",L"Q",L"R",L"S",L"T",L"U",L"V",L"W",L"X",L"Y",L"Z",L"[",L"\xA3",L"]",L"\u2191",L"\u2190",
    L"\U0001FB79",L"\u2660",L"\U0001FB72",L"\U0001FB78",L"\U0001FB77",L"\U0001FB76",L"\U0001FB7A",L"\U0001FB71",L"\U0001FB74",L"\u256E",L"\u2570",L"\u256F",L"\U0001FB7C",L"\u2572",L"\u2571",L"\U0001FB7D",
    L"\U0001FB7E",L"\u25CF",L"\U0001FB7B",L"\u2665",L"\U0001FB70",L"\u256D",L"\u2573",L"\u25CB",L"\u2663",L"\U0001FB75",L"\u2666",L"\u253C",L"\U0001FB8C",L"\u2502",L"\u03C0",L"\u25E5",
    L"",L"\x1B[38;5;173m",L"",L"\uFFFD",L"",L"\uFFFD",L"\uFFFD",L"\uFFFD",L"\uFFFD",L"\uFFFD",L"\uFFFD",L"\uFFFD",L"\uFFFD",L"\x85",L"\x0F",L"",
    L"\x1B[38;5;16m",L"\x1B[A",L"\x1B[27m",L"\x1B[2J\x1B[H",L"\uFFFD",L"\x1B[38;5;94m",L"\x1B[38;5;204m",L"\x1B[38;5;59m",L"\x1B[38;5;102m",L"\x1B[38;5;155m",L"\x1B[38;5;33m",L"\x1B[38;5;145m",L"\x1B[38;5;127m",L"\x1B[D",L"\x1B[38;5;227m",L"\x1B[38;5;123m",
    L"\xA0",L"\u258C",L"\u2584",L"\u2594",L"\u2581",L"\u258F",L"\u2592",L"\u2595",L"\U0001FB8F",L"\u25E4",L"\U0001FB87",L"\u251C",L"\u2597",L"\u2514",L"\u2510",L"\u2582",
    L"\u250C",L"\u2534",L"\u252C",L"\u2524",L"\u258E",L"\u258D",L"\U0001FB88",L"\U0001FB82",L"\U0001FB83",L"\u2583",L"\U0001FB7F",L"\u2596",L"\u259D",L"\u2518",L"\u2598",L"\u259A",
    L"\U0001FB79",L"\u2660",L"\U0001FB72",L"\U0001FB78",L"\U0001FB77",L"\U0001FB76",L"\U0001FB7A",L"\U0001FB71",L"\U0001FB74",L"\u256E",L"\u2570",L"\u256F",L"\U0001FB7C",L"\u2572",L"\u2571",L"\U0001FB7D",
    L"\U0001FB7E",L"\u25CF",L"\U0001FB7B",L"\u2665",L"\U0001FB70",L"\u256D",L"\u2573",L"\u25CB",L"\u2663",L"\U0001FB75",L"\u2666",L"\u253C",L"\U0001FB8C",L"\u2502",L"\u03C0",L"\u25E5",
    L"\xA0",L"\u258C",L"\u2584",L"\u2594",L"\u2581",L"\u258F",L"\u2592",L"\u2595",L"\U0001FB8F",L"\u25E4",L"\U0001FB87",L"\u251C",L"\u2597",L"\u2514",L"\u2510",L"\u2582",
    L"\u250C",L"\u2534",L"\u252C",L"\u2524",L"\u258E",L"\u258D",L"\U0001FB88",L"\U0001FB82",L"\U0001FB83",L"\u2583",L"\U0001FB7F",L"\u2596",L"\u259D",L"\u2518",L"\u2598",L"\u03C0"
};

(You can check out https://github.com/PhMajerus/AXSH.Library/blob/master/Functions/CBMChr.vbs for an easier-to-read version of this mapping.)

In CHROUT, both putchar(A); calls can be replaced with:

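{
    /* Look up the UTF-16 string (character or VT sequence) for PETSCII code A. */
    LPCWSTR pszChar = apszCharsMap[A];
    /* WriteConsoleW takes the length as a DWORD, so cast the size_t from wcslen. */
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), pszChar, (DWORD)wcslen(pszChar), NULL, NULL);
}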

All cases handling special characters, except '"', can then be removed, as this array contains the equivalent VT sequences.
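For a non-Windows build, the same idea could be sketched with UTF-8 strings and standard C output. This is only an illustration under assumptions: the names `petscii_utf8`, `petscii_lookup` and `chrout_utf8` are hypothetical, and the table is a small excerpt (a real build would carry all 256 entries from the array above, re-encoded as UTF-8). The mapped values themselves are taken from that array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical excerpt of a PETSCII -> UTF-8 table (C99 designated
 * initializers). Entries not listed here are NULL. */
static const char *const petscii_utf8[256] = {
    [13]  = "\n",               /* RETURN */
    [18]  = "\x1B[7m",          /* reverse on */
    [92]  = "\xC2\xA3",         /* U+00A3 pound sign */
    [146] = "\x1B[27m",         /* reverse off */
    [147] = "\x1B[2J\x1B[H",    /* clear screen and home */
    [205] = "\xE2\x95\xB2",     /* U+2572, used by 10 PRINT */
    [206] = "\xE2\x95\xB1",     /* U+2571, used by 10 PRINT */
};

/* Return the UTF-8 string for PETSCII code a. Codes 0x20..0x5D that are
 * missing from the excerpt pass through as ASCII via the caller-provided
 * scratch buffer; anything else unmapped yields an empty string. */
const char *petscii_lookup(unsigned char a, char scratch[2])
{
    assert(scratch != NULL);
    const char *s = petscii_utf8[a];
    if (s != NULL)
        return s;
    if (a >= 0x20 && a <= 0x5D) {
        scratch[0] = (char)a;
        scratch[1] = '\0';
        return scratch;
    }
    return "";
}

/* Possible replacement for putchar(A) in CHROUT on a UTF-8 terminal. */
void chrout_utf8(unsigned char a, FILE *out)
{
    char scratch[2];
    assert(out != NULL);
    fputs(petscii_lookup(a, scratch), out);
}
```

With this in place, `chrout_utf8(205, stdout)` and `chrout_utf8(206, stdout)` would emit the two diagonal line characters that the 10 PRINT program alternates between, assuming the terminal uses UTF-8.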

The following demo is running in gnome-terminal: image

And in Windows Terminal: image

Note some characters are printed with an extra space following them. This seems to be a problem with terminals currently assuming the new Unicode Symbols for Legacy Computing to be double-width, not a bug in the output of cbmbasic itself.

BASIC code for the demo:

10 REM SHOW COLORS
20 FOR I = 0 TO 15
30 READ V
40 PRINT CHR$(V);CHR$(113);STR$(V)
50 NEXT
60 DATA 144,5,28,159,156,30,31,158,129,149,150,151,152,153,154,155
70 REM SHOW CHARACTERS
80 FOR I = 1 TO 8
90 READ X
100 PRINT X,
110 FOR Y = 0 TO 15
120 PRINT " ";CHR$(X+Y);
130 NEXT
140 PRINT
150 NEXT
160 DATA 32,48,64,80,96,112,160,176

Extra details: To work correctly in conhost as well, cbmbasic should call SetConsoleMode(h, ENABLE_PROCESSED_OUTPUT | ENABLE_VIRTUAL_TERMINAL_PROCESSING); to enable VT processing, which is off by default. (Windows Terminal happens to process VT sequences regardless of this flag.) The opposite conversion, from Unicode to PETSCII, should be performed on input in CHRIN.
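A minimal sketch of that console setup might look like the following. The function name `enable_vt_output` is hypothetical. Unlike the bare call above, this variant reads the current mode first and only adds the two flags, on the assumption that any other output flags already set should be kept. On non-Windows platforms it does nothing and assumes the terminal already interprets VT sequences.

```c
#ifdef _WIN32
#include <windows.h>
/* Older SDK headers may not define this flag; 0x0004 is its documented value. */
#ifndef ENABLE_VIRTUAL_TERMINAL_PROCESSING
#define ENABLE_VIRTUAL_TERMINAL_PROCESSING 0x0004
#endif
#endif

/* Hypothetical helper: returns 1 if VT sequences can be expected to work
 * on standard output, 0 otherwise. */
int enable_vt_output(void)
{
#ifdef _WIN32
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;

    if (h == INVALID_HANDLE_VALUE || h == NULL)
        return 0;
    if (!GetConsoleMode(h, &mode))
        return 0; /* not a console, e.g. output redirected to a file */

    /* Keep the existing flags and add the two needed for VT output. */
    mode |= ENABLE_PROCESSED_OUTPUT | ENABLE_VIRTUAL_TERMINAL_PROCESSING;
    return SetConsoleMode(h, mode) ? 1 : 0;
#else
    /* Assumption: POSIX terminals interpret VT sequences natively. */
    return 1;
#endif
}
```

It would be called once at startup, before the first call to CHROUT.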