Easier way to bind host controller to any MSX key

See for instance this: https://www.msx.org/forum/msx-talk/openmsx/need-a-little-help-binding-joystick-buttons

It's quite a hassle to map host PC input (e.g. from a game controller) to an MSX keyboard key. This is very useful for games like Metal Gear, where you need keyboard keys to get into menus (like F1, F2), or when you want to press STOP or another key to pause. It currently requires a lot of complex bindings and knowledge of the keyboard matrix of the emulated machine.

Ideally, as a user, you would want to say something like: "map joy0 button2 to MSX F2 key". I would even like to make an OSD menu entry to configure this stuff. But for that, we need some ingredients:

- For each keyboard matrix entry, a name for that key.
  - This depends on the keyboard model, of course, which is defined in the keyboard_type tag in the machine XML config file. This tag is currently used to find the proper unicodemap file, which maps unicode characters to key presses, but that file contains no information about the keys themselves.
  - Note that a key and the character it produces are not at all the same thing. Multiple characters can be produced with a single key (depending on CAPS LOCK, modifiers, or IME). Still, the character a key produces without any modifiers is generally a good first indication of its name. Some keys do not produce any character at all (e.g. STOP, SELECT) or produce something that depends on the context (F1, F2, ...).
- It must be possible to query a list of the MSX keys (of the current model) to build a selection list for the user.
- There must be a way to query the current mappings in order to edit them. If we used the bind command to realize the mapping (as is currently done, with a pair of binds for press and release), just listing the binds wouldn't be helpful. A workaround (thanks Wouter) would be to put the binds in a named layer and query the binds of that layer.
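The last bullet can also be approached from the other side: keep the mappings in a dictionary of our own and only generate the binds from it, so that listing a mapping never requires parsing the output of `bind`. A minimal plain-Tcl sketch, assuming a hypothetical key table, hypothetical proc names and a layer called `msx_key_mapping`; it only builds the commands, so it also runs outside openMSX:

```tcl
# Hypothetical key-name table: MSX key name -> {row bit}
# (only a few entries of the international matrix).
set msx_keys {F1 {6 5} F2 {6 6} F3 {6 7} STOP {7 4} SPACE {8 0}}

# Current mappings: host event -> MSX key name. Because this dict is the
# single source of truth, querying or editing a mapping is a dict operation
# instead of parsing the output of "bind".
set key_mappings [dict create]

proc map_host_event {event msx_key} {
    global key_mappings msx_keys
    if {![dict exists $msx_keys $msx_key]} {error "unknown MSX key: $msx_key"}
    dict set key_mappings $event $msx_key
}

proc unmap_host_event {event} {
    global key_mappings
    dict unset key_mappings $event
}

# Build (but do not run) the bind commands for all current mappings.
# Assumptions: the layer name "msx_key_mapping" and the "<event> down" /
# "<event> up" suffixes used for host joystick buttons.
proc mapping_bind_commands {} {
    global key_mappings msx_keys
    set cmds {}
    dict for {event msx_key} $key_mappings {
        lassign [dict get $msx_keys $msx_key] row bit
        set mask [expr {1 << $bit}]
        lappend cmds [list bind -layer msx_key_mapping "$event down" "keymatrixdown $row $mask"]
        lappend cmds [list bind -layer msx_key_mapping "$event up" "keymatrixup $row $mask"]
    }
    return $cmds
}
```

Inside openMSX one would then `eval` each returned command and activate the layer with `activate_input_layer msx_key_mapping`; the exact `bind -layer` syntax and the joystick event names are assumptions to double-check with `help bind` in the console.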

Note that I'm only putting host controller events in scope as input here. The scope could be widened by also allowing host keyboard events to be mapped easily to MSX keyboard keys. In that case, we would also need to be able to query all host keyboard keys to generate a selection menu for them.

How can we get further with this, folks? Please help.

This is just not going to help enough:

proc bind_msx_key {key matrix_row matrix_bit} {
    # Bit mask that selects the key within its keyboard-matrix row
    set mask [expr {1 << $matrix_bit}]
    bind $key         "keymatrixdown $matrix_row $mask"
    bind $key,release "keymatrixup $matrix_row $mask"
}

;-)

For the first two bullets above, we need to define which keys are available on a certain MSX machine and to which keyboard matrix position they correspond. They must be easy to recognize, and it must be possible to refer to them from a (new kind of) bind command, so that host events can be assigned to presses of such keys.

Note that the same information could also be used for other things, like:

- the use case of the OSD keyboard. That script hardcodes the keyboard matrix regardless of the emulated machine and thus doesn't work (properly) for machines without the international key matrix layout (try for instance a Japanese machine, the Philips VG 8010, or a German MSX like the Sony HB-F700D). Note that the proposed information is still not enough to make the OSD keyboard perfect, as it doesn't cover the physical positions of the keys on the keyboard.
- replacing the keyboard matrix entries in the unicodemap files with actual key names, which would make them much easier to understand (the matrix positions would then have to be looked up in this new file). It would also allow us to remove the DEADKEY stuff from the unicodemap files, as it would be redundant. And we could remove the hardcoded positions of modifier keys from the openMSX C++ code.

To make this information available, we could for instance let each machine configuration file refer to a file that contains this information.

For instance, for a Philips NMS 8250 it could look as follows, where the first column contains the keyboard matrix position (in the same notation as in the unicodemap files), the 2nd column contains the main symbol of the key (i.e. without any modifiers; to easily recognize it and to refer to it in commands) and the last column is an optional textual name of the key (perhaps useful, not sure yet):

00, 0, ZERO
01, 1, ONE
02, 2, TWO
03, 3, THREE
04, 4, FOUR
05, 5, FIVE
06, 6, SIX
07, 7, SEVEN
10, 8, EIGHT
11, 9, NINE
12, -, MINUS
13, =, EQUALS
14, \, BACKSLASH
15, [, LEFT SQUARE BRACKET
16, ], RIGHT SQUARE BRACKET
17, ;, SEMICOLON
20, ', APOSTROPHE
21, `, GRAVE ACCENT
22, ,, COMMA
23, ., FULL STOP
24, /, SLASH
25, DEAD, `'^" ACCENTS
26, A
27, B
30, C
31, D
32, E
33, F
34, G
35, H
36, I
37, J
40, K
41, L
42, M
43, N
44, O
45, P
46, Q
47, R
50, S
51, T
52, U
53, V
54, W
55, X
56, Y
57, Z
60, SHIFT
61, CTRL
62, GRAPH
63, CAPS
64, CODE
65, F1
66, F2
67, F3
70, F4
71, F5
72, ESC
73, TAB
74, STOP
75, BS
76, SELECT
77, RETURN
80, SPACE
81, HOME
82, INS
83, DEL
84, ←, LEFT
85, ↑, UP
86, ↓, DOWN
87, →, RIGHT
90, NUM*, NUMPAD ASTERISK
91, NUM+, NUMPAD PLUS
92, NUM/, NUMPAD SLASH
93, NUM0, NUMPAD ZERO
94, NUM1, NUMPAD ONE
95, NUM2, NUMPAD TWO
96, NUM3, NUMPAD THREE
97, NUM4, NUMPAD FOUR
a0, NUM5, NUMPAD FIVE
a1, NUM6, NUMPAD SIX
a2, NUM7, NUMPAD SEVEN
a3, NUM8, NUMPAD EIGHT
a4, NUM9, NUMPAD NINE
a5, NUM-, NUMPAD MINUS
a6, NUM,, NUMPAD COMMA
a7, NUM., NUMPAD FULL STOP
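To pin down the column layout, a small plain-Tcl parser for the list above could look as follows. It is a sketch under two assumptions: columns are separated by exactly ", " (comma plus space), and the position uses the two-hex-digit row/bit notation of the unicodemap files. The proc names are hypothetical.

```tcl
# Parse one line of the proposed "position, symbol[, description]" list.
# The position uses the unicodemap notation: two hex digits, row then bit.
# Columns are separated by ", " (comma + space), so that a symbol which is
# itself a comma ("22, ,, COMMA" or "a6, NUM,, NUMPAD COMMA") still parses.
proc parse_key_line {line} {
    if {[scan [string range $line 0 1] %1x%1x row bit] != 2
            || [string range $line 2 3] ne ", "} {
        error "malformed line: $line"
    }
    set rest [string range $line 4 end]
    # Start searching at index 1: the symbol is at least one character.
    set sep [string first ", " $rest 1]
    if {$sep < 0} {
        return [list $row $bit $rest ""]
    }
    return [list $row $bit [string range $rest 0 [expr {$sep - 1}]] \
                [string range $rest [expr {$sep + 2}] end]]
}

# Build a dict: symbol -> {row bit}, skipping empty lines.
proc parse_key_list {text} {
    set keys [dict create]
    foreach line [split $text \n] {
        if {[string trim $line] eq ""} continue
        lassign [parse_key_line $line] row bit symbol
        dict set keys $symbol [list $row $bit]
    }
    return $keys
}
```

Because a symbol is always at least one character long, the ambiguous-looking `22, ,, COMMA` line still resolves to the symbol "," at row 2, bit 2.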

Certain keys are very important but do not have the same name on all MSX machines. E.g. the CODE key can also be named РУС on Russian MSX machines, or かな (KANA) on Japanese MSX machines. So perhaps we need an extra column for this. What do you think? (The openMSX C++ code should have a way to know which key is the CODE/KANA key for the current machine.)

I also realize that these arrows are nice to show in a menu, but they are hard to type on a keyboard in a command... So perhaps the 2nd column should be a fixed name that is easy to type, the 3rd column the main symbol on the machine's keyboard and the 4th an optional short textual description (e.g. to be used as a tooltip in a GUI)? Please share your thoughts, especially @m9710797 and @mthuurne, with whom I have discussed this before outside GitHub :)

On a side note, regarding my second point above (making the unicodemap files more readable), here's an illustration of what that could look like (just a sketch: it probably needs some escaping or a different separator, also for the + sign, and I didn't include most control codes):

#region: International
#format: <MSXCHARCODE>, KEY[[+KEY]+...]
# <MSXCHARCODE>: MSX character code produced when typing the key combination
# specified in the following columns
# KEY[[+KEY]+...]:
# The 2nd column is a list of keys that need to be pressed and held to produce
# the character code of the MSXCHARCODE column. A + between the entries means
# they need to be held (or be active, in case of locking modifiers) together.
# The first item must be held first, then the next added, etc.
00, CTRL+SHIFT+2  # ^@
20, SPACE         # Space
21, SHIFT+1       # ! (EXCLAMATION MARK)
22, SHIFT+'       # " (QUOTATION MARK)
23, SHIFT+3       # # (NUMBER SIGN)
24, SHIFT+4       # $ (DOLLAR SIGN)
25, SHIFT+5       # % (PERCENT SIGN)
26, SHIFT+7       # & (AMPERSAND)
27, '             # ' (APOSTROPHE)
28, SHIFT+9       # ( (LEFT PARENTHESIS)
29, SHIFT+0       # ) (RIGHT PARENTHESIS)
2a, SHIFT+8       # * (ASTERISK)
2b, SHIFT+=       # + (PLUS SIGN)
2c, ,             # , (COMMA)
2d, -             # - (HYPHEN-MINUS)
2e, .             # . (FULL STOP)
2f, /             # / (SOLIDUS)
30, 0             # 0 (DIGIT ZERO)
31, 1             # 1 (DIGIT ONE)
32, 2             # 2 (DIGIT TWO)
33, 3             # 3 (DIGIT THREE)
34, 4             # 4 (DIGIT FOUR)
35, 5             # 5 (DIGIT FIVE)
36, 6             # 6 (DIGIT SIX)
37, 7             # 7 (DIGIT SEVEN)
38, 8             # 8 (DIGIT EIGHT)
39, 9             # 9 (DIGIT NINE)
3a, SHIFT+;       # : (COLON)
3b, ;             # ; (SEMICOLON)
3c, SHIFT++       # < (LESS-THAN SIGN)
3d, =             # = (EQUALS SIGN)
3e, SHIFT+.       # > (GREATER-THAN SIGN)
3f, SHIFT+/       # ? (QUESTION MARK)
40, SHIFT+2       # @ (COMMERCIAL AT)
41, SHIFT+A       # A (LATIN CAPITAL LETTER A)
42, SHIFT+B       # B (LATIN CAPITAL LETTER B)
43, SHIFT+C       # C (LATIN CAPITAL LETTER C)
44, SHIFT+D       # D (LATIN CAPITAL LETTER D)
45, SHIFT+E       # E (LATIN CAPITAL LETTER E)
46, SHIFT+F       # F (LATIN CAPITAL LETTER F)
47, SHIFT+G       # G (LATIN CAPITAL LETTER G)
48, SHIFT+H       # H (LATIN CAPITAL LETTER H)
49, SHIFT+I       # I (LATIN CAPITAL LETTER I)
4a, SHIFT+J       # J (LATIN CAPITAL LETTER J)
4b, SHIFT+K       # K (LATIN CAPITAL LETTER K)
4c, SHIFT+L       # L (LATIN CAPITAL LETTER L)
4d, SHIFT+M       # M (LATIN CAPITAL LETTER M)
4e, SHIFT+N       # N (LATIN CAPITAL LETTER N)
4f, SHIFT+O       # O (LATIN CAPITAL LETTER O)
50, SHIFT+P       # P (LATIN CAPITAL LETTER P)
51, SHIFT+Q       # Q (LATIN CAPITAL LETTER Q)
52, SHIFT+R       # R (LATIN CAPITAL LETTER R)
53, SHIFT+S       # S (LATIN CAPITAL LETTER S)
54, SHIFT+T       # T (LATIN CAPITAL LETTER T)
55, SHIFT+U       # U (LATIN CAPITAL LETTER U)
56, SHIFT+V       # V (LATIN CAPITAL LETTER V)
57, SHIFT+W       # W (LATIN CAPITAL LETTER W)
58, SHIFT+X       # X (LATIN CAPITAL LETTER X)
59, SHIFT+Y       # Y (LATIN CAPITAL LETTER Y)
5a, SHIFT+Z       # Z (LATIN CAPITAL LETTER Z)
5b, [             # [ (LEFT SQUARE BRACKET)
5c, \             # \ (REVERSE SOLIDUS)
5d, ]             # ] (RIGHT SQUARE BRACKET)
5e, SHIFT+6       # ^ (CIRCUMFLEX ACCENT)
5f, SHIFT+-       # _ (LOW LINE)
60, `             # ` (GRAVE ACCENT)
61, A             # a (LATIN SMALL LETTER A)
62, B             # b (LATIN SMALL LETTER B)
63, C             # c (LATIN SMALL LETTER C)
64, D             # d (LATIN SMALL LETTER D)
65, E             # e (LATIN SMALL LETTER E)
66, F             # f (LATIN SMALL LETTER F)
67, G             # g (LATIN SMALL LETTER G)
68, H             # h (LATIN SMALL LETTER H)
69, I             # i (LATIN SMALL LETTER I)
6a, J             # j (LATIN SMALL LETTER J)
6b, K             # k (LATIN SMALL LETTER K)
6c, L             # l (LATIN SMALL LETTER L)
6d, M             # m (LATIN SMALL LETTER M)
6e, N             # n (LATIN SMALL LETTER N)
6f, O             # o (LATIN SMALL LETTER O)
70, P             # p (LATIN SMALL LETTER P)
71, Q             # q (LATIN SMALL LETTER Q)
72, R             # r (LATIN SMALL LETTER R)
73, S             # s (LATIN SMALL LETTER S)
74, T             # t (LATIN SMALL LETTER T)
75, U             # u (LATIN SMALL LETTER U)
76, V             # v (LATIN SMALL LETTER V)
77, W             # w (LATIN SMALL LETTER W)
78, X             # x (LATIN SMALL LETTER X)
79, Y             # y (LATIN SMALL LETTER Y)
7a, Z             # z (LATIN SMALL LETTER Z)
7b, SHIFT+[       # { (LEFT CURLY BRACKET)
7c, SHIFT+\       # | (VERTICAL LINE)
7d, SHIFT+]       # } (RIGHT CURLY BRACKET)
7e, SHIFT+`       # ~ (TILDE)
ad, SHIFT+CODE+1  # ¡ (INVERTED EXCLAMATION MARK)
9b, CODE+4        # ¢ (CENT SIGN)
9c, SHIFT+CODE+4  # £ (POUND SIGN)
9d, SHIFT+CODE+5  # ¥ (YEN SIGN)
bf, CODE+3        # § (SECTION SIGN)
a6, CODE+.        # ª (FEMININE ORDINAL INDICATOR)
ae, SHIFT+GRAPH+, # « (LEFT-POINTING DOUBLE ANGLE QUOTATION MARK)
aa, SHIFT+GRAPH+Y # ¬ (NOT SIGN)
f8, SHIFT+GRAPH+Z # ° (DEGREE SIGN)
f1, GRAPH+-       # ± (PLUS-MINUS SIGN)
fd, SHIFT+GRAPH+2 # ² (SUPERSCRIPT TWO)
e6, CODE+M        # µ (MICRO SIGN)
be, SHIFT+CODE+3  # ¶ (PILCROW SIGN)
fa, SHIFT+GRAPH+C # · (MIDDLE DOT)
a7, CODE+/        # º (MASCULINE ORDINAL INDICATOR)
af, SHIFT+GRAPH+. # » (RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK)
ac, GRAPH+1       # ¼ (VULGAR FRACTION ONE QUARTER)
ab, GRAPH+2       # ½ (VULGAR FRACTION ONE HALF)
ba, GRAPH+3       # ¾ (VULGAR FRACTION THREE QUARTERS)
a8, SHIFT+CODE+/  # ¿ (INVERTED QUESTION MARK)
b0, SHIFT+CODE+H  # Ã (LATIN CAPITAL LETTER A WITH TILDE)
8e, SHIFT+CODE+A  # Ä (LATIN CAPITAL LETTER A WITH DIAERESIS)
8f, SHIFT+CODE+,  # Å (LATIN CAPITAL LETTER A WITH RING ABOVE)
92, SHIFT+CODE+J  # Æ (LATIN CAPITAL LETTER AE)
80, SHIFT+CODE+9  # Ç (LATIN CAPITAL LETTER C WITH CEDILLA)
90, SHIFT+CODE+U  # É (LATIN CAPITAL LETTER E WITH ACUTE)
a5, SHIFT+CODE+N  # Ñ (LATIN CAPITAL LETTER N WITH TILDE)
b4, SHIFT+CODE+L  # Õ (LATIN CAPITAL LETTER O WITH TILDE)
99, SHIFT+CODE+F  # Ö (LATIN CAPITAL LETTER O WITH DIAERESIS)
9a, SHIFT+CODE+G  # Ü (LATIN CAPITAL LETTER U WITH DIAERESIS)
e1, CODE+7        # ß (LATIN SMALL LETTER SHARP S)
85, CODE+Z        # à (LATIN SMALL LETTER A WITH GRAVE)
a0, CODE+Y        # á (LATIN SMALL LETTER A WITH ACUTE)
83, CODE+Q        # â (LATIN SMALL LETTER A WITH CIRCUMFLEX)
b1, CODE+H        # ã (LATIN SMALL LETTER A WITH TILDE)
84, CODE+A        # ä (LATIN SMALL LETTER A WITH DIAERESIS)
86, CODE+,        # å (LATIN SMALL LETTER A WITH RING ABOVE)
91, CODE+J        # æ (LATIN SMALL LETTER AE)
87, CODE+9        # ç (LATIN SMALL LETTER C WITH CEDILLA)
8a, CODE+X        # è (LATIN SMALL LETTER E WITH GRAVE)
82, CODE+U        # é (LATIN SMALL LETTER E WITH ACUTE)
88, CODE+W        # ê (LATIN SMALL LETTER E WITH CIRCUMFLEX)
89, CODE+S        # ë (LATIN SMALL LETTER E WITH DIAERESIS)
8d, CODE+C        # ì (LATIN SMALL LETTER I WITH GRAVE)
a1, CODE+I        # í (LATIN SMALL LETTER I WITH ACUTE)
8c, CODE+E        # î (LATIN SMALL LETTER I WITH CIRCUMFLEX)
8b, CODE+D        # ï (LATIN SMALL LETTER I WITH DIAERESIS)
a4, CODE+N        # ñ (LATIN SMALL LETTER N WITH TILDE)
95, CODE+V        # ò (LATIN SMALL LETTER O WITH GRAVE)
a2, CODE+O        # ó (LATIN SMALL LETTER O WITH ACUTE)
93, CODE+R        # ô (LATIN SMALL LETTER O WITH CIRCUMFLEX)
b5, CODE+L        # õ (LATIN SMALL LETTER O WITH TILDE)
94, CODE+F        # ö (LATIN SMALL LETTER O WITH DIAERESIS)
f6, SHIFT+GRAPH+/ # ÷ (DIVISION SIGN)
97, CODE+B        # ù (LATIN SMALL LETTER U WITH GRAVE)
a3, CODE+P        # ú (LATIN SMALL LETTER U WITH ACUTE)
96, CODE+T        # û (LATIN SMALL LETTER U WITH CIRCUMFLEX)
81, CODE+G        # ü (LATIN SMALL LETTER U WITH DIAERESIS)
98, CODE+5        # ÿ (LATIN SMALL LETTER Y WITH DIAERESIS)
b2, SHIFT+CODE+K  # Ĩ (LATIN CAPITAL LETTER I WITH TILDE)
b3, CODE+K        # ĩ (LATIN SMALL LETTER I WITH TILDE)
b8, SHIFT+CODE+'  # Ĳ (LATIN CAPITAL LIGATURE IJ)
b9, CODE+'        # ĳ (LATIN SMALL LIGATURE IJ)
b6, SHIFT+CODE+;  # Ũ (LATIN CAPITAL LETTER U WITH TILDE)
b7, CODE+;        # ũ (LATIN SMALL LETTER U WITH TILDE)
9f, CODE+1        # ƒ (LATIN SMALL LETTER F WITH HOOK)
e2, SHIFT+CODE+8  # Γ (GREEK CAPITAL LETTER GAMMA)
d8, SHIFT+CODE+0  # Δ (GREEK CAPITAL LETTER DELTA)
e9, CODE+-        # Θ (GREEK CAPITAL LETTER THETA)
e4, SHIFT+CODE+`  # Σ (GREEK CAPITAL LETTER SIGMA)
e8, SHIFT+CODE+[  # Φ (GREEK CAPITAL LETTER PHI)
ea, SHIFT+CODE+]  # Ω (GREEK CAPITAL LETTER OMEGA)
e0, CODE+6        # α (GREEK SMALL LETTER ALPHA)
eb, CODE+0        # δ (GREEK SMALL LETTER DELTA)
e3, SHIFT+CODE+P  # π (GREEK SMALL LETTER PI)
e5, CODE+`        # σ (GREEK SMALL LETTER SIGMA)
e7, CODE+8        # τ (GREEK SMALL LETTER TAU)
da, CODE+]        # ω (GREEK SMALL LETTER OMEGA)
d9, CODE+2        # ‡ (DOUBLE DAGGER)
07, GRAPH+9       # • (BULLET)
bd, GRAPH+5       # ‰ (PER MILLE SIGN)
fc, SHIFT+GRAPH+3 # ⁿ (SUPERSCRIPT LATIN SMALL LETTER N)
9e, SHIFT+CODE+2  # ₧ (PESETA SIGN)
ed, CODE+[        # ∅ (EMPTY SET)
ee, CODE+-        # ∈ (ELEMENT OF)
f9, SHIFT+GRAPH+X # ∙ (BULLET OPERATOR)
fb, GRAPH+7       # √ (SQUARE ROOT)
ec, GRAPH+8       # ∞ (INFINITY)
ef, GRAPH+4       # ∩ (INTERSECTION)
bb, GRAPH+`       # ∽ (REVERSED TILDE)
f7, SHIFT+GRAPH+` # ≈ (ALMOST EQUAL TO)
f0, SHIFT+GRAPH+- # ≡ (IDENTICAL TO)
f3, GRAPH++       # ≤ (LESS-THAN OR EQUAL TO)
f2, GRAPH+.       # ≥ (GREATER-THAN OR EQUAL TO)
a9, SHIFT+GRAPH+R # ⌐ (REVERSED NOT SIGN)
f4, GRAPH+6       # ⌠ (TOP HALF INTEGRAL)
f5, SHIFT+GRAPH+6 # ⌡ (BOTTOM HALF INTEGRAL)
17, GRAPH+-       # ─ (BOX DRAWINGS LIGHT HORIZONTAL)
16, SHIFT+GRAPH+\ # │ (BOX DRAWINGS LIGHT VERTICAL)
18, GRAPH+R       # ┌ (BOX DRAWINGS LIGHT DOWN AND RIGHT)
19, GRAPH+Y       # ┐ (BOX DRAWINGS LIGHT DOWN AND LEFT)
1a, GRAPH+V       # └ (BOX DRAWINGS LIGHT UP AND RIGHT)
1b, GRAPH+N       # ┘ (BOX DRAWINGS LIGHT UP AND LEFT)
14, GRAPH+F       # ├ (BOX DRAWINGS LIGHT VERTICAL AND RIGHT)
13, GRAPH+H       # ┤ (BOX DRAWINGS LIGHT VERTICAL AND LEFT)
12, GRAPH+T       # ┬ (BOX DRAWINGS LIGHT DOWN AND HORIZONTAL)
11, GRAPH+B       # ┴ (BOX DRAWINGS LIGHT UP AND HORIZONTAL)
15, GRAPH+G       # ┼ (BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL)
1d, GRAPH+/       # ╱ (BOX DRAWINGS LIGHT DIAGONAL UPPER RIGHT TO LOWER LEFT)
1e, GRAPH+\       # ╲ (BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT)
1c, GRAPH+X       # ╳ (BOX DRAWINGS LIGHT DIAGONAL CROSS)
df, SHIFT+GRAPH+I # ▀ (UPPER HALF BLOCK)
c0, GRAPH+U       # ▂ (LOWER ONE QUARTER BLOCK)
dc, GRAPH+I       # ▄ (LOWER HALF BLOCK)
c2, GRAPH+O       # ▆ (LOWER THREE QUARTERS BLOCK)
db, GRAPH+P       # █ (FULL BLOCK)
c8, GRAPH+L       # ▊ (LEFT THREE QUARTERS BLOCK)
dd, GRAPH+K       # ▌ (LEFT HALF BLOCK)
c6, GRAPH+J       # ▎ (LEFT ONE QUARTER BLOCK)
de, SHIFT+GRAPH+K # ▐ (RIGHT HALF BLOCK)
d6, SHIFT+GRAPH+H # ▖ (QUADRANT LOWER LEFT)
d4, SHIFT+GRAPH+F # ▗ (QUADRANT LOWER RIGHT)
d3, SHIFT+GRAPH+N # ▘ (QUADRANT UPPER LEFT)
c1, SHIFT+GRAPH+D # ▚ (QUADRANT UPPER LEFT AND LOWER RIGHT)
d5, SHIFT+GRAPH+V # ▝ (QUADRANT UPPER RIGHT)
c7, GRAPH+D       # ▞ (QUADRANT UPPER RIGHT AND LOWER LEFT)
fe, SHIFT+GRAPH+A # ■ (BLACK SQUARE)
c4, GRAPH+A       # ▬ (BLACK RECTANGLE)
bc, GRAPH+C       # ◇ (WHITE DIAMOND)
09, GRAPH+0       # ○ (WHITE CIRCLE)
08, SHIFT+GRAPH+9 # ◘ (INVERSE BULLET)
0a, SHIFT+GRAPH+0 # ◙ (INVERSE WHITE CIRCLE)
01, GRAPH+[       # ☺ (WHITE SMILING FACE)
02, SHIFT+GRAPH+[ # ☻ (BLACK SMILING FACE)
0f, GRAPH+Z       # ☼ (WHITE SUN WITH RAYS)
0c, SHIFT+GRAPH+M # ♀ (FEMALE SIGN)
0b, GRAPH+M       # ♂ (MALE SIGN)
06, GRAPH+;       # ♠ (BLACK SPADE SUIT)
05, GRAPH+'       # ♣ (BLACK CLUB SUIT)
03, SHIFT+GRAPH+' # ♥ (BLACK HEART SUIT)
04, SHIFT+GRAPH+; # ♦ (BLACK DIAMOND SUIT)
0d, GRAPH+]       # ♪ (EIGHTH NOTE)
0e, SHIFT+GRAPH+] # ♫ (BEAMED EIGHTH NOTES)
10, SHIFT+GRAPH+G # ⟊ (VERTICAL BAR WITH HORIZONTAL STROKE)
cf, GRAPH+W       # 🭬 (LEFT TRIANGULAR ONE QUARTER BLOCK)
cd, GRAPH+E       # 🭭 (UPPER TRIANGULAR ONE QUARTER BLOCK)
d0, SHIFT+GRAPH+W # 🭮 (RIGHT TRIANGULAR ONE QUARTER BLOCK)
ce, SHIFT+GRAPH+E # 🭯 (LOWER TRIANGULAR ONE QUARTER BLOCK)
c3, SHIFT+GRAPH+O # 🮂 (UPPER ONE QUARTER BLOCK)
c5, SHIFT+GRAPH+U # 🮅 (UPPER THREE QUARTERS BLOCK)
c9, SHIFT+GRAPH+L # 🮇 (RIGHT ONE QUARTER BLOCK)
ca, SHIFT+GRAPH+J # 🮊 (RIGHT THREE QUARTERS BLOCK)
d7, SHIFT+GRAPH+P # 🮖 (INVERSE CHECKER BOARD FILL)
cc, GRAPH+Q       # 🮘 (UPPER LEFT TO LOWER RIGHT FILL)
cb, SHIFT+GRAPH+Q # 🮙 (UPPER RIGHT TO LOWER LEFT FILL)
d1, SHIFT+GRAPH+S # 🮚 (UPPER AND LOWER TRIANGULAR HALF BLOCK)
d2, GRAPH+S       # 🮛 (LEFT AND RIGHT TRIANGULAR HALF BLOCK)
1f, SHIFT+GRAPH+- # 🮯 (BOX DRAWINGS LIGHT HORIZONTAL WITH VERTICAL STROKE)
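For the + problem in the sketch above, one escaping-free rule would be: a key name is never empty, so a + standing where a key is expected is the + key itself. A hedged plain-Tcl sketch of that rule (hypothetical proc names; only the `<MSXCHARCODE>, KEY[+KEY...]  # comment` layout above is assumed):

```tcl
# Split a key combination such as "CTRL+SHIFT+2" into its keys.
# A "+" that appears where a key is expected is taken to be the "+" key
# itself, so "SHIFT++" yields {SHIFT +} without any escaping.
proc split_combo {combo} {
    set keys {}
    set i 0
    set n [string length $combo]
    while {$i < $n} {
        # Every key is at least one character long; the separator is the
        # first "+" after that character.
        set end [string first + $combo [expr {$i + 1}]]
        if {$end < 0} {set end $n}
        lappend keys [string range $combo $i [expr {$end - 1}]]
        set i [expr {$end + 1}]
    }
    return $keys
}

# Parse "<MSXCHARCODE>, KEY[+KEY...]  # comment" into {charcode keys}.
proc parse_combo_line {line} {
    if {![regexp {^([0-9a-fA-F]{2}),\s*(\S+)\s*(?:#.*)?$} $line -> code combo]} {
        error "malformed line: $line"
    }
    return [list [scan $code %x] [split_combo $combo]]
}
```

This rule has a limit: a multi-character name that itself ends in +, like NUM+ from the 8250 list, would still be ambiguous in a combination such as SHIFT+NUM+ (it splits into SHIFT and NUM).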

I'm going to try to simplify your proposal. Let's see how minimal we can get ...

- Do we really need both a short and a long name? Let's start with a single name per key.
- Do we, at this point, need to remove the hardcoded SHIFT, GRAPH, ... key-matrix positions? It's a good idea, but not really required for MSX. So I propose to postpone it. That way we don't (yet) have to care about the different names for the CODE/KANA/... key.
- I'm a bit concerned about the short key names in your proposal. E.g. "SHIFT++" looks confusing. Also, when you use names like [ ] ; " ... you'll quickly run into escaping problems in Tcl commands. Therefore I propose to only allow alphanumeric characters in names (and thus certainly no arrows).
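Checking such a naming rule is simple. The plain-Tcl sketch below also accepts the underscore, since otherwise names like NUMPAD_4 would not be possible; that extension and the proc names are assumptions:

```tcl
# Accept only names that are safe to use unquoted in Tcl commands:
# ASCII letters, digits and underscore.
proc valid_key_name {name} {
    return [regexp {^[A-Za-z0-9_]+$} $name]
}

# Check a whole list of names and return the offending ones.
proc invalid_key_names {names} {
    set bad {}
    foreach name $names {
        if {![valid_key_name $name]} {lappend bad $name}
    }
    return $bad
}
```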

The file format you propose works, but it's a bit verbose. Instead of explicitly enumerating each row/column, we can also do it implicitly via a table:

#      +- bit 7 -----+- bit 6 ----+- bit 5 ----+- bit 4 -+- bit 3 -+- bit 2 ----+- bit 1 ---+- bit 0 -------+
row  0 |7            |6           |5           |4        |3        |2           |1          |0              |
row  1 |SEMICOLON    |RIGHTBRACKET|LEFTBRACKET |BACKSLASH|EQUALS   |MINUS       |9          |8              |
row  2 |B            |A           |ACCENT      |SLASH    |PERIOD   |COMMA       |BACKQUOTE  |QUOTE          |
row  3 |J            |I           |H           |G        |F        |E           |D          |C              |
row  4 |R            |Q           |P           |O        |N        |M           |L          |K              |
row  5 |Z            |Y           |X           |W        |V        |U           |T          |S              |
row  6 |F3           |F2          |F1          |CODE     |CAPSLOCK |GRAPH       |CTRL       |SHIFT          |
row  7 |RETURN       |SELECT      |BACKSPACE   |STOP     |TAB      |ESCAPE      |F5         |F4             |
row  8 |RIGHT        |DOWN        |UP          |LEFT     |DELETE   |INSERT      |HOME       |SPACE          |
row  9 |NUMPAD_4     |NUMPAD_3    |NUMPAD_2    |NUMPAD_1 |NUMPAD_0 |NUMPAD_SLASH|NUMPAD_PLUS|NUMPAD_ASTERISK|
row 10 |NUMPAD_PERIOD|NUMPAD_COMMA|NUMPAD_MINUS|NUMPAD_9 |NUMPAD_8 |NUMPAD_7    |NUMPAD_6   |NUMPAD_5       |
row 11 |             |            |            |         |NO       |            |YES        |               |
#      +-------------+------------+------------+---------+---------+------------+-----------+---------------+

Such a table has the advantage that you can easily convert from, or check against, existing keyboard matrix documentation.
Note: this table is just an example. I based the names on SDL key names, but feel free to pick better names.
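Reading such a table back is also straightforward. A plain-Tcl sketch (hypothetical proc name) that turns the `row N |...|` lines into a dict from key name to `{row bit}`, ignoring the `#` border lines and empty cells, and rejecting duplicate names:

```tcl
# Parse rows of the form "row  N |bit7|bit6|...|bit0|" into a dict
# mapping key name -> {row bit}. Lines that do not start with "row"
# (e.g. the "#" border lines) are ignored, as are empty cells.
proc parse_key_table {text} {
    set keys [dict create]
    foreach line [split $text \n] {
        if {![regexp {^row\s+(\d+)\s*\|(.*)$} $line -> row cells]} continue
        set bit 7
        # The cells are listed from bit 7 down to bit 0.
        foreach cell [lrange [split $cells |] 0 7] {
            set name [string trim $cell]
            if {$name ne ""} {
                if {[dict exists $keys $name]} {
                    error "duplicate key name: $name"
                }
                dict set keys $name [list $row $bit]
            }
            incr bit -1
        }
    }
    return $keys
}
```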

So the only thing this minimal proposal does is:

- Give a (single, alphanumeric) name to each MSX key (so not to every matrix position: there can be empty positions in the table).

But I think this is already sufficient for the bind stuff in the title of this issue.
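With such a name table, the bind helper from the start of this issue can take a key name instead of a row and bit. A sketch in plain Tcl, assuming the table has already been parsed into a dict `name -> {row bit}` (the excerpt and the proc name are hypothetical); it returns the press/release commands rather than binding them directly:

```tcl
# Hypothetical excerpt of a parsed key-name table: name -> {row bit}.
set msx_key_table {F1 {6 5} F2 {6 6} STOP {7 4} SPACE {8 0}}

# Return the two openMSX commands (press and release) for one MSX key,
# looked up by name instead of by matrix row and bit.
proc msx_key_commands {table name} {
    if {![dict exists $table $name]} {
        error "unknown MSX key: $name (known: [join [dict keys $table] {, }])"
    }
    lassign [dict get $table $name] row bit
    set mask [expr {1 << $bit}]
    return [list "keymatrixdown $row $mask" "keymatrixup $row $mask"]
}
```

Inside openMSX, `lassign [msx_key_commands $msx_key_table F2] press release` followed by `bind "joy1 button2 down" $press` and `bind "joy1 button2 up" $release` would express "map a joystick button to F2"; the joystick event names are an assumption to check with `help bind`.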

This is of course possible, especially for a first version. Still, some remarks:

- this kind of table must be provided per MSX machine, as not all MSXes have the same keys and they are not always on the same positions in the matrix.
- this kind of table does not allow for easy expansion with other (meta) data, like location of keys (for OSD keyboard) or the sign/symbol on the key (for recognizability or also for an OSD keyboard). So, would it get us stuck in a corner?
- the (longer) names I chose were based on the unicode character description. But I don't really care as long as it's clear.
- removing the hardcoded positions of modifier keys is indeed mostly useful to make our C++ Keyboard code more generic between MSX, SVI and e.g. Sega, i.e. to remove a lot of fixed/hardcoded stuff there.
- on a side note, with this kind of file, we could also obsolete some tags, like <has_keypad>true</has_keypad>, <has_yesno_keys>true</has_yesno_keys>, as these are just very specific exceptions to the 'generic keyboard' that is now emulated.

> this kind of table must be provided per MSX machine, as not all MSXes have the same keys and they are not always on the same positions in the matrix.

Do you already know how this information varies across machines? Or what the relation is with the existing unicodemap and VID files? I mean, does the same unicodemap file always go together with the same key names? Maybe there is only a limited number of key-name configurations, and then machine configs can refer to such a name config? Or is it very specific to a machine, and would it make more sense to include (partly duplicate??) this in each machine config?
> this kind of table does not allow for easy expansion with other (meta) data, like location of keys (for OSD keyboard) or the sign/symbol on the key (for recognizability or also for an OSD keyboard). So, would it get us stuck in a corner?

For (optional) extra data we could do something like this: `key: CODE alt-name: KANA x-position:20 y-position:80 ...`. It depends on how dense this extra information is. If it's always present (e.g. all keys have an x,y position), then having the main data in a table is less useful. On the other hand, for sparse data (e.g. only a few keys have an alternate name), it does make sense.
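Such sparse `attribute: value` lines are also easy to read back. A plain-Tcl sketch, assuming attribute names are lowercase words with dashes and values contain no spaces (the proc name is hypothetical):

```tcl
# Parse a line such as "key: CODE alt-name: KANA x-position:20 y-position:80"
# into a dict {key CODE alt-name KANA x-position 20 y-position 80}.
# Every attribute except "key" is optional, which suits sparse extra data.
proc parse_key_attributes {line} {
    set attrs [dict create]
    foreach {-> name value} [regexp -all -inline {([a-z-]+):\s*(\S+)} $line] {
        dict set attrs $name $value
    }
    if {![dict exists $attrs key]} {error "missing key name: $line"}
    return $attrs
}
```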

Also maybe the main table is the same for many machines, but the extra information (like key positions) varies more between machines. In that case it makes sense to keep that extra information separate.

Personally I see the xy-position for keys as a low priority item. But it's good to keep in mind for possible future extensions (as you say, to not get stuck in a corner). Specifically for the xy-positions, I think it would be better to keep them separate from the key names; then we can always add them later.

> the (longer) names I chose were based on the unicode character description. But I don't really care as long as it's clear.

I also picked the names to not contain spaces. But if they are descriptions instead of alternate names, then this requirement disappears (but you shouldn't use such a description to identify the key, e.g. in a future 'bind' command).

> removing the hardcoded positions of modifier keys is indeed mostly useful to make our C++ Keyboard code more generic between MSX, SVI and e.g. Sega, i.e. to remove a lot of fixed/hardcoded stuff there.

Ok, then I propose to postpone this till later.

> on a side note, with this kind of file, we could also obsolete some tags, like `<has_keypad>true</has_keypad>`, `<has_yesno_keys>true</has_yesno_keys>`, as these are just very specific exceptions to the 'generic keyboard' that is now emulated.

Yes and no. You could derive "has_keypad" from the above information (e.g. from the table), but only if you either assume that the MSX numpad keys occupy rows 9 and 10, or that their names start with "NUMPAD". Similar for "has_yesno_keys". If eventually the goal is to get rid of this kind of assumption (like you hinted at for SHIFT, CODE), then this may not be the best approach.

Personally, I think we're making an MSX emulator, and then it's fine to make some MSX specific assumptions.

> > this kind of table must be provided per MSX machine, as not all MSXes have the same keys and they are not always on the same positions in the matrix.

> Do you already know how this information varies across machines? Or what the relation is with the existing unicodemap and VID files? I mean, does the same unicodemap file always go together with the same key names? Maybe there is only a limited number of key-name configurations, and then machine configs can refer to such a name config? Or is it very specific to a machine, and would it make more sense to include (partly duplicate??) this in each machine config?

Good questions and points. There is of course a strong relation to the unicodemap files, as they also contain key matrix entries. But only for keys that produce a character. That means basically that info about rows 0-5 will (partly) overlap with the info from these files. As far as I know, most variation is in these rows, though. So, probably (most?) of the key names will be fixed per unicodemap file. Of course it is possible that we have overlooked variations that do not fit the unicodemap files. Also, there are some MSXes with extra keys, like the Pioneer PX-7. As these do not produce characters, they're not in the unicodemap file. I'm not sure about the keys on the SVI-328.

My starting point would be to generate a row0-5 table based on the unicodemap file and fill up the rest with the standard rows and the extras belonging to that machine, which can already be derived from our XML files (e.g. numpad, YES/NO keys).
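As a first step of that generation, provisional names for rows 0-5 could be taken from the unmodified entries of a unicodemap file, i.e. the character a key produces on its own. This plain-Tcl sketch assumes a `<UNICODE>, <ROWCOL>, <MODIFIERS>` line layout with a `0x`-prefixed code point and the row/column as two hex digits; dead keys and entries that need modifiers are skipped. The proc name is hypothetical, and the resulting symbols would still need to be translated to alphanumeric names:

```tcl
# Derive provisional names for rows 0-5 from unicodemap-style lines,
# assumed to look like "0x0061, 26, " (code point, row/col in hex, modifiers).
# Returns a dict: "row bit" -> character produced without modifiers.
proc guess_row_names {unicodemap_text} {
    set names [dict create]
    foreach line [split $unicodemap_text \n] {
        # Drop comments (full-line and trailing) and surrounding whitespace.
        regsub {#.*$} $line "" line
        set line [string trim $line]
        if {$line eq ""} continue
        lassign [split $line ,] unicode rowcol modifiers
        set unicode [string trim $unicode]
        set modifiers [string trim $modifiers]
        # Skip dead keys and anything that needs a modifier.
        if {![string match 0x* $unicode] || $modifiers ne ""} continue
        if {[scan [string trim $rowcol] %1x%1x row bit] != 2 || $row > 5} continue
        # Keep the first character seen for a position.
        if {![dict exists $names "$row $bit"]} {
            dict set names "$row $bit" [string toupper [format %c $unicode]]
        }
    }
    return $names
}
```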

> For (optional) extra data we could do something like this: `key: CODE alt-name: KANA x-position:20 y-position:80 ...`. It depends on how dense this extra information is. If it's always present (e.g. all keys have an x,y position), then having the main data in a table is less useful. On the other hand, for sparse data (e.g. only a few keys have an alternate name), it does make sense.

The position stuff would be there for all keys of course, as you already said. Well, you can see some examples of extra data in my example list in a previous post.

> Personally I see the xy-position for keys as a low priority item. But it's good to keep in mind for possible future extensions (as you say, to not get stuck in a corner). Specifically for the xy-positions, I think it would be better to keep them separate from the key names; then we can always add them later.

I agree it's low priority. Having the correct key matrix position for keys is much more important.

> > the (longer) names I chose were based on the unicode character description. But I don't really care as long as it's clear.

> I also picked the names to not contain spaces. But if they are descriptions instead of alternate names, then this requirement disappears (but you shouldn't use such a description to identify the key, e.g. in a future 'bind' command).

Indeed, that's why I also made a distinction between a 'name to use in a bind command' and presentation names 'to recognize the key'.

> > removing the hardcoded positions of modifier keys is indeed mostly useful to make our C++ Keyboard code more generic between MSX, SVI and e.g. Sega, i.e. to remove a lot of fixed/hardcoded stuff there.

> Ok, then I propose to postpone this till later.

OK.

> > on a side note, with this kind of file, we could also obsolete some tags, like <has_keypad>true</has_keypad>, <has_yesno_keys>true</has_yesno_keys>, as these are just very specific exceptions to the 'generic keyboard' that is now emulated.

> Yes and no. You could derive "has_keypad" from the above information (e.g. from the table), but only if you either assume that the MSX numpad keys occupy rows 9 and 10, or that their names start with "NUMPAD". Similar for "has_yesno_keys". If eventually the goal is to get rid of this kind of assumption (like you hinted at for SHIFT, CODE), then this may not be the best approach.

Yes, of course, but it depends on what you use the information from these tags for. If we just need to know which keymatrix bits can be toggled for the current machine, the information from that table would be enough. And as far as I know, that is their purpose (but please correct me if I'm wrong).
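For that purpose, the per-row mask of existing keys can be computed directly from the name table, without assuming anything about which rows or names belong to the keypad. A plain-Tcl sketch with a hypothetical proc name, taking a dict `name -> {row bit}`:

```tcl
# From a parsed key-name table (name -> {row bit}), compute for every row
# the mask of bits that correspond to an existing key.
proc row_masks {keys} {
    set masks [dict create]
    dict for {name pos} $keys {
        lassign $pos row bit
        set old [expr {[dict exists $masks $row] ? [dict get $masks $row] : 0}]
        dict set masks $row [expr {$old | (1 << $bit)}]
    }
    return $masks
}
```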

> Personally, I think we're making an MSX emulator, and then it's fine to make some MSX specific assumptions.

Sure, but we're also supporting a few other systems. If we can reasonably avoid it, let's keep things configurable in a file instead of hardcoded, to more easily support all hardware variations we come across and want to emulate in openMSX.

> > Do you already know how this information varies across machines? ...

> Good questions and points. There is of course a strong relation to the unicodemap files, as they also contain key matrix entries. But only for keys that produce a character. That means basically that info about rows 0-5 will (partly) overlap with the info from these files. As far as I know, most variation is in these rows, though. So, probably (most?) of the key names will be fixed per unicodemap file. Of course it is possible that we have overlooked variations that do not fit the unicodemap files. Also, there are some MSXes with extra keys, like the Pioneer PX-7. As these do not produce characters, they're not in the unicodemap file. I'm not sure about the keys on the SVI-328.

I think it would be good to generate some concrete examples for various machines, so that we get a better understanding of how to organize this information. E.g. should the unicodemap file point to a key-name file? Or the other way around? Or should the machine config point to a unicodemap file (like it already does now) and in addition also to a key-name file?

> ... Well, you can see some examples of extra data in my example list in a previous post.

Do you mean the description of the keys? Or did I overlook something?

> Indeed, that's why I also made a distinction between a 'name to use in a bind command' and presentation names 'to recognize the key'.

My hope is that the name by itself is descriptive enough so that we don't need both.

> > Good questions and points. There is of course a strong relation to the unicodemap files, as they also contain key matrix entries. But only for keys that produce a character. That means basically that info about rows 0-5 will (partly) overlap with the info from these files. As far as I know, most variation is in these rows, though. So, probably (most?) of the key names will be fixed per unicodemap file. Of course it is possible that we have overlooked variations that do not fit the unicodemap files. Also, there are some MSXes with extra keys, like the Pioneer PX-7. As these do not produce characters, they're not in the unicodemap file. I'm not sure about the keys on the SVI-328.

I think it would be good to generate some concrete examples for various machines, so that we get a better understanding of how to organize this information. E.g. should the unicodemap file point to a key-name file? Or the other way around? Or should the machine config point to a unicodemap file (like it already does now) and in addition also to a key-name file?

A good example is this wiki article: https://www.msx.org/wiki/Keyboard_Matrices (although probably not fully correct, it gives a good impression of what to think about). And yes, I'm also wondering what would be the best way to organize the information: where to put which file, and which file references what. An option I was thinking about in the beginning is to add the information to the unicodemap files themselves, which mostly means that they must be extended with lines that do not yield characters. Keeping it separate means quite some duplication with these files. (That's why I initially proposed to split everything up, to avoid this duplication.)

I don't yet have a clearly good idea about this. Maybe @mthuurne has some ideas?

... Well, you can see some examples of extra data in my example list in a previous post.

Do you mean the description of the keys? Or did I overlook something?

I added a column for a description in the example I gave for the 8250.

Indeed, that's why I also made a distinction between a 'name for use in a bind command' and presentation names 'to recognize the key'.

My hope is that the name by itself is descriptive enough so that we don't need both.

Yes, I think we could assume that for now.

OK, to make some progress, I'm working on a script that outputs the matrix for all machines, like this:

rows 0-5 are taken from the unicodemap file that belongs to the machine
rows 6-8 are fixed (for MSX machines)
rows 9-10 are fixed (for MSX machines that have <has_keypad>true</has_keypad> in their XML file)
row 11 is fixed (for MSX machines that have <has_yesno_keys>true</has_yesno_keys> in their XML file)
output is tab separated
None means that no key is mapped to this matrix position

Example for turboR:

7       6       5       4       3       2       1       0
;       [       @       \       ^       -       9       8
b       a       None    /       .       ,       ]       :
j       i       h       g       f       e       d       c
r       q       p       o       n       m       l       k
z       y       x       w       v       u       t       s
F3      F2      F1      CODE    CAPS    GRAPH   CTRL    SHIFT
RETURN  SELECT  BS      STOP    TAB     ESC     F5      F4
RIGHT   DOWN    UP      LEFT    DEL     INS     HOME    SPACE
NUM4    NUM3    NUM2    NUM1    NUM0    NUM/    NUM+    NUM*
NUM.    NUM,    NUM-    NUM9    NUM8    NUM7    NUM6    NUM5
None    None    None    None    Cancel  None    Execute None
None    None    None    None    None    None    None    None

Example for German MSX1 without numpad:

7       6       5       4       3       2       1       0
ö       +       ü       <       DEADKEY1        ß       9       8
b       a       None    -       .       ,       #       ä
j       i       h       g       f       e       d       c
r       q       p       o       n       m       l       k
y       z       x       w       v       u       t       s
F3      F2      F1      CODE    CAPS    GRAPH   CTRL    SHIFT
RETURN  SELECT  BS      STOP    TAB     ESC     F5      F4
RIGHT   DOWN    UP      LEFT    DEL     INS     HOME    SPACE
None    None    None    None    None    None    None    None
None    None    None    None    None    None    None    None
None    None    None    None    None    None    None    None
None    None    None    None    None    None    None    None

Here's the next version, with the escaping hell improved and more names like in your example... What to do with keys like ß or ä? Should we escape them? (Well, the ß came out as SS now, using toupper, so that's fine I guess, but Ä, Ö, Ü... not.)
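A minimal sketch of the naming step described here, assuming a small hand-written table for punctuation (the entries mirror the names used in the examples; `SYMBOL_NAMES` and `key_name` are hypothetical). Python's `str.upper()` turns "ß" into "SS", while "ä" becomes "Ä", which is outside 0-9A-Z_, so such keys are flagged for a manual name instead of being guessed.

```python
# Hypothetical sketch: derive a 0-9A-Z_ key name from a key's main character.
import re

# Illustrative subset of punctuation names, following the examples.
SYMBOL_NAMES = {
    ";": "SEMICOLON", "[": "LEFTBRACKET", "]": "RIGHTBRACKET", "@": "AT",
    "\\": "BACKSLASH", "^": "CARET", "-": "MINUS", "/": "SLASH",
    ".": "PERIOD", ",": "COMMA", ":": "COLON", "+": "PLUS", "<": "LESS",
    "#": "HASH",
}
VALID_NAME = re.compile(r"[0-9A-Z_]+")  # A-Z is ASCII only, so no Ä/Ö/Ü


def key_name(char):
    """Return a 0-9A-Z_ name, or None if the key needs a manual name."""
    name = SYMBOL_NAMES.get(char, char.upper())
    return name if VALID_NAME.fullmatch(name) else None
```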

Example turboR:

7   6   5   4   3   2   1   0
SEMICOLON   LEFTBRACKET AT  BACKSLASH   CARET   MINUS   9   8
B   A   BLANK   SLASH   PERIOD  COMMA   RIGHTBRACKET    COLON
J   I   H   G   F   E   D   C
R   Q   P   O   N   M   L   K
Z   Y   X   W   V   U   T   S
F3  F2  F1  CODE    CAPSLOCK    GRAPH   CTRL    SHIFT
RETURN  SELECT  BACKSPACE   STOP    TAB ESCAPE  F5  F4
RIGHT   DOWN    UP  LEFT    DELETE  INSERT  HOME    SPACE
NUMPAD_4    NUMPAD_3    NUMPAD_2    NUMPAD_1    NUMPAD_0    NUMPAD_SLASH    NUMPAD_PLUS NUMPAD_ASTERISK
NUMPAD_PERIOD   NUMPAD_COMMA    NUMPAD_MINUS    NUMPAD_9    NUMPAD_8    NUMPAD_7    NUMPAD_6    NUMPAD_5
None    None    None    None    CANCEL  None    EXECUTE None
None    None    None    None    None    None    None    None

And German (still as it was):

7   6   5   4   3   2   1   0
Ö   PLUS    Ü   LESS    DEADKEY1    SS  9   8
B   A   None    MINUS   PERIOD  COMMA   HASH    Ä
J   I   H   G   F   E   D   C
R   Q   P   O   N   M   L   K
Y   Z   X   W   V   U   T   S
F3  F2  F1  CODE    CAPSLOCK    GRAPH   CTRL    SHIFT
RETURN  SELECT  BACKSPACE   STOP    TAB ESCAPE  F5  F4
RIGHT   DOWN    UP  LEFT    DELETE  INSERT  HOME    SPACE

Some more remarks, as taken from IRC:

as far as I can tell, there will be one matrix per unicodemap file for rows 0-5. I can't think of any duplicates, and no extra matrices will be needed that don't also need an extra unicodemap file.
this means the information for rows 0-5 could also be added to these files, or linked from there.
as said before, rows 6-8 seem to be fixed for all MSX keyboards (but NOT SVI or Sega...)
and rows 9-10 as well, if present, governed by the current has_keypad tag.
and row 11 as well (turboR), as governed by the current has_yesno_keys tag.
idea: any deviations from these standards could be added to the machine's XML file itself with some syntax. Example: Pioneer PX-7 keys.
the JIS keyboard has a blank key on row 2 bit 5, which becomes _ when shifted. This is an exception, but if we just give the name of the 'main' character of the key (so without modifiers), "BLANK" seems to be reasonable. See example above where I applied that.
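The idea of adding deviations to the machine's XML file could look roughly like the sketch below. The `<key_override>` syntax is invented here for illustration, and `EXTRA_KEY` at row 11 bit 0 is a placeholder, not the real position or name of any Pioneer PX-7 key.

```python
# Hypothetical sketch: overlay per-machine deviations on the standard matrix.
# The <key_override> element and its attributes are invented for illustration.
import xml.etree.ElementTree as ET


def apply_overrides(matrix, machine_xml):
    """matrix: list of rows, each a list of 8 names ordered bit 7..0."""
    root = ET.fromstring(machine_xml)
    for key in root.iter("key_override"):
        row, bit = int(key.get("row")), int(key.get("bit"))
        # A missing name attribute yields None, i.e. "no key here".
        matrix[row][7 - bit] = key.get("name")  # list index 0 holds bit 7
    return matrix


example = """<keyboard>
  <key_override row="11" bit="0" name="EXTRA_KEY"/>
</keyboard>"""
m = apply_overrides([[None] * 8 for _ in range(13)], example)
```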

Please see this file for some more examples. I haven't checked all of them yet; some might need extra work for escaping. keymatrix.zip

Here's an updated file in which all key names should be easy to specify with a simple 0-9A-Z_ string. keymatrix.zip
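Files in this format could be turned into the name-to-position lookup that the mapping needs. A minimal sketch, assuming one line per row starting at row 0, columns ordered bit 7..0 and `None` for empty positions (the `load_matrix` name is hypothetical); it also checks that every name is a valid 0-9A-Z_ string and is unique.

```python
# Hypothetical sketch: load a matrix dump like the ones above into a
# lookup from key name to (row, bit), validating the names on the way.
import re

VALID_NAME = re.compile(r"[0-9A-Z_]+")


def load_matrix(text):
    positions = {}
    for row, line in enumerate(text.splitlines()):
        names = line.split()  # accepts tabs or runs of spaces
        if len(names) != 8:
            raise ValueError(f"row {row}: expected 8 columns, got {len(names)}")
        for col, name in enumerate(names):
            if name == "None":
                continue  # no key at this matrix position
            if not VALID_NAME.fullmatch(name):
                raise ValueError(f"row {row}: invalid key name {name!r}")
            if name in positions:
                raise ValueError(f"duplicate key name {name!r}")
            positions[name] = (row, 7 - col)  # column 0 is bit 7
    return positions
```

With the turboR rows from above, this gives `BLANK` at `(2, 5)`, matching the remark about the JIS blank key.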

Here is the very hacky script I used to create these files: extract_matrix.py.txt

In the background, @m9710797 and I have been discussing what to do for this ticket. This comment is meant to summarize the outcome and create a (more or less) ordered TODO list of the actions to perform. This comment is a WIP!

Some conclusions on requirements:

all three keyboard mapping modes have their own strengths; from high to low level, they should in principle work like this:
- CHARACTER: when there is a unicode character associated with the host keyboard input, make sure the MSX keyboard matrix produces the same character. So, this is a mostly automatic mapping from host input to MSX input. If there is no unicode character, fall back to the KEY mode mechanism (see next bullet).
- KEY: a mapping from host key codes to MSX key names. The key names are then translated (via the information we already added in the branch) to keyboard matrix positions.
- POSITIONAL: a mapping from host key scan codes to MSX key names (and via these to keyboard matrix positions, like in KEY mode). This means the input side is positional. But it would also be nice to make the MSX side positional. For that, special MSX key names could be used that just point to a matrix position (like: MATRIXr,c).
The user-defined mapping that this ticket is about is not applicable in CHARACTER mode when unicode information is present, because in that mode the user expressed the wish for a mostly automatic mapping. It applies only to the other modes, or to cases where no unicode information is present (including host input like host joypad buttons).
Current hardcoded mappings (e.g. F7 → SELECT, F8 → STOP) must become configurable as well, with the same mechanism.
The configuration we will offer maps a host input event (of an input that has only two states, like pressed and released) to an MSX key name. That is, a map from MSX key name to zero or more host events that trigger the pressing/releasing of that key.
By default, the mapping will remain as is (so, for example, the MSX key name STOP will be triggered if host key name F8 is triggered).
- The default mapping for CHARACTER mode is the same as for KEY mode.
- The default mapping for POSITIONAL mode is the same as for KEY mode.
- A host keyboard input that maps to an MSX key (real or MATRIXr,c) can be a host scancode or a host keycode.
- TODO: can we use that MATRIXr,c also as destination for mappings in KEY mode? That would make the host side KEY based but the MSX-side POSITIONAL based. It might be confusing, but why forbid it, if we already define the MATRIXr,c as 'fake MSX key names' anyway?
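A rough sketch of how the three modes and the `MATRIXr,c` names could fit together. Everything here is an assumption for illustration: the `HostKeyEvent` type (which keeps scan code, key code and character side by side, so none of them gets lost in translation), the plain dicts standing in for the configurable mappings, and reading `c` in `MATRIXr,c` as the bit number. CHARACTER mode is simplified to a single key per character; the real mode may also press modifiers.

```python
# Hypothetical sketch: resolve a host key event to an MSX key name per mode.
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostKeyEvent:
    scancode: int           # physical key position (POSITIONAL mode)
    keycode: int            # layout-dependent key (KEY mode)
    unicode: Optional[str]  # produced character, if any (CHARACTER mode)


MATRIX_NAME = re.compile(r"MATRIX(\d+),(\d+)")  # fake MSX key "MATRIXr,c"


def resolve(mode, event, keycode_map, scancode_map, char_map):
    """Return an MSX key name (possibly MATRIXr,c) or None if unmapped."""
    if mode == "CHARACTER":
        if event.unicode is not None:
            # Simplified: real CHARACTER mode may also press modifier keys.
            return char_map.get(event.unicode)
        mode = "KEY"  # no unicode character: fall back to the KEY mapping
    if mode == "KEY":
        return keycode_map.get(event.keycode)
    if mode == "POSITIONAL":
        return scancode_map.get(event.scancode)
    raise ValueError(f"unknown mode: {mode}")


def matrix_position(msx_key, key_positions):
    """(row, bit) for an MSX key name; MATRIXr,c names map directly."""
    match = MATRIX_NAME.fullmatch(msx_key)
    if match:
        return int(match.group(1)), int(match.group(2))
    return key_positions.get(msx_key)  # None if not on this MSX model
```

Because `matrix_position` accepts `MATRIXr,c` regardless of the mode that produced the name, this sketch would also allow such names as destinations in KEY mode, which is the open TODO above.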

Typical use cases that must be supported:

The user desires to map host joypad input to MSX function keys (very useful for playing Konami games with a host joypad).
The user desires to map host joypad input to the keys used in Spectrum ports (typically letter keys like Q, A, O and P).
The user desires to map host keyboard input (cursor keys) to the keys used in Spectrum ports (typically letter keys like Q, A, O and P).
The user wants to type with his US English QWERTY host keyboard on a French (AZERTY) MSX keyboard, without having to worry about the different keyboard layout. (This is the main feature of the CHARACTER mode.)
It must also be possible to offer the mapping in a GUI (OSD menu or otherwise), which requires:
- A way to list current bindings: list all current MSX keys and which input maps to them
- A way to change a current binding
- A way to tell whether the host input is mappable to the MSX key (it must be an input with a discrete pressed/released state). (Could this be just a hardcoded list?)
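A minimal sketch of these three GUI needs on top of a plain "MSX key name → set of host events" map. The event tuples (e.g. `("key", "F8")`, `("joy_button", 0, 2)`) and the hardcoded `MAPPABLE_TYPES` list are assumptions; the sketch treats analog joystick axes as not mappable.

```python
# Hypothetical sketch of the queries/changes a GUI (e.g. an OSD menu) needs.
# Host events are tuples whose first element is the input type.

# Inputs with a discrete pressed/released state (a hardcoded list).
MAPPABLE_TYPES = {"key", "scancode", "joy_button", "joy_hat", "mouse_button"}


def is_mappable(host_event):
    return host_event[0] in MAPPABLE_TYPES  # e.g. "joy_axis" is not


def list_bindings(msx_keys, msx_to_host):
    """One (MSX key, host events) row for every key of the current model."""
    return [(key, sorted(msx_to_host.get(key, ()), key=repr)) for key in msx_keys]


def rebind(msx_to_host, msx_key, old_event, new_event):
    """Replace old_event by new_event for msx_key (old_event None: just add)."""
    if not is_mappable(new_event):
        raise ValueError(f"{new_event!r} has no pressed/released state")
    events = msx_to_host.setdefault(msx_key, set())
    events.discard(old_event)
    events.add(new_event)
```

For example, "map joy0 button2 to MSX F2" would become `rebind(bindings, "F2", None, ("joy_button", 0, 2))`.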

Actions:

Both host scan codes and key codes are transformed into a single enum type in openMSX with key codes. This causes some information to get lost in translation; see for instance the analysis in #1465. The openMSX enum was made to be independent of SDL. If we want to maintain that, we should split it into two enums (one for scan codes and one for key codes). Alternatively, we directly use the SDL definitions and lose the independence. In the worst case, we could also just copy the SDL definitions. The scancode definitions are based on a 'standard' document (https://www.usb.org/sites/default/files/documents/hut1_12v2.pdf); the keycode definitions seem to be a choice of SDL itself.
- Decision is needed on how to proceed here: use SDL definitions (directly or copied) or make our own. → for now, let's just use the SDL definitions only and ditch our own enum.
To clean up the keyboard code in combination with replays/savestates, we need to stop storing commands in replays and only store the effect of the commands. In this context (keyboard stuff): only keyboard matrix changes should be recorded, not the way they were created (like: type command, keymatrixdown/up commands, etc.) See also #902 for a similar issue.
- I'm not sure how much this is in the way when we refactor the Keyboard class.
The keyTab table(s) in Keyboard.cc is now used to map an openMSX key name (internal enum) to a keyboard matrix position in a hardcoded way, when there is no unicode field. It seems obvious that this table must be changed:
- it must not map to keyboard matrix positions, but to MSX keys (let's not store matrix positions for now, as it makes a lot more sense to abstract matrix positions away here, I think).
- it must become configurable, with multiple possible host events mapping to a single MSX key (so, change table into map).
- while we're at it: there are several hardcoded keyboard matrix positions in Keyboard.cc, which aren't necessary anymore: the data is available in the new mapping files in the branch.
A design is needed for how to save the mapping. It could be done in settings.xml, but we must take care that it is also possible to remove a default mapping. (With other settings, removing a setting from the file means you set it to default.) Perhaps it can be done like how we handle the bind settings.
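One possible approach, sketched in Python: store only the differences from the default mapping, with removed defaults written out explicitly, so that "not in the file" keeps meaning "default". The element names (`msx_key_mappings`, `map`, `unmap`) and the host event strings are invented for illustration and are not an existing settings.xml format.

```python
# Hypothetical sketch: save only the differences from the default mapping,
# including explicitly removed defaults. XML names are invented.
import xml.etree.ElementTree as ET


def mapping_diff(defaults, current):
    """Both args: dict MSX key name -> set of host event strings."""
    added, removed = [], []
    for key in sorted(set(defaults) | set(current)):
        d, c = defaults.get(key, set()), current.get(key, set())
        added += [(key, event) for event in sorted(c - d)]
        removed += [(key, event) for event in sorted(d - c)]
    return added, removed


def to_xml(added, removed):
    root = ET.Element("msx_key_mappings")
    for tag, entries in (("map", added), ("unmap", removed)):
        for key, event in entries:
            ET.SubElement(root, tag, msx_key=key, host_event=event)
    return ET.tostring(root, encoding="unicode")
```

Loading would start from the defaults, add every `map` entry and remove every `unmap` entry, so a removed default stays removed even if the defaults change later.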

Other open items/details/exceptions:

how do we handle bindings to MSX keys that don't exist in the current MSX model? We must ignore them when they're triggered, I guess. Is it OK if we don't list these when showing the current bindings?
- This is probably quite exceptional, most mappings will be done to common MSX keys like the function keys.
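Combining the planned keyTab change with this case, a minimal sketch of the two lookup stages: a configurable map from host event to MSX key names (the reverse of the configured map), then the per-model MSX key name → (row, bit) data. Keys without a position on the current model are skipped. The function names and event tuples are hypothetical; the sketch relies only on MSX matrix bits being active low (0 = pressed).

```python
# Hypothetical sketch of the lookup that could replace the hardcoded keyTab:
# 1) configurable: host event -> MSX key names (several events per key)
# 2) per MSX model (key matrix data files): MSX key name -> (row, bit)


def press_host_event(event, host_to_msx, key_positions, matrix):
    """Clear the active-low matrix bit of every MSX key bound to event."""
    for msx_key in host_to_msx.get(event, ()):
        pos = key_positions.get(msx_key)
        if pos is None:
            continue  # this MSX model doesn't have the key: ignore it
        row, bit = pos
        matrix[row] &= ~(1 << bit)
    return matrix


def release_host_event(event, host_to_msx, key_positions, matrix):
    """Set the bits again (ignores other host events holding the same key)."""
    for msx_key in host_to_msx.get(event, ()):
        pos = key_positions.get(msx_key)
        if pos is not None:
            row, bit = pos
            matrix[row] |= 1 << bit
    return matrix
```

A real implementation would also need to track whether several host events hold the same MSX key, so that releasing one of them doesn't release the key too early.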

Easier way to bind host controller to any MSX key #1398