samhocevar / wincompose

🔣 Compose Key for Windows
http://wincompose.info/

Cannot compose non-printable control characters (below U+0020) #423

Open LexouDuck opened 3 years ago

LexouDuck commented 3 years ago

First things first, let me thank you for this great tool you built! It has been of great use to me.

As a programmer, there are times when I would like to be able to type special non-printable C0 control characters (i.e. the characters in the range U+0000 to U+001F). The "allow unicode input" option seems like the perfect fit for this niche use case.

In fact, this Unicode option does work for C1 control characters (i.e. the range U+0080 to U+009F), which are also non-printable. I can type the sequence Compose u 8 6 Enter, and it does output an SSA ("start of selected area") character (https://unicode-table.com/en/#0086).

But the same does not work for C0 characters: if I type Compose u 1 b Enter, I would expect to get an ESC ("escape") character (https://unicode-table.com/en/#001B), but nothing is output.
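To make the distinction concrete: both characters mentioned above, U+001B (ESC) and U+0086 (SSA), have the Unicode general category Cc (control); they differ only in which control block they belong to. A minimal Python sketch that classifies code points by the ranges above (the function name `control_block` is illustrative, not part of WinCompose, and says nothing about how WinCompose itself filters output):

```python
import unicodedata

def control_block(cp: int) -> str:
    """Classify a code point as C0 (U+0000..U+001F), C1 (U+0080..U+009F), or other."""
    if 0x00 <= cp <= 0x1F:
        return "C0"
    if 0x80 <= cp <= 0x9F:
        return "C1"
    return "other"

for cp in (0x1B, 0x86, 0x09, 0x41):
    ch = chr(cp)
    # unicodedata.category() reports "Cc" for every C0 and C1 control character
    print(f"U+{cp:04X}  block={control_block(cp)}  category={unicodedata.category(ch)}")
```

For ESC this prints `block=C0` and for SSA `block=C1`, both with `category=Cc`.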

I thought: "Maybe this was done on purpose, perhaps to avoid OS-level bugs. I should try to make custom sequences for this myself, then." Unfortunately, these custom compose sequences do not work either, apart from the C0 whitespace control characters (tab, line feed, carriage return, etc.), i.e. all the characters in the range U+0009 to U+000D. Here is the .XCompose code I wrote, for reference:

<Multi_key> <u> <0> <0> <Return> : "\0x0000"    # NUL:  null
<Multi_key> <u> <0> <1> <Return> : "\0x0001"    # SOH:  start of heading
<Multi_key> <u> <0> <2> <Return> : "\0x0002"    # STX:  start of text
<Multi_key> <u> <0> <3> <Return> : "\0x0003"    # ETX:  end of text
<Multi_key> <u> <0> <4> <Return> : "\0x0004"    # EOT:  end of transmission
<Multi_key> <u> <0> <5> <Return> : "\0x0005"    # ENQ:  enquiry
<Multi_key> <u> <0> <6> <Return> : "\0x0006"    # ACK:  acknowledge
<Multi_key> <u> <0> <7> <Return> : "\0x0007"    # BEL:  bell/alert
<Multi_key> <u> <0> <8> <Return> : "\0x0008"    # BS:   backspace
<Multi_key> <u> <0> <9> <Return> : "\t"     # TAB:  tab
<Multi_key> <u> <0> <a> <Return> : "\n"     # LF:   line feed
<Multi_key> <u> <0> <b> <Return> : "\0x000B"    # VT:   vertical tab
<Multi_key> <u> <0> <c> <Return> : "\0x000C"    # FF:   form feed
<Multi_key> <u> <0> <d> <Return> : "\r"     # CR:   carriage return
<Multi_key> <u> <0> <e> <Return> : "\0x000E"    # SO:   shift out
<Multi_key> <u> <0> <f> <Return> : "\0x000F"    # SI:   shift in
<Multi_key> <u> <1> <0> <Return> : "\0x0010"    # DLE:  data link escape
<Multi_key> <u> <1> <1> <Return> : "\0x0011"    # DC1:  device control 1
<Multi_key> <u> <1> <2> <Return> : "\0x0012"    # DC2:  device control 2
<Multi_key> <u> <1> <3> <Return> : "\0x0013"    # DC3:  device control 3
<Multi_key> <u> <1> <4> <Return> : "\0x0014"    # DC4:  device control 4
<Multi_key> <u> <1> <5> <Return> : "\0x0015"    # NAK:  negative acknowledge
<Multi_key> <u> <1> <6> <Return> : "\0x0016"    # SYN:  synchronous idle
<Multi_key> <u> <1> <7> <Return> : "\0x0017"    # ETB:  end of transmission block
<Multi_key> <u> <1> <8> <Return> : "\0x0018"    # CAN:  cancel
<Multi_key> <u> <1> <9> <Return> : "\0x0019"    # EM:   end of medium
<Multi_key> <u> <1> <a> <Return> : "\0x001A"    # SUB:  substitute
<Multi_key> <u> <1> <b> <Return> : "\0x001B"    # ESC:  escape
<Multi_key> <u> <1> <c> <Return> : "\0x001C"    # FS:   info separator 4 (file)
<Multi_key> <u> <1> <d> <Return> : "\0x001D"    # GS:   info separator 3 (group)
<Multi_key> <u> <1> <e> <Return> : "\0x001E"    # RS:   info separator 2 (record)
<Multi_key> <u> <1> <f> <Return> : "\0x001F"    # US:   info separator 1 (unit)
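One possibility worth ruling out: the X11 Compose(5) file format documents character-code escapes in strings as octal (`\123`) or hexadecimal (`\x3a`), so a string like `"\0x001B"` would likely be parsed as an octal `\0` followed by the literal text `x001B`, not as ESC. Whether WinCompose's parser follows the same escape rules is an assumption here, and this may not change the outcome if WinCompose drops C0 output anyway. This Python sketch regenerates the same table using the `\xHH` form (`compose_line` and `C0_NAMES` are hypothetical names for illustration, not WinCompose APIs):

```python
# Abbreviations of the 32 C0 control characters, in code-point order (U+0000..U+001F).
C0_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "TAB", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]

def compose_line(cp: int) -> str:
    """Build one rule: Compose, u, two lowercase hex digits, Return -> "\\xHH" (Compose(5) hex escape)."""
    hex_digits = f"{cp:02x}"
    keys = " ".join(f"<{d}>" for d in hex_digits)
    return f'<Multi_key> <u> {keys} <Return> : "\\x{hex_digits}"    # {C0_NAMES[cp]}'

# Print all 32 rules, one per C0 control character.
for cp in range(0x20):
    print(compose_line(cp))
```

For example, `compose_line(0x1B)` yields `<Multi_key> <u> <1> <b> <Return> : "\x1b"    # ESC`, the same key sequence as the hand-written rule with only the string escape changed.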

Thanks in advance for your help.

nuanjanP commented 6 months ago

I was having the same problem in most applications, but in some of them the C0 control characters can be composed with the "allow unicode input" method:

NOTE: To actually see the C0 characters, you need a font that supports them, for example Unifont. You also have to set these programs to "show invisible characters".

  • Notepad++: from the menu bar, select View > Show Symbol > Show All Characters
  • BabelPad: in the top toolbar, make sure the Glyph Mode option is selected (not the ü or Text Mode option)

Copying from these programs and pasting into other editors (such as VS Code or MS Notepad) also works, with the caveat that NULL (U+0000) always turns into an ordinary SPACE (U+0020).

So in practice, although I normally use VS Code, when I want to compose C0 control characters I fire up Notepad++ (with invisible characters shown for ease of working with them), compose them there, then copy them over to VS Code.
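To check what actually survived a copy and paste (for instance, whether a NUL arrived as U+0020), a small helper can list the C0 characters in a piece of text. This is a hypothetical Python sketch, not part of the workflow described above; `c0_report` and `c0_report_file` are illustrative names, and the file variant assumes the pasted text was saved as UTF-8:

```python
def c0_report(text: str) -> list[str]:
    """Return 'offset: U+XXXX' for every C0 control character (U+0000..U+001F) in text."""
    return [f"{i}: U+{ord(ch):04X}" for i, ch in enumerate(text) if ord(ch) < 0x20]

def c0_report_file(path: str) -> list[str]:
    # newline="" disables newline translation, so CR (U+000D) and CRLF are reported as saved.
    with open(path, encoding="utf-8", newline="") as f:
        return c0_report(f.read())

# Example: the ESC survived the paste, but a NUL that became a space is no longer reported.
print(c0_report("a\x1bb c"))
```

Opening the file with `newline=""` matters here: Python's default text mode would translate CR and CRLF into LF and hide which line-ending characters were actually pasted.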