microsoft / vscode-hexeditor

VS Code Hex Editor
https://marketplace.visualstudio.com/items?itemName=ms-vscode.hexeditor
MIT License

Provide Unicode character Classification and Character name in details. #409

Open bcowgill opened 1 year ago

bcowgill commented 1 year ago

When showing the details of a Unicode character, it would be useful to also show its character classification (general category) and official Unicode character name.

Here's an example for U+0073 and a few characters starting at U+2325:

$ utf8ls.pl U+0073 U+2325
s   U+73    [LowercaseLetter]   LATIN SMALL LETTER S
⌥   U+2325  [OtherSymbol]   OPTION KEY
⌦   U+2326  [OtherSymbol]   ERASE TO THE RIGHT
⌧   U+2327  [OtherSymbol]   X IN A RECTANGLE BOX
⌨   U+2328  [OtherSymbol]   KEYBOARD
〈   U+2329  [OpenPunctuation]   LEFT-POINTING ANGLE BRACKET
〉   U+232A  [ClosePunctuation]  RIGHT-POINTING ANGLE BRACKET

You could convert this output into a JSON lookup keyed by Unicode code point, and display the classification and name alongside the character.
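As a rough sketch of how such a lookup might be consumed on the extension side: the `CharInfo` shape, the `table` constant, and the `describeCodePoint` helper below are all hypothetical names made up for illustration, not part of vscode-hexeditor. The table is a small inline stand-in for the generated file and is keyed by the character itself.

```typescript
// Hypothetical shape of one entry in the generated lookup table.
interface CharInfo {
  class: string; // classification, e.g. "[OtherSymbol]"
  name: string;  // official Unicode name, e.g. "OPTION KEY"
}

// Inline stand-in for the generated JSON file (assumption: keyed by character).
const table: Record<string, CharInfo> = {
  "s": { class: "[LowercaseLetter]", name: "LATIN SMALL LETTER S" },
  "\u2325": { class: "[OtherSymbol]", name: "OPTION KEY" },
  "\u2328": { class: "[OtherSymbol]", name: "KEYBOARD" },
};

// Build the text a details view could show for one code point.
function describeCodePoint(codePoint: number): string {
  const ch = String.fromCodePoint(codePoint);
  let hex = codePoint.toString(16).toUpperCase();
  while (hex.length < 4) {
    hex = "0" + hex; // pad to at least four hex digits, as in U+0073
  }
  const info: CharInfo | undefined = table[ch];
  return info
    ? `U+${hex} ${info.class} ${info.name}`
    : `U+${hex} (no entry)`;
}

console.log(describeCodePoint(0x2325)); // prints "U+2325 [OtherSymbol] OPTION KEY"
```

Keying by the character keeps the lookup to a single property access. Keying by the numeric code point would work just as well and avoids escaping issues for characters outside the Basic Multilingual Plane.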

You can generate the full table in JSON format with my Perl script here: https://github.com/bcowgill/bsac-linux-cfg/blob/master/bin/utf8ls.pl

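$ utf8ls.pl --all U+0000 | perl -ne 'BEGIN { print "{\n" } END { print join(",\n", @rows), "\n}\n" } next unless m{U\+([0-9A-Fa-f]+)\s+(\[\w+\])\s+(.+)}; my ($cp, $class, $name) = (hex($1), $2, $3); my $key = $cp > 0xFFFF ? sprintf("\\u%04X\\u%04X", 0xD800 + (($cp - 0x10000) >> 10), 0xDC00 + (($cp - 0x10000) & 0x3FF)) : sprintf("\\u%04X", $cp); push @rows, qq{"$key": { "class": "$class", "name": "$name" }};' > utf8.json

{
"\u0000": { "class": "[Control]", "name": "NULL" },
"\u0001": { "class": "[Control]", "name": "START OF HEADING" },
"\u0002": { "class": "[Control]", "name": "START OF TEXT" },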

Or I can provide the generated file if you cannot run Perl.