rizinorg / ideas

Features that would be nice to have but they are not in the roadmap

EBCDIC character support #15

Closed XVilka closed 2 years ago

XVilka commented 3 years ago

Extended Binary Coded Decimal Interchange Code (EBCDIC; /ˈɛbsɪdɪk/) is an eight-bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding six-bit binary-coded decimal code used with most of IBM's computer peripherals of the late 1950s and early 1960s. It is supported by various non-IBM platforms, such as Fujitsu-Siemens' BS2000/OSD, OS-IV, MSP, and MSP-EX, the SDS Sigma series, Unisys VS/9, Burroughs MCP and ICL VME.

Rizin should have

See:

The submitted code should be under the LGPL license (or a more permissive one). There is some existing code in librz/magic/*, but it is incomplete and not integrated into Rizin itself.

The desired location for code that handles this encoding is librz/util/*

See also https://github.com/rizinorg/rizin/issues/1052

XVilka commented 3 years ago

The most common character sets are:

To map a particular EBCDIC character set to UTF-8, you can consult pages like these:

These are lists of all:

See also http://www.longpelaexpertise.com.au/ezine/LostinTranslation1.php for some of the more common EBCDIC character sets. For localized charsets, I recommend covering all US, UK, ES, FR, JP, and RU character sets, since these machines were most popular in those countries at the time.

See also tests/ in the libiconv repository: https://git.savannah.gnu.org/gitweb/?p=libiconv.git;a=tree;f=tests;h=931829519771ea1fdc0f102cf9e5aabfd06e170b;hb=HEAD
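
To make the EBCDIC to UTF-8 mapping discussed above more concrete, here is a minimal sketch in C. It is not Rizin code: the function names (`ebcdic_sketch_to_codepoint`, `utf8_sketch_encode`, `ebcdic_sketch_to_utf8`) are hypothetical, and it only covers letters, digits, and space, which sit at the same positions in the common EBCDIC code pages, plus one code page 037 specific byte (0x4A, the cent sign) to show a multi-byte UTF-8 result. A real implementation would more likely use a full 256-entry table per character set.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map one EBCDIC byte to a Unicode code point.
 * Only letters, digits and space are handled, plus 0x4A as an example
 * of a code-page-specific byte (cent sign in code page 037).
 * Everything else becomes U+FFFD (replacement character). */
static uint32_t ebcdic_sketch_to_codepoint(uint8_t b) {
	if (b == 0x40) return ' ';
	if (b == 0x4A) return 0x00A2; /* assumption: code page 037 */
	if (b >= 0x81 && b <= 0x89) return 'a' + (b - 0x81);
	if (b >= 0x91 && b <= 0x99) return 'j' + (b - 0x91);
	if (b >= 0xA2 && b <= 0xA9) return 's' + (b - 0xA2);
	if (b >= 0xC1 && b <= 0xC9) return 'A' + (b - 0xC1);
	if (b >= 0xD1 && b <= 0xD9) return 'J' + (b - 0xD1);
	if (b >= 0xE2 && b <= 0xE9) return 'S' + (b - 0xE2);
	if (b >= 0xF0 && b <= 0xF9) return '0' + (b - 0xF0);
	return 0xFFFD;
}

/* Encode a code point up to U+FFFF as UTF-8.
 * Returns the number of bytes written (1 to 3). */
static size_t utf8_sketch_encode(uint32_t cp, char *out) {
	if (cp < 0x80) {
		out[0] = (char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (char)(0xC0 | (cp >> 6));
		out[1] = (char)(0x80 | (cp & 0x3F));
		return 2;
	}
	out[0] = (char)(0xE0 | (cp >> 12));
	out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
	out[2] = (char)(0x80 | (cp & 0x3F));
	return 3;
}

/* Decode `len` EBCDIC bytes into a NUL-terminated UTF-8 string.
 * `out` must have room for 3 * len + 1 bytes.
 * Returns the UTF-8 length without the terminator. */
static size_t ebcdic_sketch_to_utf8(const uint8_t *in, size_t len, char *out) {
	size_t n = 0;
	for (size_t i = 0; i < len; i++) {
		n += utf8_sketch_encode(ebcdic_sketch_to_codepoint(in[i]), out + n);
	}
	out[n] = '\0';
	return n;
}
```

For example, passing the bytes C8 C5 D3 D3 D6 to `ebcdic_sketch_to_utf8` yields the string "HELLO". Splitting the byte-to-code-point step from the UTF-8 encoder keeps the per-character-set part small, so supporting another code page would only mean swapping the first function (or its table).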