rrthomas / recode

Charset converter tool and library
GNU General Public License v3.0

Support for the ZOS_UNIX surface for EBCDIC encodings #49

Open bhaible opened 1 year ago

bhaible commented 1 year ago

For end-of-line handling, the only surfaces documented so far are CR and CR-LF (see the doc node "Representation for end of lines").

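The Unicode Standard (https://www.unicode.org/versions/Unicode15.0.0/ch05.pdf, section 5.8 "Newline Guidelines") explains that two end-of-line mapping conventions are in use for EBCDIC encodings (see table 5-1). In the default EBCDIC convention, byte 0x25 is LF (U+000A) and byte 0x15 is NEL (U+0085). In the z/OS convention the two are exchanged: 0x15 is LF and 0x25 is NEL.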

This is the summary; more details are in the mailing-list thread that starts at https://lists.gnu.org/archive/html/bug-gnu-libiconv/2023-04/msg00002.html .

GNU libiconv now adopts the concept and syntax of a recode "surface" for this purpose.

I would suggest that recode support the same surface, ZOS_UNIX, with the same name and the same semantics (exchanging bytes 0x15 and 0x25).
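The suggested semantics are simple enough to state in a few lines of code. Below is a minimal Python sketch, not recode's or libiconv's implementation. It assumes that the ZOS_UNIX surface is nothing more than an exchange of bytes 0x15 and 0x25 applied on top of an EBCDIC charset that uses the default convention. Python's built-in `cp037` codec stands in for that charset, since it maps 0x25 to LF and 0x15 to NEL. The function names `swap_zos_unix`, `decode_zos_unix` and `encode_zos_unix` are hypothetical and only used for illustration.

```python
# Sketch of the proposed ZOS_UNIX surface: exchange bytes 0x15 and 0x25.
# All function names here are hypothetical; they are not recode or libiconv APIs.

# 256-byte translation table: 0x15 <-> 0x25, every other byte unchanged.
_ZOS_UNIX_SWAP = bytes.maketrans(b"\x15\x25", b"\x25\x15")


def swap_zos_unix(data: bytes) -> bytes:
    """Exchange bytes 0x15 and 0x25; applying it twice restores the input."""
    return data.translate(_ZOS_UNIX_SWAP)


def decode_zos_unix(data: bytes, codec: str = "cp037") -> str:
    """Decode EBCDIC bytes written with the z/OS Unix newline convention.

    Python's cp037 codec follows the default EBCDIC convention
    (0x25 -> U+000A LF, 0x15 -> U+0085 NEL), so swapping first
    turns z/OS Unix bytes into bytes that codec understands.
    """
    return swap_zos_unix(data).decode(codec)


def encode_zos_unix(text: str, codec: str = "cp037") -> bytes:
    """Encode text to EBCDIC, then swap to the z/OS Unix newline convention."""
    return swap_zos_unix(text.encode(codec))


if __name__ == "__main__":
    zos_bytes = b"\xc1\x15"  # "A" followed by a z/OS Unix newline (LF is 0x15)
    print(repr(decode_zos_unix(zos_bytes)))  # 'A\n'
    print(encode_zos_unix("A\n").hex())      # c115
```

Because the swap is its own inverse, the same byte translation serves for both decoding and encoding. This mirrors how a recode surface is removed on input and reapplied on output.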

To understand how this works in practice with GNU libiconv, see this unit test: https://git.savannah.gnu.org/gitweb/?p=libiconv.git;a=blob;f=tests/check-ebcdic;h=62dfd61437d008af1f3f47ae69baeba692e01792;hb=19b6af5e5efe306bc1b2da87ba054b7391360ca2

rrthomas commented 1 year ago

Thanks very much for this, Bruno, and especially for the detailed explanation. I agree that this is precisely the sort of thing Recode should support, and I'll look into it when I can.