Copying a tagged or untagged file greater than 8 bytes results in content change and file tagging

chrishodgins commented 7 months ago

While using the zopen perl, I copied an untagged file with EBCDIC contents. When the file was <= 8 bytes, everything worked as expected. When the file was > 8 bytes, the file was copied, but the contents of the copy had been converted to ISO8859-1 and the copy had been tagged as IBM-1047.

Neither setting the __UNTAGGED_READ_MODE=ASCII environment variable nor tagging the files first resolved the problem. Files encoded as ISO8859-1 are copied without any additional tagging or conversion.

$ perl --version

This is perl 5, version 39, subversion 8 (v5.39.8*) built for os390
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2024, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at https://www.perl.org/, the Perl Home Page.

### Simple perl script to copy each file
$ cat test.pl
use File::Copy;
copy("./ebcdic.txt", "./ebcdic_copy.txt");
copy("./ebcdic_untagged.txt", "./ebcdic_untagged_copy.txt");

### Files ready to be copied with the correct tags
$ ls -T *.txt
t IBM-1047    T=on  ebcdic.txt
- untagged    T=off ebcdic_untagged.txt
$ cat ebcdic.txt
Hello World!
$ cat ebcdic_untagged.txt
Hello World!

### Run the perl script to copy the files
$ perl test.pl

### Our copy of the untagged file has been given a tag
$ ls -T *.txt
t IBM-1047    T=on  ebcdic.txt
t IBM-1047    T=on  ebcdic_copy.txt
- untagged    T=off ebcdic_untagged.txt
t IBM-1047    T=on  ebcdic_untagged_copy.txt

### The original files are unharmed and remain in EBCDIC
$ od -xc -Ax ebcdic.txt
0000000000      C885    9393    9640    E696    9993    845A    1500
               H   e   l   l   o       W   o   r   l   d   !  \n
000000000D
$ od -xc -Ax ebcdic_untagged.txt
0000000000      C885    9393    9640    E696    9993    845A    1500
               H   e   l   l   o       W   o   r   l   d   !  \n
000000000D

### The copied files are now both ISO8859-1 encoded and both tagged as IBM-1047
$ od -xc -Ax ebcdic_copy.txt
0000000000      4865    6C6C    6F20    576F    726C    6421    0A00
             110 145   %   %   ? 040 127   ? 162   % 144 041 012
000000000D
$ od -xc -Ax ebcdic_untagged_copy.txt
0000000000      4865    6C6C    6F20    576F    726C    6421    0A00
             110 145   %   %   ? 040 127   ? 162   % 144 041 012
000000000D

### Now repeat with __UNTAGGED_READ_MODE=ASCII
$ rm *copy*
$ export __UNTAGGED_READ_MODE=ASCII
$ perl test.pl
$ ls -T *.txt
t IBM-1047    T=on  ebcdic.txt
t IBM-1047    T=on  ebcdic_copy.txt
- untagged    T=off ebcdic_untagged.txt
t IBM-1047    T=on  ebcdic_untagged_copy.txt
$ od -xc ebcdic_copy.txt
0000000000      4865    6C6C    6F20    576F    726C    6421    0A00
             110 145   %   %   ? 040 127   ? 162   % 144 041 012
0000000015
$ od -xc ebcdic_untagged_copy.txt
0000000000      4865    6C6C    6F20    576F    726C    6421    0A00
             110 145   %   %   ? 040 127   ? 162   % 144 041 012
0000000015

covener commented 6 months ago

@IgorTodorovskiIBM was this meant to be closed? It is mentioned in a PR but not fully linked.

IgorTodorovskiIBM commented 6 months ago

Yes, if you can help verify that would be great!

zopencommunity / perlport

Copying a tagged or untagged file greater than 8 bytes results in content change and file tagging #85