zopencommunity / coreutilsport

A collection of basic Unix utilities
Apache License 2.0

cp sometimes converts the codepage of untagged files #46

Open covener opened 1 year ago

covener commented 1 year ago

This is a bizarre one, but once I had zopen coreutils in my path, my build failed when parsing a .x file that had been copied around.

But my test invocations of cp never had any issue.

I eventually stumbled on the test case below, which corrupts the copied file's contents, and whether it does seems to depend on the contents themselves. Maybe cp does some kind of content sniffing that /bin/file doesn't share, since file reports "text" for both inputs?

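#!/bin/sh

# Show which cp comes first in PATH (here, the zopen coreutils cp)
which cp

# Turn on automatic codepage conversion and file tagging for the commands below
export _BPXK_AUTOCVT=ON
export _CEE_RUNOPTS="FILETAG(AUTOCVT,AUTOTAG) POSIX(ON)"

for contents in "IMPORT D" "IMPORT DA"; do
    echo "Testing cp with content of '$contents'"
    rm -f ebcdic-file
    # Create the file empty, then append the test bytes with no trailing newline;
    # '%s' keeps printf from treating the contents as a format string
    touch ebcdic-file
    printf '%s' "$contents" >> ebcdic-file

    echo "Test with a untagged src"
    # Remove any tag so the source is untagged
    chtag -r ebcdic-file
    ls -T ebcdic-file
    file ebcdic-file
    # Dump the raw bytes with the system od
    /bin/od -t x1 ebcdic-file

    cp ebcdic-file hopefully-another-ebcdic-file

    ls -T hopefully-another-ebcdic-file
    /bin/od -t x1 hopefully-another-ebcdic-file

    echo "Test with a tagged src"
    # Tag the source as IBM-1047 text and repeat the copy into a fresh destination
    chtag -tc ibm1047 ebcdic-file
    rm -f hopefully-another-ebcdic-file
    cp ebcdic-file hopefully-another-ebcdic-file
    ls -T hopefully-another-ebcdic-file
    /bin/od -t x1 hopefully-another-ebcdic-file
done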

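My output follows. The odd result is near the end: for 'IMPORT DA', the copy of the untagged file comes out as ASCII (49 4D 50 ...) instead of the original EBCDIC bytes (C9 D4 D7 ...).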

 $ ./test-cp.sh
/u/WASTST1/zopen/prod/coreutils/bin/cp
Testing cp with content of 'IMPORT D'
Test with a untagged src
- untagged    T=off ebcdic-file
ebcdic-file:    text
0000000000    C9  D4  D7  D6  D9  E3  40  C4
0000000010
- untagged    T=off hopefully-another-ebcdic-file
0000000000    C9  D4  D7  D6  D9  E3  40  C4
0000000010
Test with a tagged src
t IBM-1047    T=on  hopefully-another-ebcdic-file
0000000000    C9  D4  D7  D6  D9  E3  40  C4
0000000010
Testing cp with content of 'IMPORT DA'
Test with a untagged src
- untagged    T=off ebcdic-file
ebcdic-file:    text
0000000000    C9  D4  D7  D6  D9  E3  40  C4  C1
0000000011
- untagged    T=off hopefully-another-ebcdic-file
0000000000    49  4D  50  4F  52  54  20  44  41
0000000011
Test with a tagged src
t IBM-1047    T=on  hopefully-another-ebcdic-file
0000000000    C9  D4  D7  D6  D9  E3  40  C4  C1
0000000011
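A minimal sketch for checking a copy automatically instead of comparing the od dumps by eye: it removes the destination, copies with whichever cp is first in PATH, and uses cmp to report whether the bytes changed. The helper name check_copy and the CMP variable are hypothetical. It compares data bytes only, not file tags. On z/OS it assumes you point CMP at the system /bin/cmp (as the test above does with /bin/od), so that a cmp built on zoslib cannot apply the same encoding heuristic to the comparison itself.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC to DST and report whether DST is
# byte-for-byte identical to SRC. File tags (ls -T) are not checked.
# Set CMP=/bin/cmp on z/OS to use the system cmp; defaults to cmp in PATH.
CMP=${CMP:-cmp}

check_copy() {
    src=$1
    dst=$2
    # Start from a fresh destination so an existing file cannot affect the result
    rm -f "$dst"
    cp "$src" "$dst" || return 2
    # cmp -s exits 0 if identical, 1 if different, >1 on error
    "$CMP" -s "$src" "$dst"
    case $? in
        0) echo "identical: $dst"; return 0 ;;
        1) echo "CHANGED: $dst differs from $src"; return 1 ;;
        *) echo "cmp failed for $src and $dst" >&2; return 2 ;;
    esac
}

# Example, using the untagged file from the test above:
#   check_copy ebcdic-file hopefully-another-ebcdic-file
```

Running it once with the source untagged and once after `chtag -tc ibm1047 ebcdic-file` gives a pass/fail line per case instead of two hex dumps to compare.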
gngrossi commented 1 year ago

Regarding the cp command: I copied an untagged EBCDIC text file from a user's directory into /tmp (with both cp and cp -p) and noticed that the copy was left untagged. The copied file appeared to have been converted to ASCII, so I tagged it as ASCII.

Is this correct and expected behavior? Since the file was converted, I expected the copy to be tagged as ASCII.

Thanks

IgorTodorovskiIBM commented 1 year ago

I see, we have a heuristic in zoslib that detects the encoding of an untagged file. We should probably just disable auto-conversion in cp.

MikeFultonDev commented 1 year ago

@IgorTodorovskiIBM is this problem/issue covered in a more general issue for zoslib? If so, can we cross-reference that issue and close this one?

HarithaIBM commented 8 months ago

@IgorTodorovskiIBM Do you think the cp issue we found in libraries for "mixed" tagged files with IBM-1047 encoding is related to this?