ale5000-git closed this issue 1 year ago.
I had mixed up octal and hex; now I'm a step further:
iconv -o output.csv -f UTF-16LE -t UTF-8 supported_devices_orig.csv
repl="$(printf '%b' '\xe2' '\x80' '\x9d')"
sed -i "s/${repl:?}/\"/g" ./output.csv
iconv -o output.csv -c -f UTF-8 -t ASCII output.csv
But the last line gives an empty file.
The last command uses the same file for input and output. That works on Linux but doesn't seem to work with busybox-w32 iconv.
Could iconv handle this internally with a temporary file or, if that isn't possible, detect the case and return a failure without modifying the file?
Sure, I'm looking into it now.
Thanks a lot :)
OK, the latest prerelease binaries create a temporary file for output and rename it on completion, similar to what sed does.
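The same "write to a temporary file, then rename" approach can be done by hand in a script with any iconv, and it also leaves the original file untouched when the conversion fails. A minimal POSIX sh sketch, assuming mktemp is available (it is in BusyBox and GNU coreutils); convert_in_place is a hypothetical helper, not an iconv option:

```shell
# Hypothetical helper (not part of iconv): convert FILE "in place" by writing
# to a temporary file in the same directory and renaming it over the original
# only if iconv succeeds.
convert_in_place() {
    from=$1 to=$2 file=$3
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if iconv -f "$from" -t "$to" "$file" > "$tmp"; then
        # Note: the renamed file keeps mktemp's restrictive (0600) permissions.
        mv -f "$tmp" "$file"
    else
        rc=$?          # exit status of the failed iconv
        rm -f "$tmp"   # discard partial output, keep the original file
        return "$rc"
    fi
}

# Usage: convert_in_place UTF-8 ASCII ./output.csv
```

Keeping the temporary file next to the target means the final mv is a rename within one directory rather than a copy across filesystems.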
It works fine, thanks.
I just have one question about my code above: is there any way to use \342\200\235 directly in sed, without the printf?
If possible, could you also implement iconv --version? Then I could easily distinguish it from the annoying GNU libiconv, which doesn't support -o.
The output of the other implementations:
iconv (GNU libiconv 1.16)
Copyright (C) 2000-2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Bruno Haible.
iconv (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Ulrich Drepper.
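Until busybox-w32 iconv has --version, a script can still tell the two GNU implementations apart from the first line of the output shown above. A minimal sketch; the function names are hypothetical, matching "GNU libc" in addition to "GLIBC" is an assumption about other distributions' builds, and anything unrecognised (for example an iconv that rejects --version) is reported as "unknown":

```shell
# Hypothetical helper: classify iconv --version text read from stdin.
classify_iconv_version() {
    version_text=$(cat)
    case $version_text in
        *"GNU libiconv"*)     echo libiconv ;;   # e.g. "iconv (GNU libiconv 1.16)"
        *GLIBC*|*"GNU libc"*) echo glibc ;;      # e.g. "iconv (Ubuntu GLIBC 2.35-0ubuntu3.1) 2.35"
        *)                    echo unknown ;;    # no or unrecognised version output
    esac
}

# Hypothetical wrapper: classify the iconv found on PATH.
detect_iconv() {
    iconv --version 2>/dev/null | classify_iconv_version
}
```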
sed in BusyBox has this:
/* Lie to autoconf when it starts asking stupid questions. */
if (argv[1] && strcmp(argv[1], "--version") == 0) {
    puts("This is not GNU sed version 4.0");
    return 0;
}
Which, of course, isn't a lie.
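The trick works because a naive check that only looks for the substring "GNU sed" in the --version output is satisfied by that sentence. A small illustration; mentions_gnu_sed is a hypothetical check written for this example, not the test autoconf itself performs:

```shell
# Hypothetical naive check (NOT autoconf's real test): does the version text
# on stdin contain the substring "GNU sed"?
mentions_gnu_sed() {
    grep -q 'GNU sed'
}

# Usage: sed --version 2>/dev/null | mentions_gnu_sed && echo "looks like GNU sed"
```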
is there any way to use \342\200\235 directly in sed without the printf or not?
Not that I can see. Handling of backslash escapes in upstream BusyBox sed is quite limited.
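The bytes still have to come from outside sed, but printf can produce them without %b or hex escapes: POSIX printf interprets octal \ddd escapes directly in its format string, while \x is an extension. A minimal sketch; replace_rdq is a hypothetical helper, and sed -i is assumed to be available (BusyBox and GNU sed both have it):

```shell
# U+201D RIGHT DOUBLE QUOTATION MARK built from its UTF-8 bytes (E2 80 9D)
# using octal escapes, which POSIX printf supports in the format string.
rdq=$(printf '\342\200\235')

# Hypothetical helper: replace every U+201D in a UTF-8 FILE with a plain '"'.
replace_rdq() {
    sed -i "s/${rdq}/\"/g" "$1"
}

# Usage: replace_rdq ./output.csv
```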
If possible could you also implement iconv --version?
If we want actual version information, it would probably be best to implement --version for all applets, just as they (mostly) all support --help.
The sed case, though, illustrates another possible requirement: pretending to be compatible with common applications to fool tools like autoconf.
Thanks. For sed it isn't a problem; I'll just keep the variable.
In my case I write the code myself, so I can do what I want, but in some cases I have to distinguish between versions because they support different options.
This isn't really related to BusyBox, but do you know why GNU libiconv returns exit status 1 with this command even though it does perform the conversion?
iconv -c -f 'UTF-8' -t 'WINDOWS-1252' ./input.txt 1> ./output.txt || echo "Fail ${?}"
If the file is UTF-8 with a BOM it returns 1; if it is UTF-8 without a BOM it returns 0.
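I haven't verified the cause, but one plausible assumption is that the BOM (U+FEFF) has no WINDOWS-1252 equivalent, so -c drops it and GNU libiconv reports the dropped character through its exit status. If so, removing the BOM before converting may avoid the non-zero status. A minimal sketch; strip_utf8_bom is a hypothetical helper, and the BOM bytes are built with octal printf because BusyBox sed does not understand \x escapes:

```shell
# Hypothetical helper: print FILE with a leading UTF-8 BOM (EF BB BF) removed.
# Files without a BOM are printed unchanged.
strip_utf8_bom() {
    bom=$(printf '\357\273\277')
    sed "1s/^${bom}//" "$1"
}

# Usage (assumption: the BOM is what triggers status 1):
#   strip_utf8_bom ./input.txt | iconv -c -f 'UTF-8' -t 'WINDOWS-1252' > ./output.txt
```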
Sorry, I don't know anything about GNU libiconv. I wasn't even aware it existed until you mentioned it. I only knew of the GNU libc iconv.
I have it because it is included in both Git and Ruby for Windows.
I have noticed that this works:
busybox iconv -c -f 'UTF-8' -t 'LATIN1//translit' ./input.txt 1> ./output.txt
but this doesn't give an error:
busybox iconv -c -f 'UTF-8' -t 'LATIN1//wrongtext' ./input.txt 1> ./output.txt
GNU libc iconv silently ignores LATIN1//wrongtext too.
GNU libiconv says:
iconv: conversion to LATIN1//wrogntext unsupported
iconv: try 'iconv -l' to get the list of supported encodings
I've looked at the code. The two GNU implementations handle things quite differently:
GNU libiconv only recognises the suffixes //TRANSLIT or //IGNORE. Anything else is left appended to the name of the encoding, resulting in the error about the conversion being unsupported.
GNU libc iconv takes the suffixes after //, separated by either , or a single /. There's also an explicit comment:
Unknown suffixes are silently discarded and ignored.
The busybox-w32 implementation is more like GNU libc iconv. I don't see any pressing need to change it.
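Because busybox-w32, like GNU libc iconv, silently accepts unknown suffixes, a script that wants the stricter GNU libiconv behaviour can validate the target encoding itself before calling iconv. A minimal sketch, assuming only TRANSLIT and IGNORE (case-insensitive) should be accepted and splitting multiple suffixes on , or / in the GNU libc style; check_iconv_suffixes is a hypothetical name:

```shell
# Hypothetical helper: succeed if every suffix after the first "//" in the
# target encoding is TRANSLIT or IGNORE; otherwise report it and fail.
check_iconv_suffixes() {
    target=$1
    case $target in
        *//*) suffixes=${target#*//} ;;   # text after the first "//"
        *)    return 0 ;;                 # no suffix at all
    esac
    old_ifs=$IFS
    IFS=',/'   # split suffixes on ',' or '/'
    set -f     # no globbing while splitting (turned back off below)
    for s in $suffixes; do
        case $(printf '%s' "$s" | tr '[:lower:]' '[:upper:]') in
            TRANSLIT|IGNORE|'') ;;   # '' comes from "//" between suffixes
            *)
                IFS=$old_ifs; set +f
                echo "unknown iconv suffix: $s" >&2
                return 1 ;;
        esac
    done
    IFS=$old_ifs
    set +f
    return 0
}

# Usage: check_iconv_suffixes 'LATIN1//translit' && busybox iconv -c -f 'UTF-8' -t 'LATIN1//translit' ./input.txt > ./output.txt
```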
The main problem is that the supported suffixes aren't listed anywhere: not in iconv --help, and not even in iconv -l.
Would it be possible to list the supported ones in iconv --help?
Otherwise, someone who doesn't already know them and just trusts --help will never find out.
I need to handle it with just busybox-w32.
My idea was this:
The sed line doesn't work; it should replace the UTF-8 ” with a plain ". Any suggestions?
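Without the exact script it's hard to say why the sed line fails, but one way to do the whole job from the start of the thread is to chain the steps through pipes, so no file is ever read and written at the same time. A minimal sketch, assuming a UTF-16LE input as in the first example and replacing U+201D while the data is UTF-8; convert_devices_csv is a hypothetical name:

```shell
# Hypothetical helper: UTF-16LE IN -> UTF-8 -> replace U+201D with '"' -> ASCII OUT.
convert_devices_csv() {
    in=$1 out=$2
    rdq=$(printf '\342\200\235')   # U+201D as UTF-8 bytes, via octal escapes
    # Only the last iconv's exit status is returned, and OUT is created
    # (or truncated) even if an earlier stage fails.
    iconv -f UTF-16LE -t UTF-8 "$in" |
        sed "s/${rdq}/\"/g" |
        iconv -c -f UTF-8 -t ASCII > "$out"
}

# Usage: convert_devices_csv supported_devices_orig.csv output.csv
```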