rmyorston / busybox-w32

WIN32 native port of BusyBox.
https://frippery.org/busybox

cut -s doesn't remove blank lines #459

Open ale5000-git opened 2 months ago

ale5000-git commented 2 months ago

cut -s doesn't remove blank lines, but I think it should.

Example: wmic.exe cpu get DataWidth /VALUE | tr -d '\r' | cut -d '=' -f '2-' -s

Edit: With this: wmic.exe cpu get DataWidth /VALUE | tr -d '\r' | cut -d '=' -f '2-' -s it does NOT remove blank lines. But with this: wmic.exe cpu get DataWidth /VALUE | cut -d '=' -f '2-' -s | tr -d '\r' it does remove them.
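A minimal C sketch of the -s rule that POSIX describes (quoted later in this thread): in field mode a line with no delimiter is dropped, and an empty line is simply a line with no delimiter. The function name `cut_f2_suppress` is hypothetical. It hard-codes `-f 2-`, splits lines on `\n` only, and is not busybox code. Under this model both pipeline orders from the report should keep only the `64` field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of `cut -d DELIM -f 2- -s`: for each '\n'-terminated
 * line, output everything after the first DELIM; lines that contain no
 * DELIM at all (including empty lines) are suppressed, as POSIX requires
 * for -s. Writes a NUL-terminated result into out and returns the number
 * of bytes written (excluding the NUL), or (size_t)-1 if out is too small.
 */
size_t cut_f2_suppress(const char *in, char delim, char *out, size_t outsz)
{
    size_t used = 0;

    assert(in != NULL && out != NULL && outsz > 0);
    while (*in != '\0') {
        const char *nl = strchr(in, '\n');
        size_t len = nl ? (size_t)(nl - in) : strlen(in);
        const char *d = memchr(in, delim, len);

        if (d != NULL) {                    /* has a delimiter: keep fields 2- */
            size_t keep = len - (size_t)(d + 1 - in);
            if (used + keep + 2 > outsz)    /* room for field, '\n' and NUL */
                return (size_t)-1;
            memcpy(out + used, d + 1, keep);
            used += keep;
            out[used++] = '\n';
        }
        /* else: no delimiter, so -s drops the line, even if it is empty */
        in += len;
        if (*in == '\n')
            in++;
    }
    out[used] = '\0';
    return used;
}
```

With the raw wmic bytes this model returns "64\r\r\n"; after `tr -d '\r'` it returns "64\n". Either way no blank lines survive.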

rmyorston commented 2 months ago

The problem has been reported upstream, with a patch. It hasn't been accepted yet.

Since it isn't very intrusive I've taken a version into busybox-w32. See the latest prerelease (PRE-5470 or later).

avih commented 2 months ago

POSIX says:

(cut -s) Suppress lines with no delimiter characters, when used with the -f option

But looking at the code, I'm not sure it's limited to usage together with -f (maybe it is, but it's not obvious to me, which is why I mentioned it).

rmyorston commented 2 months ago

I'm not sure it's limited to usage together with -f

It isn't. It's also used with -F, but that isn't in POSIX.

CUT_OPT_SUPPRESS_FLGS is only actually used in the

       } else {        /* cut by fields */

block of cut_file() which handles -f and -F. These are mutually exclusive with -c and -b which are handled in the if at the top of the loop over every line.
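A minimal sketch of the control flow described above. The OPT_* bits and `line_is_printed` are hypothetical (OPT_SUPPRESS stands in for CUT_OPT_SUPPRESS_FLGS). For simplicity, -F is treated exactly like -f with a single-character delimiter. Only the keep-or-drop decision is modelled, not the byte, character or field selection itself.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical option bits; OPT_SUPPRESS plays the role of the
   CUT_OPT_SUPPRESS_FLGS flag named in the thread. */
enum {
    OPT_BYTES    = 1 << 0,  /* -b */
    OPT_CHARS    = 1 << 1,  /* -c */
    OPT_FIELDS   = 1 << 2,  /* -f */
    OPT_FIELDS_F = 1 << 3,  /* -F, busybox extension; treated like -f here */
    OPT_SUPPRESS = 1 << 4   /* -s */
};

/*
 * Decide whether a line is printed at all. -b/-c and -f/-F are assumed to
 * be mutually exclusive, so the suppress bit only matters in the field
 * branch.
 */
int line_is_printed(unsigned opts, const char *line, char delim)
{
    assert(line != NULL && delim != '\0');
    if (opts & (OPT_BYTES | OPT_CHARS)) {
        /* cut by bytes/characters: -s is never consulted here */
        return 1;
    } else {    /* cut by fields (-f or -F) */
        if ((opts & OPT_SUPPRESS) && strchr(line, delim) == NULL)
            return 0;   /* no delimiter: dropped, even if the line is empty */
        return 1;
    }
}
```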

avih commented 2 months ago

Thanks.

ale5000-git commented 2 months ago

@rmyorston Thanks, now it behaves correctly with the -s option.

Now I have a question about the output without the -s option. Apparently wmic.exe has a weird behaviour where every line is terminated by \r\r\n; its output is this: printf '\r\r\n\r\r\nDataWidth=64\r\r\n\r\r\n\r\r\n\r\r\n'.

Trying with busybox I get this:

$ wmic.exe cpu get DataWidth /VALUE | od -A n -t x1
 0d 0d 0a 0d 0d 0a 44 61 74 61 57 69 64 74 68 3d
 36 34 0d 0d 0a 0d 0d 0a 0d 0d 0a 0d 0d 0a
$ wmic.exe cpu get DataWidth /VALUE | tr -d '\r' | od -A n -t x1
 0a 0a 44 61 74 61 57 69 64 74 68 3d 36 34 0a 0a
 0a 0a
$ wmic.exe cpu get DataWidth /VALUE | cut -d '=' -f '2-' | od -A n -t x1
 0d 0a 0d 0a 36 34 0d 0a 0d 0a 0d 0a 0d 0a
$ wmic.exe cpu get DataWidth /VALUE | tr -d '\r' | cut -d '=' -f '2-' | od -A n -t x1
 0a 0a 36 34 0a 0a 0a 0a

Trying with Bash for Windows instead, I get this:

$ wmic.exe cpu get DataWidth //VALUE | od -A n -t x1
 0d 0d 0a 0d 0d 0a 44 61 74 61 57 69 64 74 68 3d
 36 34 0d 0d 0a 0d 0d 0a 0d 0d 0a 0d 0d 0a
$ wmic.exe cpu get DataWidth //VALUE | tr -d '\r' | od -A n -t x1
 0a 0a 44 61 74 61 57 69 64 74 68 3d 36 34 0a 0a
 0a 0a
$ wmic.exe cpu get DataWidth //VALUE | cut -d '=' -f '2-' | od -A n -t x1
 0d 0d 0a 0d 0d 0a 36 34 0d 0d 0a 0d 0d 0a 0d 0d
 0a 0d 0d 0a
$ wmic.exe cpu get DataWidth //VALUE | tr -d '\r' | cut -d '=' -f '2-' | od -A n -t x1
 0a 0a 36 34 0a 0a 0a 0a

You can notice the difference in the third case. What is the correct one?

rmyorston commented 2 months ago

every line is terminated by \r\r\n

OK, that's a bit weird.

What is the correct one?

That would be a matter of opinion.

In busybox-w32 cut reads lines by calling xmalloc_fgetline() which removes the trailing CRLF. This leaves a single CR at the end of the line. When a line is output it's followed by a LF. So you end up with lines terminated with CRLF.

I guess the cut from 'Bash for Windows' is only removing the trailing LF, thus leaving CRCR at the end of the line. Then it adds LF on output.

You pays your money, and you takes your choice.
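A small C model of the two line-reading strategies described above, hedged as a sketch: `cut_f2` is a hypothetical function implementing `cut -d DELIM -f 2-` without -s. With `strip_crlf` set, it drops a trailing CRLF as the thread says busybox-w32's xmalloc_fgetline() does. Without it, only the LF is dropped, which is the guess about the Git for Windows cut. Each output line gets a single LF. Fed the wmic bytes, it reproduces the od dumps posted earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of `cut -d DELIM -f 2-` (no -s):
 *   strip_crlf != 0: remove a trailing "\r\n" (or "\n") from each line;
 *   strip_crlf == 0: remove only the trailing "\n".
 * Lines without DELIM are printed unchanged; every output line ends in '\n'.
 * Returns bytes written (excluding NUL), or (size_t)-1 if out is too small.
 */
size_t cut_f2(const char *in, char delim, int strip_crlf, char *out, size_t outsz)
{
    size_t used = 0;

    assert(in != NULL && out != NULL && outsz > 0);
    while (*in != '\0') {
        const char *nl = strchr(in, '\n');
        size_t len = nl ? (size_t)(nl - in) : strlen(in);
        size_t next = nl ? len + 1 : len;   /* offset of the following line */
        const char *start = in;
        const char *d;
        size_t keep;

        if (strip_crlf && nl && len > 0 && in[len - 1] == '\r')
            len--;                          /* drop only the CR of the final CRLF */
        d = memchr(in, delim, len);
        if (d != NULL) {                    /* print fields 2- */
            start = d + 1;
            keep = len - (size_t)(start - in);
        } else {                            /* no delimiter: print line as is */
            keep = len;
        }
        if (used + keep + 2 > outsz)
            return (size_t)-1;
        memcpy(out + used, start, keep);
        used += keep;
        out[used++] = '\n';
        in += next;
    }
    out[used] = '\0';
    return used;
}
```

With strip_crlf set, the wmic input yields 0d 0a 0d 0a 36 34 0d 0a 0d 0a 0d 0a 0d 0a. With it cleared, every line keeps CRCR before the LF, as in the Git for Windows dump. After tr -d '\r', both modes give 0a 0a 36 34 0a 0a 0a 0a.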

ale5000-git commented 2 months ago

@rmyorston If you don't notice any issue then it's fine.

While I was testing the previous issues I noticed another strange issue. First I put this text in Notepad:

wmic.exe cpu get DataWidth /VALUE | cut -d '=' -f '2-' -s | od -A n -t x1
wmic.exe cpu get DataWidth /VALUE | cut -d '=' -f '2-' -s | od -A n -t x1

Then I copied it and pasted it into busybox ash: busybox-paste

As you can see, it loses a character during the paste.

rmyorston commented 2 months ago

I've no idea what's going on with your paste. The only thing I notice is that it seems to work fine if the very first command is anything other than wmic.

avih commented 2 months ago

As you can see, it loses a character during the paste.

Does this happen every time you paste these lines, or only sometimes? I.e. is it reproducible? If yes, how?

Which OS is that?

What's the console CP? (run chcp)

Can you try also in CP 437? (chcp 437)

Which console/terminal is used when this happens? (e.g. is it a "plain" cmd.exe conhost window?)

Is this the unicode or non-unicode build?

Can you also try the same paste with the other build type (Unicode vs. non-Unicode)?

The unicode build has a workaround for input issues when pasting; I wonder if that's related and/or can fix something. But as far as I could tell back then, the issues (which differ depending on the OS and terminal in use) only occurred when the console CP was UTF-8 (65001) AND the pasted text was non-ASCII, whereas this case looks all-ASCII to me.

But it's also possible that there are remaining paste issues which I didn't notice and didn't fix, or that the workaround in the unicode build adds some paste issues which don't exist in the non-unicode build.

(I use the unicode build and paste a lot into the shell, both in conhost windows and windows terminal, and have not noticed any paste issues since the workaround was added - shortly after unicode support was added)

avih commented 2 months ago

While I was testing the previous issues I noticed another strange issue. First I put this text in Notepad:

wmic.exe cpu get DataWidth /VALUE | cut -d '=' -f '2-' -s | od -A n -t x1
wmic.exe cpu get DataWidth /VALUE | cut -d '=' -f '2-' -s | od -A n -t x1

Hmm... I can reproduce the issue on Windows 10 with both the Unicode and non-Unicode bb builds, in both conhost and Windows Terminal.

But rejoice, it also happens in conhost without busybox. E.g. press WIN+R, type cmd.exe and press Enter to open a new cmd.exe window with no busybox shell, and paste this:

wmic.exe cpu get DataWidth /VALUE
echo

and notice how the e of echo disappears. So it's not a busybox issue.

FWIW, it also happens in OpenConsole.exe of a new Windows terminal (i.e. the latest version of conhost which ships with the windows terminal).

Maybe wmic reads a console input event, so that it simply disappears from the keyboard input queue before the next program can read it?

ale5000-git commented 2 months ago

It is reproducible all the time. Windows 10 64-bit.

It happens with both the 64-bit Unicode and the 32-bit non-Unicode builds of BusyBox for Windows. But the problem does NOT happen in Bash for Windows. Everything is executed from cmd.exe.

BusyBox 32-bit for Windows: busybox32-win

Bash 64-bit for Windows: bash64-win

rmyorston commented 2 months ago

But rejoice, it also happens in conhost without busybox

@avih Thanks for confirming that. That's one combination I didn't try.

wmic must be a very odd program.

@ale5000-git What do you mean by 'bash for windows'?

ale5000-git commented 2 months ago

@rmyorston This: C:/Program Files/Git/usr/bin/bash.exe GNU bash, version 5.2.26(1)-release (x86_64-pc-msys)

janko-jj commented 2 months ago

That is then the "Git for Windows" package, in which bash is just the shell. The cut there is a separate binary, depending on msys and similar DLLs.

ale5000-git commented 2 months ago

A bash installation includes all the other things; even if they are separate executables, they are still installed together.

janko-jj commented 2 months ago

The thing you've installed is called "Git for Windows" ( https://git-scm.com/downloads/win ) as we can see from your path: C:/Program Files/Git/usr/bin, and "bash" there (and generally) doesn't itself provide "cut". What that "Git for Windows" integrates is an MSYS2 build ( https://www.msys2.org/ ) which includes cut.

ale5000-git commented 2 months ago

I know that normal bash on other OSes doesn't provide it, but all Windows installers that include bash also include other tools. Or can you provide an example of one that doesn't?

janko-jj commented 2 months ago

https://win-bash.sourceforge.net/

ale5000-git commented 2 months ago

The one you linked isn't an installer, just a zip archive, but if you open shell.w32-ix86.zip there is still a cut.exe in it, along with other tools.

janko-jj commented 2 months ago

It doesn't matter. You can't claim that "cut" is "in bash", especially not in "Bash for Windows", when bash never contained cut and still doesn't, and the package you installed is explicitly called "Git for Windows". Note: git the program doesn't contain cut either, but the package you provably installed, which is actually called "Git for Windows", does. You were confused, it happens, but any further discussion is pointless, so I won't read or respond to your answers here anymore.

ale5000-git commented 2 months ago

You are the one who came here to reply; the package origin doesn't matter to me. I only used the name "Bash for Windows" (meaning Bash compiled for Windows) to specify that it was tested with Bash and the tools bundled with it. Why complicate things?

That discussion had already ended; the only thing that remained was a paste issue in BusyBox.

avih commented 2 months ago

the only thing that remained was a paste issue in BusyBox.

Are you sure? See the comments showing that the wmic paste issue also happens without busybox, so that's not a busybox issue.

So which paste issue remains?

ale5000-git commented 2 months ago

So which paste issue remains?

The issue doesn't affect bash, so this means there is a way to avoid being influenced by external processes. I don't know the details, but this proves it isn't impossible.

avih commented 2 months ago

So which paste issue remains?

The issue doesn't affect bash

I wasn't asking about bash.

the only thing that remained was a paste issue in BusyBox.

So what paste issue remains in busybox?

Can you give an example how to reproduce the problem?

ale5000-git commented 2 months ago

Are you sure? See the comments showing that the wmic paste issue also happens without busybox

The issue that also happens without busybox could be avoided in busybox; we just need to find out how. Just because it isn't a busybox issue doesn't mean it can't be fixed in busybox; there are already a lot of workarounds for Windows bugs.

avih commented 2 months ago

Are you sure? See the comments showing that the wmic paste issue also happens without busybox

The issue that also happens without busybox could be avoided in busybox; we just need to find out how. Just because it isn't a busybox issue doesn't mean it can't be fixed in busybox; there are already a lot of workarounds for Windows bugs.

I don't think it's a windows issue. I think it's a wmic bug, and I don't think busybox-w32 should try to work around it.

If someone wants to find what the issue is with wmic and work around it, IMO they should think twice before starting.

rmyorston commented 2 months ago

I agree. I don't think the issue with wmic is important enough to warrant much effort in finding a workaround.

ale5000-git commented 2 months ago

The problem isn't wmic itself; rather, wmic proves that any app can mess with pasting into busybox and possibly trick it into running malicious code (by altering what the user pasted). So this is a security issue.

avih commented 2 months ago

Use linux.

ale5000-git commented 2 months ago

If I were using Linux I probably wouldn't be in this repo. Besides, there's no need to talk about unrelated things; I'm just giving constructive comments.

rmyorston commented 2 months ago

So this is a security issue.

Perhaps so. But if it is, it isn't specific to busybox-w32.

Try pasting this into a busybox-w32 shell:

wmic cpu get datawidth
# echo just a comment, will not be executed

Or this into cmd.exe:

wmic cpu get datawidth
: echo just a comment, will not be executed

Or this into bash on Linux:

dd if=/dev/tty bs=1 count=1 >/dev/null 2>&1
# echo just a comment, will not be executed

In each case the echo will be executed.
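A toy model of why the echo runs, assuming (as in the Linux dd case) that the shell and the command it starts read from the same input queue, and that the shell reads it without buffering. Whether wmic consumes console input in the same way is only the guess made earlier in the thread. The functions `read_line` and `steal_bytes` are hypothetical, and a pipe stands in for the terminal, so this sketch is POSIX-only.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Toy "shell" reader: reads one line a byte at a time, as an unbuffered
 * reader of a terminal would. Stores at most size - 1 bytes plus a NUL;
 * the '\n' is consumed but not stored. Returns the number of bytes stored.
 */
size_t read_line(int fd, char *buf, size_t size)
{
    size_t n = 0;
    char c;

    assert(buf != NULL && size > 0);
    while (n + 1 < size && read(fd, &c, 1) == 1 && c != '\n')
        buf[n++] = c;
    buf[n] = '\0';
    return n;
}

/*
 * What `dd if=/dev/tty bs=1 count=1` does in the Linux example: consume
 * bytes the shell was going to read next. Returns the number consumed.
 */
size_t steal_bytes(int fd, size_t k)
{
    size_t n = 0;
    char c;

    while (n < k && read(fd, &c, 1) == 1)
        n++;
    return n;
}
```

Once the command has taken the leading #, the shell's next line is " echo just a comment, will not be executed", which is an ordinary command.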

rmyorston commented 2 months ago

In the last case you need to have bracketed paste turned off. Or use a shell like dash which doesn't support it.
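For context, bracketed paste is an xterm-style terminal mode. The application enables it by writing ESC [ ? 2004 h, after which the terminal wraps pasted text in ESC [ 200 ~ and ESC [ 201 ~. A line editor that understands the markers can take the whole paste, newlines included, into its edit buffer before running anything. That leaves nothing in the queue for a command to consume, which is why the bash case only works with the mode turned off. The sketch below only shows how the payload could be located. `find_bracketed_paste` is a hypothetical helper, not a busybox API, and it assumes the paste is already in one NUL-terminated buffer.

```c
#include <assert.h>
#include <string.h>

#define PASTE_START "\033[200~"   /* sent by the terminal before pasted text */
#define PASTE_END   "\033[201~"   /* ...and after it */

/*
 * Hypothetical helper: find a complete bracketed paste in a buffer of raw
 * terminal input. On success, *payload points at the pasted bytes (newlines
 * included), *len is their length, and 1 is returned, so a line editor
 * could insert them as one unit instead of executing each line as it
 * arrives. Returns 0 if no complete paste is present.
 */
int find_bracketed_paste(const char *input, const char **payload, size_t *len)
{
    const char *start;
    const char *end;

    assert(input != NULL && payload != NULL && len != NULL);
    start = strstr(input, PASTE_START);
    if (start == NULL)
        return 0;
    start += strlen(PASTE_START);
    end = strstr(start, PASTE_END);
    if (end == NULL)
        return 0;   /* paste not finished yet: keep reading */
    *payload = start;
    *len = (size_t)(end - start);
    return 1;
}
```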

ale5000-git commented 2 months ago

Isn't it possible to have bracketed paste in busybox too?

rmyorston commented 2 months ago

Isn't it possible to have bracketed paste in busybox too?

Possible, I suppose, but not a priority.

Support for bracketed paste in applets isn't specific to Windows so I'd want it to be upstream.