malxau / yori

Yori is a CMD replacement shell that supports backquotes, job control, and improves tab completion, file matching, aliases, command history, and more.
http://www.malsmith.net/yori/
MIT License

YECHO and escaping double quotes with CMD #83

Closed: aleaksunder closed this issue 3 years ago

aleaksunder commented 3 years ago

In the Yori environment everything works perfectly:

YORI> ECHO -n -- -argument1 "value 1" --> -argument1 value 1. This is not what we want, but it is expected: here we need to escape our " characters.

YORI> ECHO -n -- -argument1 ^"value 1^" --> -argument1 "value 1", and everything is nice and flawless...

But something weird happens when I try to use yecho.exe from CMD:

CMD> yecho.exe -n -- -argument1 "value 1" --> -argument1 value 1. Nothing special... everything as expected.

CMD> yecho.exe -n -- -argument1 ^"value 1^" --> -argument1 value 1. OK... the thing here is that CMD itself handled the ^" characters, so yecho.exe received -argument1 "value 1" just as in the previous attempt, and there is nothing surprising here. So in theory we just need to add an additional escape for the ^ character:

CMD> yecho.exe -n -- -argument1 ^^^"value 1^^^" --> -argument1 ^"value 1^". And that is completely unexpected behavior... let's even try what is not supposed to work:

CMD> yecho.exe -n -- -argument1 ^^"value 1^^" --> -argument1 ^"value 1^^". I have tried a lot of things to make yecho.exe work with CMD and double quotes, but nothing was successful... the only thing that worked is this construction:

CMD> yori.exe -nouser -c ECHO -n -- -argument1 ^^^"value 1^^^" --> -argument1 "value 1" And this is quite expected and absolutely normal...
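To make the caret results above easier to follow, here is a small C model of the rules commonly described for CMD. It is an assumption for illustration, not cmd.exe's actual code, and cmd_strip_carets is a hypothetical name: outside a quoted region ^ is removed and escapes the next character (so ^" yields a quote that does not toggle quoting), while inside a quoted region ^ is an ordinary character. Other metacharacters are not modeled.

```c
/*
 * Hypothetical model of CMD's caret handling (an illustration under stated
 * assumptions, not cmd.exe's real code):
 *  - outside a quoted region, ^ is removed and the next character is kept
 *    literally, so ^" yields a " that does not toggle quoting;
 *  - an unescaped " toggles the quoted region and is passed through;
 *  - inside a quoted region, ^ is an ordinary character.
 * &, |, <, >, % and line continuation are not modeled; a trailing ^ is
 * simply copied. 'out' needs room for strlen(in) + 1 characters.
 */
void cmd_strip_carets(const char *in, char *out)
{
    int quoted = 0;

    while (*in != '\0') {
        if (!quoted && in[0] == '^' && in[1] != '\0') {
            *out++ = in[1];      /* drop the caret, keep the escaped char */
            in += 2;
            continue;
        }
        if (*in == '"') {
            quoted = !quoted;    /* unescaped quote toggles CMD's state */
        }
        *out++ = *in++;
    }
    *out = '\0';
}
```

Applied to the three yecho.exe argument tails above, the model produces -argument1 "value 1", -argument1 ^"value 1^" and -argument1 ^"value 1^^". The last two match what yecho.exe printed, which suggests that with ^^^" and ^^" CMD really does hand carets through to the child, and from there it is up to the child's own parser.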

I understand that this is quite a headache with all the double-quote "magic" in Windows, and in CMD especially, and I do not expect a quick reaction... I just want to ask: am I missing something here? Maybe you can give some advice on how to properly escape double quotes so that yecho.exe works with CMD the way it does in YORI?

aleaksunder commented 3 years ago

I guess something similar happens with REPL... nothing helps to escape the " character, even in the YORI environment. Neither

YORI> echo "string ^"with^" quotes" | repl ^"with^" without 

nor

YORI> echo "string ^"with^" quotes" | repl ^^^"with^^^" without 

nothing seems to work... I tried instrumenting the source of repl.c: before: https://github.com/malxau/yori/blob/1ce1b7d1d0f5a7809a8f0d88ce6bbe4b98539930/repl/repl.c#L372-L376 after:

    ZeroMemory(&ReplContext, sizeof(ReplContext));

    YoriLibOutput(YORI_LIB_OUTPUT_STDOUT, _T("Arguments count: %i\n"), ArgC);

    for (i = 1; i < ArgC; i++) {

before: https://github.com/malxau/yori/blob/1ce1b7d1d0f5a7809a8f0d88ce6bbe4b98539930/repl/repl.c#L423-L430 after:

    YoriLibInitEmptyString(&EmptyString);
    ReplContext.MatchString = &ArgV[StartArg];
    if (StartArg + 1 >= ArgC) {
        ReplContext.NewString = &EmptyString;
    } else {
        ReplContext.NewString = &ArgV[StartArg + 1];
    }
    StartArg += 2;
    YoriLibOutput(YORI_LIB_OUTPUT_STDOUT, _T("Old: %y\n"), ReplContext.MatchString);
    YoriLibOutput(YORI_LIB_OUTPUT_STDOUT, _T("New: %y\n"), ReplContext.NewString);

and the output of YORI> echo "string ^"with^" quotes" | repl ^"with^" without:

Arguments count: 3
Old: with
New: without
string "without" quotes

The last line is the main output of REPL itself... another attempt, echo "string ^"with^" quotes" | repl ^^"with^^" without:

Arguments count: 3
Old: ^"with^"
New: without
string "with" quotes

So let's triple the escapes... echo "string ^"with^" quotes" | repl ^^^"with^^^" without gives the same answer:

Arguments count: 3
Old: ^"with^"
New: without
string "with" quotes

And finally, just to be sure... echo "string ^"with^" quotes" | repl ^" nothing:

Arguments count: 2
Old:  nothing
New:
string "with" quotes

echo "string ^"with^" quotes" | repl ^^" nothing:

Arguments count: 3
Old: ^"
New: nothing
string "with" quotes

echo "string ^"with^" quotes" | repl ^^^" nothing:

Arguments count: 3
Old: ^"
New: nothing
string "with" quotes

I guess this is because some internal function processes the arguments before they are delivered into ArgV and ArgC, since as far as I know these are not the standard argc and argv provided by the C language. I bet neither CMD nor the YORI environment is responsible for this behavior, but something else...

And by the way... a link with details, and a link with more details inside it... I just want to share these; at one point they were a revelation to me =)

P.S. It is interesting how yecho.exe parses the command line arguments... in theory this is not supposed to work:

echo "string ^"with^" quotes"

and this is what is supposed to work:

echo string^ ^"with^"^ quotes
malxau commented 3 years ago

Okay, seeing as we're doing long posts...

This all stems from one of those "people who haven't studied UNIX are doomed to reinvent it poorly" things.

In UNIX, argc/argv parsing is done by the shell process. In Windows, argc/argv parsing is done by the child process.

But, who processes shell escapes?

The advantage of the UNIX model is shell escapes can be used to indicate how to layout argc/argv. In Windows, the shell will process and remove escapes, thereby having no influence over argc/argv layout.

I've read the Colascione post before, but note it's built on a huge contradiction: it's contending that there is a standard way to parse and layout argc/argv, while simultaneously complaining that nobody does it. And if nobody does it, it's hardly a standard. When you realize that every process is doing its own parsing, it becomes clear why there never will be a standard. The post sort of implies that CommandLineToArgvW is basically a canonical implementation, except that it's up in shell32.dll, which is not where command line programs normally operate, so it's highly unlikely to ever be used in a command line program. My guess is that it exists to support ShellExecute() where the shell needs to apply a file association and reconstruct command lines, so it already needed a fair amount of logic, and it was made available for anyone else who wanted it.


With the background out of the way, let's turn back to Yori.

When I started I had a fairly ideological view that I wanted these tools to interop with other Win32 command line programs. Rather than be a self-contained ecosystem like some other shells, the goal was to make something that anyone can extend, either in-proc or out-of-proc using common idioms like argc/argv. Then I made argv a YORI_STRING, which kind of undermines that goal, but the goal is still somewhat there. What this means is that the argument parser has to be fairly stupid and not emit extended information about what it did - it doesn't communicate which arguments originally had quotes, or where an argument exists within the command line, etc. It could do this fairly easily, but it can't describe the result in argc or argv.

So, Yori has two argc/argv parsers: https://github.com/malxau/yori/blob/1ce1b7d1d0f5a7809a8f0d88ce6bbe4b98539930/lib/cmdline.c#L327 is the one that's used by external programs. It interprets quotes to decide which argument things belong in, and throws the quotes away. https://github.com/malxau/yori/blob/1ce1b7d1d0f5a7809a8f0d88ce6bbe4b98539930/libsh/parse.c#L288 is the one that's used by the shell itself. It handles escapes, indicates which arguments have quotes (to allow them to be reconstructed later), indicates which argument the cursor is on (for tab completion) and does crazy things like turn "C:\Program Files"\Win into "C:\Program Files\Win" because tab completion, while still remembering whether the terminating quote was there or not. This code is...fragile.

The issue here is not the first case where trying to use argc/argv is just too simplistic. One case that's always annoyed me is CMD's echo will preserve all spaces, but Yori's won't, because there's no way to describe a whole pile of spaces in argv. Quotes are another casualty.

I have this nasty feeling that the "real" fix is to have a parser that communicates this information into each program's entrypoint, which looks more like the shell one than the mini one. Or perhaps, since this is process global state, the information could be recorded in a hidden place where the program can ask "did arg 5 have quotes?", or "tell me the string that starts at arg 5 and goes to the end of the command line", or "tell me the raw string that includes arg 5 to arg 7 with everything in the middle", etc., which was known to the parser when it did the decomposition. For built-in modules, I already made baby steps in this direction with https://github.com/malxau/yori/blob/1ce1b7d1d0f5a7809a8f0d88ce6bbe4b98539930/lib/call.c#L725 , where the built-in module by default receives the arguments after escapes were removed but it can call back into the shell to get the original layout with escapes still present.
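A minimal sketch of the "hidden place" idea, assuming a hypothetical arg_info record and helper names (none of these exist in Yori): while splitting, keep each argument's offsets in the raw command line and whether it contained quotes, so a program can later ask for the raw text spanning several arguments, with the original runs of spaces intact.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-argument record (illustrative names, not Yori's API). */
typedef struct {
    size_t raw_start;  /* offset of the argument's first raw character */
    size_t raw_end;    /* offset one past its last raw character */
    int had_quotes;    /* nonzero if the raw text contained a " */
} arg_info;

static int is_blank(char c) { return c == ' ' || c == '\t'; }

/* Splits on blanks outside quotes, recording offsets and the quote flag.
 * Returns the total number of arguments; only the first 'max' are stored. */
size_t split_with_info(const char *cmdline, arg_info *info, size_t max)
{
    size_t count = 0;
    size_t i = 0;

    for (;;) {
        while (is_blank(cmdline[i])) i++;
        if (cmdline[i] == '\0') break;

        size_t start = i;
        int quoted = 0, had_quotes = 0;
        while (cmdline[i] != '\0' && (quoted || !is_blank(cmdline[i]))) {
            if (cmdline[i] == '"') {
                quoted = !quoted;
                had_quotes = 1;
            }
            i++;
        }
        if (count < max) {
            info[count].raw_start = start;
            info[count].raw_end = i;
            info[count].had_quotes = had_quotes;
        }
        count++;
    }
    return count;
}

/* Copies the raw text from argument 'first' through argument 'last',
 * including whatever spacing sat between them. 'outlen' must be > 0. */
void raw_span(const char *cmdline, const arg_info *info,
              size_t first, size_t last, char *out, size_t outlen)
{
    size_t begin = info[first].raw_start;
    size_t len = info[last].raw_end - begin;

    if (len >= outlen) len = outlen - 1;  /* truncate to fit */
    memcpy(out, cmdline + begin, len);
    out[len] = '\0';
}
```

For the command line echo   a    "b c"   d, split_with_info reports four arguments, flags only the third as quoted, and raw_span(cl, info, 1, 3, ...) returns a    "b c"   d with its spacing preserved, which plain argv cannot express.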

Somewhat relatedly, because the shell process decomposes everything into an extended argc/argv internally, it can also end up dropping spaces, etc.

But it's certainly a big can of worms, it's not specific to any particular tool, and you'll probably find quirky behavior in non-Yori tools too just because of the general situation.

aleaksunder commented 3 years ago

It is always a pain to deal with spaces and quotes in Windows... sometimes my head explodes from all the quirky behavior of tools, even though I generally understand the situation! But I did not understand the following: YORI> echo ^" --> " and all is fine, but with repl modified to show what it receives, YORI> repl ^" 123 -->

Arguments count: 2
Old:  123
New:
No file or pipe for input

So REPL simply doesn't get the argument. If both tools use YoriLibCmdlineToArgcArgv(), why does ECHO get the quote as an argument while REPL does not?

malxau commented 3 years ago

This will be because the quote is at the end of the line, so the parser couldn't treat it as containing some future text, and just passed the quote character along instead. In the second case, the quote is followed by text so it's using that to indicate you meant " 123" (with a leading space) which is what the output is showing.

Note also that echo is a builtin, so it's using the (full) shell parser, not the mini process parser. They're two implementations with two quirky behaviors. If you ran the top command with yecho.exe, it wouldn't display anything.
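The behavior described for the mini process parser can be approximated by a splitter that treats every " as a toggle and throws it away, with no escape characters at all. This is a simplified model under that assumption (split_strip_quotes is a hypothetical name, and this is not Yori's YoriLibCmdlineToArgcArgv code), but it reproduces the two observations above: a lone trailing quote yields only an empty argument, and an argument tail of " 123 yields one argument with a leading space.

```c
#include <stddef.h>

/*
 * Simplified model of a quote-stripping splitter (an assumption based on
 * the description in this thread, not Yori's actual code):
 *  - spaces and tabs outside quotes separate arguments;
 *  - every " toggles the quoted region and is discarded;
 *  - there are no escape characters.
 * 'buf' needs strlen(cmdline) + 1 bytes. Returns the argument count; only
 * the first 'max' pointers are stored in 'args'.
 */
size_t split_strip_quotes(const char *cmdline, char *buf, char **args, size_t max)
{
    size_t count = 0;

    for (;;) {
        while (*cmdline == ' ' || *cmdline == '\t') cmdline++;
        if (*cmdline == '\0') break;

        char *start = buf;
        int quoted = 0;
        while (*cmdline != '\0' &&
               (quoted || (*cmdline != ' ' && *cmdline != '\t'))) {
            if (*cmdline == '"') {
                quoted = !quoted;      /* quote decides grouping, then vanishes */
            } else {
                *buf++ = *cmdline;
            }
            cmdline++;
        }
        *buf++ = '\0';
        if (count < max) args[count] = start;
        count++;
    }
    return count;
}
```

Because the quote itself is never copied, nothing in the resulting args array records that a quote was ever present, which is exactly the information loss discussed above.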

Also, I think your second link contains a link that hints at how this is really working in CMD: https://docs.microsoft.com/en-us/previous-versions//17w5ykft(v=vs.85)?redirectedfrom=MSDN

In my big wall of text, I was rhetorically asking "who processes shell escapes?" on the assumption that only the shell can process shell escapes. But what this link suggests is that CMD passes along all of the escape characters, and the child process is responsible for interpreting escapes as part of its own argc/argv processing. I'm still skeptical that this really works, because it assumes the behavior of the child process, and the child process can do anything. I can change the shell to retain escape characters when invoking child programs and change all of the child programs in the Yori tree to process them...but it means a new shell process can't really operate old Yori tools since it will pass escapes that they won't consume. Maintaining compatibility between arbitrary command line generators and arbitrary command line parsers is strangely difficult.

malxau commented 3 years ago

Actually, I misread the link. It's saying that the caret is removed "by the command-line parser in the operating system" by which it means cmd.exe, and the character is not received by the child process. That's why there's this strange behavior of using a backslash before a quote to indicate an escape to child process, and a caret is used to indicate an escape to the shell.

Yuck.

aleaksunder commented 3 years ago

Oh sorry... now I get it: https://github.com/malxau/yori/blob/1ce1b7d1d0f5a7809a8f0d88ce6bbe4b98539930/echo/echo.c#L150-L154 As I understand it, ECHO simply does not care about its arguments at all after the last - switch and simply outputs everything it gets back from YoriLibBuildCmdlineFromArgcArgv().

So basically this means that no external Yori .exe tool can receive a " character within an argument, due to the YoriLibCmdlineToArgcArgv() mechanism.

Maybe this could be achieved by making YoriLibCmdlineToArgcArgv() honor the ^ character before treating the next character as a BreakChar, since ^ is the standard escape character in Windows.

So "-arg ^"value^"" would become the argument -arg "value"... I guess it is simple enough to make it work, but I do not have a lot of experience and this could break things. But if I understand everything correctly so far, it is nearly impossible to pass " inside an argument to tool.exe, which means nobody does it, so maybe this approach deserves a chance.

aleaksunder commented 3 years ago

Oh, now I get it completely! Everything works with echo because echo is a builtin in yori.exe, but not in yorimin.exe... YORIMIN> echo "string ^"with^" quotes" --> string with" quotes"

And everything is absolutely fine with ONEYORI: ONEYORI> echo "string ^"with^" quotes" | repl ^"with^" without --> string without quotes

This is because builtins actually operate as functions inside yori.exe... so the only way for an external tool.exe to get richer argument information is to change YoriLibCmdlineToArgcArgv()

aleaksunder commented 3 years ago

And as far as I understand from those articles, the Windows logic is that the user is entirely on their own in building a command line and must know what they are doing and where. For example, when you use CMD you must know how it parses the command, how to escape characters, and how the double quote actually works. So in the Windows view, it is not a Windows problem that you do not know a lot of complicated things =) It is your own problem... and all you need to do is build a proper command line for your environment.

The next step is program.exe... the program is on its own too. So if you do something wrong with something as simple as running a program with arguments, it is your fault, and if not, it is the program's fault =) Windows is not to blame.

It is funny, but the logic here is that you are absolutely on your own. So if you write in CMD:

  1. program.exe "argument" -> the program receives argument without the quote characters
  2. program.exe ^"argument^" -> the program receives "argument" with quotes, as 1 argument
  3. program.exe ^"argu ment^" -> the program receives "argu and ment" with quotes, as 2 arguments

and therefore program.exe must know what to do with those " characters. In Yori's case, YoriLibCmdlineToArgcArgv() simply strips them out, and if this behavior changes to preserve the character after ^, it could hurt someone who already wrote:

CMD> echo "string ^with^ carets" | repl.exe "^with^" without

With that change, echo's argument would become string with carets and repl's first argument would become with", and that is clearly bad, I know... but what if we imitate the shell's behavior like this:

What if YoriLibCmdlineToArgcArgv() parsed ^ as an escape only when it is not inside a double-quoted section? Then it would be possible to write in CMD:

CMD> echo string^ ^^with^^^ carets | repl.exe ^^with^^ without

That way, a CMD user has to escape all spaces and special characters by hand, and string^ ^^with^^^ carets becomes string ^with^ carets

CMD> echo string^ ^"with^"^ carets | repl.exe ^"with^" without

will work too... and nobody who has quoted arguments for echo and repl will be affected, only those who like to escape manually =)
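The refined proposal can be sketched as a variant of a quote-stripping splitter in which ^ escapes the next character only outside a quoted region. This is a hypothetical illustration of the idea under discussion (split_caret_outside_quotes is not a Yori function); it shows that ^"with^" yields "with" with its quotes, while the already-quoted "^with^" keeps its carets untouched.

```c
#include <stddef.h>

/*
 * Hypothetical splitter implementing the proposed rule: ^ escapes the next
 * character only outside a quoted region; inside quotes, ^ is kept
 * literally. Unescaped quotes group text and are dropped. Not an existing
 * Yori function. 'buf' needs strlen(cmdline) + 1 bytes; returns the
 * argument count and stores at most 'max' pointers in 'args'.
 */
size_t split_caret_outside_quotes(const char *cmdline, char *buf, char **args, size_t max)
{
    size_t count = 0;

    for (;;) {
        while (*cmdline == ' ' || *cmdline == '\t') cmdline++;
        if (*cmdline == '\0') break;

        char *start = buf;
        int quoted = 0;
        while (*cmdline != '\0' &&
               (quoted || (*cmdline != ' ' && *cmdline != '\t'))) {
            if (!quoted && cmdline[0] == '^' && cmdline[1] != '\0') {
                *buf++ = cmdline[1];   /* escaped char, even a space or " */
                cmdline += 2;
            } else if (*cmdline == '"') {
                quoted = !quoted;      /* unescaped quote: toggle and drop */
                cmdline++;
            } else {
                *buf++ = *cmdline++;
            }
        }
        *buf++ = '\0';
        if (count < max) args[count] = start;
        count++;
    }
    return count;
}
```

With this rule the string echo string^ ^^with^^^ carets splits into echo and the single argument string ^with^ carets, because each ^ consumes the space or caret that follows it.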

And even if someone wrote something like this:

CMD> ycopy my^ lovely^ escaped^ path\folder\file my^ another^ lovely^ path\folder\file

It will not be affected, since it does not work at all: CMD strips all the ^ characters and the result is a mess. By the way, this is another issue! ycopy.exe does not understand escaped paths, but it would if YoriLibCmdlineToArgcArgv() were changed =) The user just needs to write this, doubling the escapes:

CMD> ycopy my^^^ lovely^^^ escaped^^^ path\folder\file my^^^ another^^^ lovely^^^ path\folder\file

but who actually cares about this particular scenario of manual escaping =)

And since the CMD and YORI shells strip out escapes themselves before passing the command line to an external program, this small change to YoriLibCmdlineToArgcArgv() does not seem like a bad idea to me.

aleaksunder commented 3 years ago

I have an idea for all that backwards compatibility stuff...

So in a nutshell: Yori is not CMD... it is not built into the system and is not obligated to provide bulletproof backwards compatibility, since a configured system is not limited by design to only one version of Yori. Keeping strong backwards compatibility made CMD what it is... not usable at all! I think that's why PowerShell was introduced.

The main idea here is to identify the use cases where changes may affect previously written scripts. In Yori's case we have two usage scenarios:

  1. Someone like me using Yori as a portable shell. This case assumes: you put Yori in D:\shells\yori; you put your script in D:\scripts\script.cmd or D:\scripts\script.ys1; and with the "magic" of one little command like D:\shells\yori\yori.exe -c D:\scripts\script.ys1 you make it work. So when updating to a new version you at least have the option to create a new D:\shells\yoriNew, put the updated version there, change the command to D:\shells\yoriNew\yori.exe -c D:\scripts\script.ys1, test that everything is OK, and replace D:\shells\yori with D:\shells\yoriNew. Considering that the full Yori is about 0.5-5 megabytes, keeping two versions while you fix your scripts to work with the new one is no problem at all. So basically I just want to say that no one forces anybody to update or use ypm, and even if somebody wants to, there is an option to back up the currently working shell and switch back to it if something fails with the new one.

  2. Someone using Yori "statically". This case assumes: Yori is located in C:\yori; already written scripts reference C:\yori; maybe there is a file association in the Windows registry for .ys1 files; YPM is used. In this case, updating to a new version that is not backwards compatible may break everything, including your familiar environment (such as double-clicking .ys1 files to execute them).

So although these two cases are very similar (you have to place Yori somewhere and configure your environment to use it), there is a difference in YPM usage. Let's assume you have no options and are forced to use Yori in C:\yori. With the current setup we cannot do anything about it, since YPM updates and replaces outdated executables with new ones... but what if YPM included versioning itself? I'm not saying this is right; I just want you to consider the opportunity...

What if, starting with Yori v2.0, updates did not replace existing executables but instead created, for example, a UTC-timestamped folder and subfolder and placed the new versions there? Then there would be:

C:\yori\20210205\02050402 - timestamp to hundredth of a second with new version
C:\yori\ypm.exe - modified version which can update v1 nearby and v2 in timestamped fashion
C:\yori\ystarter.exe - utility to which new `.ys` files can be associated
C:\yori\v1*files - all that other v1 already existing files

As I already mentioned, keeping 5-megabyte Yori versions is not a big deal... even 100 versions of Yori are roughly the size of one new PowerShell Core =) The default Yori script extension could be switched to .ys, and every script MUST include a first comment line indicating which version it is written for (maybe with an alias latest to use the latest available version).

So basically, all of this is to provide default behavior while keeping previous versions in place so they can be used if something goes wrong with the new one. After that, all you need to do is change the first line of your .ys script to the new version once you have made sure everything is fine.

So with the option for users to keep an already working shell, you could introduce script-breaking changes with ease... you would only have to notify users of those changes.

malxau commented 3 years ago

I pushed a series of changes related to this. These implement the \" escape sequence including with repeated backslashes. There's a fairly large volume of changes to do this. The internal shell argument parser interprets the backslash characters to determine where to break arguments but leaves them in place. The parser in child processes, which is also invoked before calling builtins, removes these escapes. This also brings up the Colascione logic of having to insert these escapes when invoking child processes with strings that aren't directly entered from the user.
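When a tool such as timethis.exe rebuilds a command line for a child, each argument has to be re-quoted so that a Visual C++ style parser recovers it unchanged. Below is a sketch of that quoting step following the backslash-doubling approach from the Colascione post mentioned earlier in this thread; append_quoted_arg is a hypothetical name and this is not Yori's implementation.

```c
#include <string.h>

/*
 * Appends 'arg' to the NUL-terminated string 'out', quoting it only when
 * needed so that a Visual C++ style parser gets 'arg' back unchanged.
 * Sketch of the backslash-doubling approach; the caller must ensure 'out'
 * has room for 2 * strlen(arg) + 3 more bytes.
 */
void append_quoted_arg(char *out, const char *arg)
{
    char *p = out + strlen(out);

    if (arg[0] != '\0' && strpbrk(arg, " \t\n\v\"") == NULL) {
        strcpy(p, arg);                 /* nothing special: copy verbatim */
        return;
    }

    *p++ = '"';
    for (;;) {
        size_t backslashes = 0;
        while (*arg == '\\') {
            backslashes++;
            arg++;
        }
        if (*arg == '\0') {
            /* Double trailing backslashes so they can't escape the closing quote. */
            for (size_t i = 0; i < backslashes * 2; i++) *p++ = '\\';
            break;
        }
        if (*arg == '"') {
            /* Double preceding backslashes, then escape the quote itself. */
            for (size_t i = 0; i < backslashes * 2 + 1; i++) *p++ = '\\';
        } else {
            /* Backslashes not followed by a quote are taken literally. */
            for (size_t i = 0; i < backslashes; i++) *p++ = '\\';
        }
        *p++ = *arg++;
    }
    *p++ = '"';
    *p = '\0';
}
```

Quoting -value "with spaces" this way yields "-value \"with spaces\"", the same form timethis.exe produces later in this thread. Arguments with no spaces, tabs or quotes are appended untouched, and an empty argument becomes "".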

aleaksunder commented 3 years ago

Perfect!!! =) I've just grabbed the source and will do some tests this week... I'm compiling it now and just want to point out one thing:

    *** Compiling make
    alloc.c
    exec.c
    make.c
    minish.c
    preproc.c
    scope.c
    target.c
    var.c
    exec.c(111): error C2220: warning treated as error - no 'object' file generated
    exec.c(111): warning C4020: YoriLibCreateDirectoryAndParents: too many actual parameters
    NMAKE : fatal error U1077: "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\cl.EXE" : return code "0x2"
    Stop.

    *** Linking make
    exec.c
    exec.c(111): error C2220: warning treated as error - no 'object' file generated
    exec.c(111): warning C4020: YoriLibCreateDirectoryAndParents: too many actual parameters
    NMAKE : fatal error U1077: "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\cl.EXE" : return code "0x2"
    Stop.

    *** Installing make
    NMAKE : fatal error U1077: if : return code "0x1"
    Stop.

As far as I can see, it's all related to ymake.exe and nothing else seems affected... I can do the tests without any worries.

aleaksunder commented 3 years ago

Hello! I have not done deep testing, but the new escape seems to work well... huge thanks for the huge amount of work you've taken on!

The only noticeable thing I've hit with the new \" escape is rebuilding a command line that already contains " characters... so here is the scenario:

We have callee.cmd with these contents:

    @ECHO %CMDCMDLINE%

This file only echoes the command line it received.

The second file is caller.cmd:

    @PUSHD "%~dp0"
    timethis.exe callee.cmd "-value ""with spaces"""

This file is meant to launch callee.cmd with exactly ONE parameter: -value ""with spaces"", so this whole string should be passed as %1. The doubled " characters are due to CMD's nature: there is no other way to pass a single " in a parameter to another .cmd file, so you have to manually double " in the caller and manually undouble it in the callee... but that is not the point here.

The thing is that timethis.exe rebuilds the command line, and in this scenario callee.cmd ends up echoing:

C:\Windows\system32\cmd.exe /c C:\test\callee.cmd "-value \"with spaces\""

because timethis.exe rebuilt the command line, and callee.cmd received -value \"with spaces\"

I think there are enough situations where the command line that comes to an external Yori utility should not be rebuilt but passed through as is.

And by the way... in the rebuilt-command-line scenario the parameter should become:

-value \"\"with spaces\"\"

I mean, each of the doubled " characters should be escaped.

malxau commented 3 years ago

I had a bug in the earlier change that treated "" as an escaped form of " which confused the behavior you're seeing here quite a bit.

That said, I don't think the expression will end up as one parameter by applying Visual C++'s rules. Its rules will remove all of the nonescaped quotes and use them only to indicate how to break arguments. The string "-value ""with spaces""" would become -value with spaces because the second double quote terminates the first but not the argument, the third opens a new quoted region which is closed by the fourth, and the fifth and sixth open and close a quoted region with no contents. The rules I followed were slightly different where the second quote actually terminates the argument, so the result is one argument of -value, a second argument of with spaces, and a third empty argument.
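The classic Visual C++ splitting rules referred to here can be sketched as follows. This is an approximation: it follows the widely documented backslash and quote rules but deliberately does not model the special handling of doubled quotes ("") or of argv[0], which vary between C runtime versions; split_msvc is a hypothetical name, and Yori's own parser, as noted above, differs slightly. Under these rules "-value ""with spaces""" collapses to the single argument -value with spaces, as described.

```c
#include <stddef.h>

/*
 * Approximation of the classic Visual C++ runtime splitting rules:
 *  - spaces and tabs outside a quoted region separate arguments;
 *  - 2n backslashes followed by " give n backslashes, and the " toggles
 *    the quoted region (it is not copied);
 *  - 2n+1 backslashes followed by " give n backslashes and a literal ";
 *  - backslashes not followed by " are copied literally.
 * Not modeled: doubled-quote ("") special cases and argv[0] handling.
 * 'buf' needs strlen(cmdline) + 1 bytes; at most 'max' pointers are stored.
 */
size_t split_msvc(const char *cmdline, char *buf, char **args, size_t max)
{
    size_t count = 0;

    for (;;) {
        while (*cmdline == ' ' || *cmdline == '\t') cmdline++;
        if (*cmdline == '\0') break;

        char *start = buf;
        int quoted = 0;
        while (*cmdline != '\0' &&
               (quoted || (*cmdline != ' ' && *cmdline != '\t'))) {
            size_t n = 0;
            while (*cmdline == '\\') {
                n++;
                cmdline++;
            }
            if (*cmdline == '"') {
                for (size_t i = 0; i < n / 2; i++) *buf++ = '\\';
                if (n % 2 == 1) {
                    *buf++ = '"';        /* escaped quote is data */
                } else {
                    quoted = !quoted;    /* bare quote only groups */
                }
                cmdline++;
            } else if (n > 0) {
                /* Literal backslashes; the following character (if any)
                 * is handled on the next pass of the loop. */
                for (size_t i = 0; i < n; i++) *buf++ = '\\';
            } else {
                *buf++ = *cmdline++;
            }
        }
        *buf++ = '\0';
        if (count < max) args[count] = start;
        count++;
    }
    return count;
}
```

Tracing the example: the first quote opens a region, the second closes it, the third and fourth wrap with spaces, and the fifth and sixth open and close an empty region, so the space before with stays inside quotes and no argument boundary is ever reached.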

Although it sounds nice to say that there are scenarios where strings should not be manipulated, note that all of these tools need to parse the command line to find command line options and items that they should consume and remove from that combined string. In this case, timethis.exe needs to interpret and understand callee.cmd as distinct from everything that follows, so there will be a tokenization and reassembly process.

If others think they can do better, I'm accepting pull requests.

aleaksunder commented 3 years ago

Personally I don't think I can do better... my mind is blown when I face all this double-quote stuff =) If "" being treated as " is no longer an issue, I have nothing to add here and am going to close this thread.

I've just finished a fairly complicated 2000+ line CMD script that uses Yori utilities a lot in many ways, such as working with hex binaries, searching and replacing complicated strings, and lots of file/directory operations (copy, move, delete, backup), and I want to say one thing:

It is FANTASTIC to have such a tool as Yori! It is really just fantastic to have all those utilities and be sure that my script will work on any version of Windows. Thank you very much for all this work... all this hard-hard-hard work!

P.S. There are a few things I've noticed I miss in Yori:

  1. Something like a YSTRING utility that works with strings: it could return the length of a string and do things like count occurrences of a character, for example counting \ in C:\temp\directory\file
  2. Command blocks in Yori scripts, mentioned in issue #58, so that syntax like this becomes possible:
    FOR %%A IN (
        1 2 3
    ) DO @(
        IF 1 > 3 (
            ECHO Not true =)
        )
    )
  3. And getting rid of all those VirusTotal false alerts on builds... I do not know how to do it; maybe obfuscating the code before compiling GitHub releases such as 1.40 could help

Anyway, these are just wishes that I believe could improve Yori... and again, Thank You SO MUCH for the support!!!