Closed aleaksunder closed 3 years ago
I guess something similar is going on with REPL... nothing helps to escape the " symbol, even in the YORI environment. Neither
YORI> echo "string ^"with^" quotes" | repl ^"with^" without
nor
YORI> echo "string ^"with^" quotes" | repl ^^^"with^^^" without
seems to work... so I tried to debug this by modifying the source of repl.c:
before:
https://github.com/malxau/yori/blob/1ce1b7d1d0f5a7809a8f0d88ce6bbe4b98539930/repl/repl.c#L372-L376
after:
ZeroMemory(&ReplContext, sizeof(ReplContext));
YoriLibOutput(YORI_LIB_OUTPUT_STDOUT, _T("Arguments count: %i\n"), ArgC);
for (i = 1; i < ArgC; i++) {
before:
https://github.com/malxau/yori/blob/1ce1b7d1d0f5a7809a8f0d88ce6bbe4b98539930/repl/repl.c#L423-L430
after:
YoriLibInitEmptyString(&EmptyString);
ReplContext.MatchString = &ArgV[StartArg];
if (StartArg + 1 >= ArgC) {
ReplContext.NewString = &EmptyString;
} else {
ReplContext.NewString = &ArgV[StartArg + 1];
}
StartArg += 2;
YoriLibOutput(YORI_LIB_OUTPUT_STDOUT, _T("Old: %y\n"), ReplContext.MatchString);
YoriLibOutput(YORI_LIB_OUTPUT_STDOUT, _T("New: %y\n"), ReplContext.NewString);
and the output of YORI> echo "string ^"with^" quotes" | repl ^"with^" without is:
Arguments count: 3
Old: with
New: without
string "without" quotes
The last line is the main output of REPL itself... another attempt, with echo "string ^"with^" quotes" | repl ^^"with^^" without:
Arguments count: 3
Old: ^"with^"
New: without
string "with" quotes
So let's triple the escapes... echo "string ^"with^" quotes" | repl ^^^"with^^^" without gives the same answer:
Arguments count: 3
Old: ^"with^"
New: without
string "with" quotes
And finally, just to be sure... echo "string ^"with^" quotes" | repl ^" nothing:
Arguments count: 2
Old: nothing
New:
string "with" quotes
echo "string ^"with^" quotes" | repl ^^" nothing:
Arguments count: 3
Old: ^"
New: nothing
string "with" quotes
echo "string ^"with^" quotes" | repl ^^^" nothing:
Arguments count: 3
Old: ^"
New: nothing
string "with" quotes
I guess this is because some internal function processes the arguments before they are delivered to the ArgV and ArgC variables... since these are not exactly the argc and argv that, as far as I know, C provides by default. I bet neither CMD nor the YORI environment is responsible for this behaviour, but something else...
And by the way... Details link... Details in details link... Just want to share this... at one time it became a revelation for me =)
P.S.
It is interesting how yecho.exe parses the command line arguments... in theory this is not supposed to work:
echo "string ^"with^" quotes"
and this is what is supposed to work:
echo string^ ^"with^"^ quotes
Okay, seeing as we're doing long posts...
This all stems from one of those "people who haven't studied UNIX are doomed to reinvent it poorly" things.
In UNIX, argc/argv parsing is done by the shell process. In Windows, argc/argv parsing is done by the child process.
But, who processes shell escapes?
The advantage of the UNIX model is shell escapes can be used to indicate how to layout argc/argv. In Windows, the shell will process and remove escapes, thereby having no influence over argc/argv layout.
I've read the Colascione post before, but note it's built on a huge contradiction: it's contending that there is a standard way to parse and layout argc/argv, while simultaneously complaining that nobody does it. And if nobody does it, it's hardly a standard. When you realize that every process is doing its own parsing, it becomes clear why there never will be a standard. The post sort of implies that CommandLineToArgvW is basically a canonical implementation, except that it's up in shell32.dll, which is not where command line programs normally operate, so it's highly unlikely to ever be used in a command line program. My guess is that it exists to support ShellExecute() where the shell needs to apply a file association and reconstruct command lines, so it already needed a fair amount of logic, and it was made available for anyone else who wanted it.
With the background out of the way, let's turn back to Yori.
When I started I had a fairly ideological view that I wanted these tools to interop with other Win32 command line programs. Rather than be a self-contained ecosystem like some other shells, the goal was to make something that anyone can extend, either in-proc or out-of-proc using common idioms like argc/argv. Then I made argv a YORI_STRING, which kind of undermines that goal, but the goal is still somewhat there. What this means is that the argument parser has to be fairly stupid and not emit extended information about what it did - it doesn't communicate which arguments originally had quotes, or where an argument exists within the command line, etc. It could do this fairly easily, but it can't describe the result in argc or argv.
So, Yori has two argc/argv parsers:
https://github.com/malxau/yori/blob/1ce1b7d1d0f5a7809a8f0d88ce6bbe4b98539930/lib/cmdline.c#L327 is the one that's used by external programs. It interprets quotes to decide which argument things belong in, and throws the quotes away.
https://github.com/malxau/yori/blob/1ce1b7d1d0f5a7809a8f0d88ce6bbe4b98539930/libsh/parse.c#L288 is the one that's used by the shell itself. It handles escapes, indicates which arguments have quotes (to allow them to be reconstructed later), indicates which argument the cursor is on (for tab completion) and does crazy things like turn "C:\Program Files"\Win into "C:\Program Files\Win" because tab completion, while still remembering whether the terminating quote was there or not. This code is...fragile.
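To make the first of these two parsers concrete, here is a minimal sketch in C of a quote-toggling splitter of the kind described: quotes decide which argument things belong in and are then thrown away. This is not the code behind the link; split_args, MAX_ARGS and MAX_ARG_LEN are hypothetical names, and escapes are ignored entirely.

```c
#include <stddef.h>

#define MAX_ARGS 16
#define MAX_ARG_LEN 128

/* Hypothetical helper, not Yori's actual parser: split a raw command line
 * into arguments. A double quote toggles "inside quotes" and is discarded;
 * whitespace outside quotes separates arguments. Returns the argument count. */
static int split_args(const char *cmdline, char args[MAX_ARGS][MAX_ARG_LEN])
{
    int argc = 0;
    size_t len = 0;
    int in_quotes = 0;
    int in_arg = 0;
    const char *p;

    for (p = cmdline; *p != '\0'; p++) {
        if (*p == '"') {
            /* The quote only changes state; it is never copied. */
            in_quotes = !in_quotes;
            in_arg = 1;
        } else if ((*p == ' ' || *p == '\t') && !in_quotes) {
            if (in_arg) {
                args[argc][len] = '\0';
                argc++;
                len = 0;
                in_arg = 0;
                if (argc == MAX_ARGS) {
                    return argc;
                }
            }
        } else {
            /* Overlong arguments are silently truncated in this sketch. */
            if (len + 1 < MAX_ARG_LEN) {
                args[argc][len++] = *p;
            }
            in_arg = 1;
        }
    }
    if (in_arg) {
        args[argc][len] = '\0';
        argc++;
    }
    return argc;
}
```

Under these assumptions, split_args applied to repl "with" without yields three arguments with the second being with, and repl " nothing yields two, the second being nothing with a leading space, because the unterminated quote swallows the space.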
The issue here is not the first case where trying to use argc/argv is just too simplistic. One case that's always annoyed me is CMD's echo will preserve all spaces, but Yori's won't, because there's no way to describe a whole pile of spaces in argv. Quotes are another casualty.
I have this nasty feeling that the "real" fix is to have a parser that communicates this information into each program's entrypoint, which looks more like the shell one than the mini one. Or perhaps, since this is process global state, the information could be recorded in a hidden place where the program can ask "did arg 5 have quotes?", or "tell me the string that starts at arg 5 and goes to the end of the command line", or "tell me the raw string that includes arg 5 to arg 7 with everything in the middle", etc., which was known to the parser when it did the decomposition. For built-in modules, I already made baby steps in this direction with https://github.com/malxau/yori/blob/1ce1b7d1d0f5a7809a8f0d88ce6bbe4b98539930/lib/call.c#L725 , where the built-in module by default receives the arguments after escapes were removed but it can call back into the shell to get the original layout with escapes still present.
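As a rough illustration of what such recorded information could look like, here is a sketch in C where the parser keeps, for each argument, its offsets in the raw command line and whether it contained a quote. Everything here (struct arg_info, scan_args, MAX_ARGS) is hypothetical and is not an existing Yori interface.

```c
#include <stddef.h>

#define MAX_ARGS 16

/* Hypothetical per-argument record; none of these names come from Yori. */
struct arg_info {
    size_t start;   /* offset of the argument's first raw character */
    size_t end;     /* offset one past the argument's last raw character */
    int quoted;     /* nonzero if the raw text contained a double quote */
};

/* Walk the raw command line once and remember where each argument lives.
 * Quotes group text but nothing is copied or stripped. Returns the count. */
static int scan_args(const char *cmdline, struct arg_info info[MAX_ARGS])
{
    int argc = 0;
    int in_quotes = 0;
    int in_arg = 0;
    size_t i;

    for (i = 0; cmdline[i] != '\0'; i++) {
        char c = cmdline[i];

        if ((c == ' ' || c == '\t') && !in_quotes) {
            if (in_arg) {
                info[argc].end = i;
                argc++;
                in_arg = 0;
                if (argc == MAX_ARGS) {
                    return argc;
                }
            }
            continue;
        }
        if (!in_arg) {
            info[argc].start = i;
            info[argc].quoted = 0;
            in_arg = 1;
        }
        if (c == '"') {
            in_quotes = !in_quotes;
            info[argc].quoted = 1;
        }
    }
    if (in_arg) {
        info[argc].end = i;
        argc++;
    }
    return argc;
}
```

With a record like this, "did arg 1 have quotes?" is info[1].quoted, and "the string that starts at arg 1 and goes to the end of the command line" is cmdline + info[1].start, with every space preserved.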
Somewhat related: because the shell process is decomposing everything into an extended argc/argv internally, it can also end up dropping spaces etc.
But it's certainly a big can of worms, it's not specific to any particular tool, and you'll probably find quirky behavior in non-Yori tools too just because of the general situation.
It is always a pain to deal with spaces and quotes in Windows... sometimes my head explodes from all the quirky behavior of tools while trying to understand the situation in general!
But I did not understand the following:
YORI> echo ^"
--> "
and all is nice
and with a modified repl, to see what it has:
YORI> repl ^" 123
-->
Argument: 2
Old: 123
New:
No file or pipe for input
So REPL simply doesn't get the argument. If both tools use YoriLibCmdlineToArgcArgv(), why does ECHO get the quote as an argument while REPL does not?
This will be because the quote is at the end of the line, so the parser couldn't treat it as containing some future text, and just passed the quote character along instead. In the second case, the quote is followed by text so it's using that to indicate you meant " 123" (with a leading space) which is what the output is showing.
Note also that echo is a builtin, so it's using the (full) shell parser, not the mini process parser. They're two implementations with two quirky behaviors. If you ran the top command with yecho.exe, it wouldn't display anything.
Also, I think your second link contains a link that hints at how this is really working in CMD: https://docs.microsoft.com/en-us/previous-versions//17w5ykft(v=vs.85)?redirectedfrom=MSDN
In my big wall of text, I was rhetorically asking "who processes shell escapes?" on the assumption that only the shell can process shell escapes. But what this link suggests is that CMD passes along all of the escape characters, and the child process is responsible for interpreting escapes as part of its own argc/argv processing. I'm still skeptical that this really works, because it assumes the behavior of the child process, and the child process can do anything. I can change the shell to retain escape characters when invoking child programs and change all of the child programs in the Yori tree to process them...but it means a new shell process can't really operate old Yori tools since it will pass escapes that they won't consume. Maintaining compatibility between arbitrary command line generators and arbitrary command line parsers is strangely difficult.
Actually, I misread the link. It's saying that the caret is removed "by the command-line parser in the operating system" by which it means cmd.exe, and the character is not received by the child process. That's why there's this strange behavior of using a backslash before a quote to indicate an escape to child process, and a caret is used to indicate an escape to the shell.
Yuck.
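A minimal sketch in C of the shell-side half of that split, assuming the simplest possible rule: a caret is removed and the character after it is passed through untouched. strip_carets is a hypothetical name, and the sketch ignores quoting entirely, so it is not a model of everything cmd.exe does.

```c
#include <stddef.h>

/* Hypothetical sketch of shell-side escape removal: each caret is dropped and
 * the character after it is copied literally. The caller must supply an out
 * buffer at least as large as the input including its terminator.
 * Simplification: quote state is ignored. */
static void strip_carets(const char *in, char *out)
{
    while (*in != '\0') {
        if (*in == '^' && in[1] != '\0') {
            in++;   /* skip the caret, keep the escaped character */
        }
        *out++ = *in++;
    }
    *out = '\0';
}
```

Under this rule repl ^"with^" without reaches the child as repl "with" without, so the child never sees a caret and treats the quotes as ordinary grouping quotes, while ^^^" reaches the child as ^".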
Oh sorry... now I get it:
https://github.com/malxau/yori/blob/1ce1b7d1d0f5a7809a8f0d88ce6bbe4b98539930/echo/echo.c#L150-L154
As I understand it, ECHO simply does not bother with arguments at all after the last - switch and simply outputs everything it receives from YoriLibBuildCmdlineFromArgcArgv()
So basically this means that no external exe in Yori tools can receive a " symbol within an argument, due to the YoriLibCmdlineToArgcArgv() mechanism
Maybe this can be achieved by making YoriLibCmdlineToArgcArgv() check for a ^ symbol before treating the next symbol as a BreakChar, since ^ is the standard escape symbol in Windows.
So "-arg ^"value^"" becomes the argument -arg "value"... I guess it is simple enough to make this work, but I do not have a lot of experience and this can break things.
But if I understand everything correctly so far, it is nearly impossible to pass " in an argument to tool.exe, which means nobody does it, so maybe this approach has a right to live.
Oh, now I get it completely!
Everything works with echo since echo is built in to yori.exe but not to yorimin.exe...
YORIMIN> echo "string ^"with^" quotes"
--> string with" quotes"
and all is absolutely fine with ONEYORI:
ONEYORI>echo "string ^"with^" quotes" | repl ^"with^" without
--> string without quotes
This is because builtins actually operate as functions inside yori.exe... so the only way for an external tool.exe to get advanced arguments is to change YoriLibCmdlineToArgcArgv()
And as far as I understand from those articles, the logic of Windows is that the user is absolutely on his own in how to pass a command line: the user must know what he is doing and where... so for example when you are using CMD you must know how it parses the command, how to escape symbols and how the double quote symbol actually works. So in Windows logic it is basically not a Windows problem that you do not know a lot of complicated things =) It is your own problem... and all you need to do is build a proper command line in your environment.
The next step is program.exe... the program is on its own too, so basically if you are doing something wrong with such a simple thing as running a program with arguments, it is your fault, or if it is not, then it is the program's fault =) Windows is not to blame
Funnily enough, the logic here is that you are absolutely on your own... so if you write in CMD:
program.exe "argument"
then you must understand that the program receives argument without the quote symbols
program.exe ^"argument^"
-> "argument" with quotes, as 1 argument
program.exe ^"argu ment^"
-> "argu and ment" with quotes, as 2 arguments
and therefore program.exe must know what to do with those " symbols... so in Yori's case YoriLibCmdlineToArgcArgv() simply strips them out, and if this behaviour is changed so that the symbol after ^ is preserved, this can hurt someone who has already written:
CMD> echo "string ^with^ carets" | repl.exe "^with^" without
since the echo argument becomes string with carets and the 1st repl argument becomes with"
and that's absolutely bad, I know... but what if we imitate the behaviour of the shell in this way:
What if YoriLibCmdlineToArgcArgv() parses ^ as an escape only when it is not inside a double-quoted section... then it is possible to write this in CMD:
CMD> echo string^ ^^with^^^ carets | repl.exe ^^with^^ without
This way the CMD user has to escape all the spaces and special symbols by hand, and string^ ^^with^^^ symbols becomes string ^with^ symbols
CMD> echo string^ ^"with^"^ carets | repl.exe ^"with^" without
will work too... and nobody who has quoted arguments for echo and repl will be affected... only those who like to escape manually =)
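The proposal above can be sketched in C as a child-side splitter that honours ^ only outside double quotes. This is an illustration of the idea, not a patch to YoriLibCmdlineToArgcArgv(); split_args_caret, MAX_ARGS and MAX_ARG_LEN are hypothetical names.

```c
#include <stddef.h>

#define MAX_ARGS 16
#define MAX_ARG_LEN 128

/* Hypothetical sketch of the proposed rule: outside double quotes a caret
 * makes the next character literal (so ^" becomes a real quote and ^<space>
 * does not split); inside double quotes a caret is an ordinary character.
 * Returns the argument count. */
static int split_args_caret(const char *cmdline,
                            char args[MAX_ARGS][MAX_ARG_LEN])
{
    int argc = 0;
    size_t len = 0;
    int in_quotes = 0;
    int in_arg = 0;
    const char *p;

    for (p = cmdline; *p != '\0'; p++) {
        char c = *p;
        int literal = 0;

        if (c == '^' && !in_quotes && p[1] != '\0') {
            p++;            /* drop the caret ... */
            c = *p;         /* ... and take the next character as-is */
            literal = 1;
        }
        if (!literal && c == '"') {
            in_quotes = !in_quotes;
            in_arg = 1;
        } else if (!literal && (c == ' ' || c == '\t') && !in_quotes) {
            if (in_arg) {
                args[argc][len] = '\0';
                argc++;
                len = 0;
                in_arg = 0;
                if (argc == MAX_ARGS) {
                    return argc;
                }
            }
        } else {
            if (len + 1 < MAX_ARG_LEN) {
                args[argc][len++] = c;
            }
            in_arg = 1;
        }
    }
    if (in_arg) {
        args[argc][len] = '\0';
        argc++;
    }
    return argc;
}
```

With this rule, a child that receives repl ^"with^" without gets "with" (quotes included) as its first argument, while a child that receives repl "^with^" without still gets ^with^ exactly as it would today, which is the backward compatibility argument made above.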
And even if someone wrote something like this:
CMD> ycopy my^ lovely^ escaped^ path\folder\file my^ another^ lovely^ path\folder\file
they will not be affected, since this does not work at all: CMD strips all the ^ and it becomes a mess... and by the way this is another issue! ycopy.exe does not understand escaped paths, but it will if YoriLibCmdlineToArgcArgv() is changed =) The user just needs to double the escapes:
CMD> ycopy my^^^ lovely^^^ escaped^^^ path\folder\file my^^^ another^^^ lovely^^^ path\folder\file
but who actually cares in this particular scenario of manual escaping =)
And since both the CMD and YORI shells strip out escapes themselves before passing the command line to an external program, this little change to YoriLibCmdlineToArgcArgv() does not seem like a bad idea to me
I have an idea for all that backwards compatibility stuff...
So in a nutshell: Yori is not CMD... it is not strictly built in to the system and is not obligated to provide bulletproof backwards compatibility, since a configured system is not limited by design to only one version of Yori. Keeping strong backward compatibility made CMD what it is... not usable at all! I think that's why PowerShell was introduced
The main idea here is to pick out the use cases where changes may affect previously written scripts. In the case of Yori we have two usage scenarios:
Someone, like me, is using Yori as a portable shell.
This case assumes:
You put Yori in the D:\shells\yori directory
You put your script in D:\scripts\script.cmd or D:\scripts\script.ys1
And with the "magic" of one little command like D:\shells\yori\yori.exe -c D:\scripts\script.ys1 you make it all work
So when updating to a new version you at least have the option to create a new D:\shells\yoriNew, put the updated version there, change the command to D:\shells\yoriNew\yori.exe -c D:\scripts\script.ys1, test that all is OK and replace D:\shells\yori with D:\shells\yoriNew
Considering that a full Yori is about 0.5-5 megabytes, it is no problem at all to store two versions until you fix your scripts to work with the new one.
So basically I just want to say that no one forces anybody to update or to use ypm, and even if somebody wants to, there is the option to back up the currently working shell and switch back to it if something fails with the new one.
Someone is using Yori "statically"
This case assumes:
Yori is located in C:\yori
Already written scripts contain C:\yori
Maybe there is a file association in the Windows registry for .ys1 files
YPM is used
So in this case updating to a new version which is not backwards compatible may break everything, including your familiar environment (such as double clicking .ys1 files to execute them)
So although these two cases are very similar (you have to place Yori somewhere and configure your environment to use it), there is a difference in YPM usage... so let's assume you have no options and are forced to use Yori in C:\yori
In the current setup we cannot do anything about it, since YPM updates and replaces outdated executables with new ones... but what if YPM handled versioning by itself?
I'm not saying that this is right, I just want you to consider the possibility...
What if, starting with Yori v2.0, YPM does not replace already existing executables but instead, for example, creates a UTC timestamped folder and subfolder and places new versions there, so that there will be:
C:\yori\20210205\02050402 - timestamp to hundredth of a second with new version
C:\yori\ypm.exe - modified version which can update v1 nearby and v2 in timestamped fashion
C:\yori\ystarter.exe - utility to which new `.ys` files can be associated
C:\yori\v1*files - all that other v1 already existing files
As I already mentioned, it is not a big deal to keep 5 megabyte Yori versions... even 100 versions of Yori is something around the size of 1 new PowerShell Core =)
The default Yori script extension can be switched to .ys and a script MUST include a first commented line which indicates which version the script is written for (maybe with an alias latest to use the latest available)
So basically all of this is to provide default behavior with the ability to keep previous versions in place, so they can be used if something goes wrong with the new one.
After that, all you need to do is change the first line of your .ys to the new version once you have made sure everything is fine.
So, with at least an option for users to keep an already working shell, you may introduce script breaking changes with ease... you only have to notify users of those changes.
I pushed a series of changes related to this. These implement the \" escape sequence including with repeated backslashes. There's a fairly large volume of changes to do this. The internal shell argument parser interprets the backslash characters to determine where to break arguments but leaves them in place. The parser in child processes, which is also invoked before calling builtins, removes these escapes. This also brings up the Colascione logic of having to insert these escapes when invoking child processes with strings that aren't directly entered from the user.
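For readers unfamiliar with the repeated-backslash part, here is a minimal sketch in C of the removal step, assuming the backslash convention documented for the Microsoft C runtime: 2n backslashes before a quote give n backslashes and leave the quote as a delimiter, 2n+1 backslashes before a quote give n backslashes and a literal quote, and backslashes anywhere else are literal. unescape_arg is a hypothetical name, it handles a single argument token only, and whether the pushed changes match this convention in every corner is not confirmed here.

```c
#include <stddef.h>

/* Hypothetical sketch, not the code from the Yori tree: decode one argument
 * token. Unescaped quotes are grouping marks and are dropped; \" becomes a
 * literal quote; backslashes not followed by a quote are copied unchanged.
 * The caller must supply an out buffer at least as large as the input. */
static void unescape_arg(const char *in, char *out)
{
    while (*in != '\0') {
        if (*in == '\\') {
            size_t n = 0;
            size_t keep;
            size_t i;

            /* Count the run of backslashes. */
            while (in[n] == '\\') {
                n++;
            }
            /* Before a quote the run is halved, otherwise it is kept. */
            if (in[n] == '"') {
                keep = n / 2;
            } else {
                keep = n;
            }
            for (i = 0; i < keep; i++) {
                *out++ = '\\';
            }
            in += n;
            /* An odd run means the quote itself was escaped. */
            if (*in == '"' && n % 2 == 1) {
                *out++ = '"';
                in++;
            }
        } else if (*in == '"') {
            in++;   /* unescaped quote: grouping only, not copied */
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

Under this convention the token "-value \"with spaces\"" decodes to -value "with spaces", and a path such as C:\temp\file passes through unchanged because none of its backslashes precede a quote.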
Perfect!!! =) I've just grabbed the source and will do some tests during this week... I'm compiling it now and just want to point out one thing:
*** Compiling make
alloc.c
exec.c
make.c
minish.c
preproc.c
scope.c
target.c
var.c
exec.c(111): error C2220: warning treated as error - no 'object' file generated
exec.c(111): warning C4020: YoriLibCreateDirectoryAndParents: too many actual parameters
NMAKE : fatal error U1077: "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\cl.EXE" : return code "0x2"
Stop.
*** Linking make
exec.c
exec.c(111): error C2220: warning treated as error - no 'object' file generated
exec.c(111): warning C4020: YoriLibCreateDirectoryAndParents: too many actual parameters
NMAKE : fatal error U1077: "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\cl.EXE" : return code "0x2"
Stop.
*** Installing make
NMAKE : fatal error U1077: if : return code "0x1"
Stop.
It's all related to ymake.exe as far as I can see, and nothing seems related to the other stuff... I can do tests without any worries.
Hello! I have not done deep testing but the new escape seems to work well... huge thanks for the huge amount of work you've put in!
The only noticeable thing I've hit with the new \" escape is rebuilding a command line that already contains " symbols... so here is the scenario...
We have callee.cmd with the following contents:
@ECHO %CMDCMDLINE%
this file only echoes the command line it got.
The second file is caller.cmd:
@PUSHD "%~dp0"
timethis.exe callee.cmd "-value ""with spaces"""
this file is meant to launch callee.cmd with exactly ONE parameter: -value ""with spaces""... so this whole string should be passed as %1
The doubled " symbols are due to the nature of CMD: there is no other way to pass one " in a parameter to another .cmd file, you have to manually double " in the caller and manually undouble it in the callee... but that is not the point here
The thing is that timethis.exe rebuilds the command line, and in this scenario the final callee.cmd will echo:
C:\Windows\system32\cmd.exe /c C:\test\callee.cmd "-value \"with spaces\""
because timethis.exe has rebuilt the command line and callee.cmd received -value \"with spaces\"
I think there are enough situations where a command line that comes to an external Yori utility should not be rebuilt and should be passed on as it is.
And by the way... in the rebuilt command line scenario the parameter should become:
-value \"\"with spaces\"\"
I mean doubling the escapes of the " symbol
I had a bug in the earlier change that treated "" as an escaped form of " which confused the behavior you're seeing here quite a bit.
That said, I don't think the expression will end up as one parameter by applying Visual C++'s rules. Its rules will remove all of the nonescaped quotes and use them only to indicate how to break arguments. The string "-value ""with spaces""" would become -value with spaces because the second double quote terminates the first but not the argument, the third opens a new quoted region which is closed by the fourth, and the fifth and sixth open and close a quoted region with no contents. The rules I followed were slightly different where the second quote actually terminates the argument, so the result is one argument of -value, a second argument of with spaces, and a third empty argument.
Although it sounds nice to say that there are scenarios where strings should not be manipulated, note that all of these tools need to parse the command line to find command line options and items that they should consume and remove from that combined string. In this case, timethis.exe needs to interpret and understand callee.cmd as distinct from everything that follows, so there will be a tokenization and reassembly process.
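The reassembly half of that process can be sketched in C as follows. This is an illustration of the kind of escape insertion discussed earlier in the thread, assuming the child decodes backslash-quote sequences the way the Microsoft C runtime documents; quote_arg is a hypothetical name and not the routine timethis.exe actually uses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: turn one decoded argument back into command line text.
 * Arguments without spaces, tabs or quotes are copied as-is. Otherwise the
 * argument is wrapped in quotes, each embedded quote gets a backslash, and
 * any backslashes directly before a quote (embedded or closing) are doubled.
 * The caller must supply an out buffer of at least 2 * strlen(arg) + 3. */
static void quote_arg(const char *arg, char *out)
{
    size_t backslashes = 0;
    size_t i;

    if (*arg != '\0' && strpbrk(arg, " \t\"") == NULL) {
        strcpy(out, arg);
        return;
    }

    *out++ = '"';
    for (; *arg != '\0'; arg++) {
        if (*arg == '\\') {
            backslashes++;
        } else if (*arg == '"') {
            /* Double the pending run and add one more to escape the quote. */
            for (i = 0; i < backslashes + 1; i++) {
                *out++ = '\\';
            }
            backslashes = 0;
        } else {
            backslashes = 0;
        }
        *out++ = *arg;
    }
    /* Double a trailing run so it cannot escape the closing quote. */
    for (i = 0; i < backslashes; i++) {
        *out++ = '\\';
    }
    *out++ = '"';
    *out = '\0';
}
```

Fed the decoded argument -value "with spaces", this produces "-value \"with spaces\"", which is the shape that shows up in the %CMDCMDLINE% output above, while callee.cmd is emitted unchanged.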
If others think they can do better, I'm accepting pull requests.
Personally I don't think that I can do better... my mind blows when I face all this double quote stuff =)
If "" being treated as " is no longer an issue, I have nothing to add here and am going to close this thread.
I've just finished one quite complicated 2000+ line CMD script which uses Yori utilities quite a lot in many aspects, such as working with hex binaries, searching and replacing complicated strings, a lot of file/directory operations ( copy, move, delete, backup ) and want to say one thing:
It is FANTASTIC to have such a tool as Yori! It is really just fantastic to have all those utilities and be sure that my script will work on any version of Windows. Thank you very much for all this work... all this hard-hard-hard work!
P.S. There are a few things I've noticed I am missing in Yori:
A YSTRING utility that works with strings and can return the length of a string and do things like counting the number of symbols in a string... for example count \ in C:\temp\directory\file
FOR %%A IN (
1 2 3
) DO @(
IF 1 > 3 (
ECHO Not true =)
)
)
1.40
can help
Anyway, these are just wishes that I believe can improve Yori... and again Thank You SO MUCH for the support!!!
In the Yori environment everything works perfectly:
YORI> ECHO -n -- -argument1 "value 1"
-->-argument1 value 1
This is not what we want, but quite expected. Here we need to escape our " symbols
YORI> ECHO -n -- -argument1 ^"value 1^"
-->-argument1 "value 1"
and all is nice and flawless...
But something weird is going on when I try to use yecho.exe with CMD:
CMD> yecho.exe -n -- -argument1 "value 1"
-->-argument1 value 1
Nothing special... all as expected
CMD> yecho.exe -n -- -argument1 ^"value 1^"
-->-argument1 value 1
Ok... the thing here is that CMD itself handled the ^" symbols and yecho.exe received -argument1 "value 1" as in the previous attempt, so nothing surprisingly new here... so in theory we just need to add an additional escape for the ^ symbol:
CMD> yecho.exe -n -- -argument1 ^^^"value 1^^^"
-->-argument1 ^"value 1^"
And that is completely unexpected behavior... let's even try what is not supposed to work:
CMD> yecho.exe -n -- -argument1 ^^"value 1^^"
-->-argument1 ^"value 1^^"
And I have tried a lot of things to make yecho.exe work with CMD and double quotes, but nothing was successful... the only thing that worked is this construction:
CMD> yori.exe -nouser -c ECHO -n -- -argument1 ^^^"value 1^^^"
-->-argument1 "value 1"
And this is quite expected and absolutely normal...
I understand that all this double quote "magic" in Windows, and in CMD especially, is quite a headache, and I do not expect a quick reaction... I just want to ask: am I missing something here? Maybe you can give some advice on how to properly escape the double quotes so that yecho.exe will work with CMD as it does in YORI?