ljharb / shell-quote

MIT License
24 stars 10 forks source link

Why is the equal sign escaped in args to quote() #11

Open drok opened 10 months ago

drok commented 10 months ago

When converting an array or args into a string that might be saved to a file as a shell script, equal signs are escaped with a backslash. This is not necessary and makes the result script file harder to read. The superfluous escape is not incorrect, but unnecessary, and ugly.

I wonder why this escape is done, what purpose does it serve? Here's an example:

quote(["make", "CFLAGS=-DRELEASE"])

I expected the command line to be: make CFLAGS=-DRELEASE.

Instead, I got: make CFLAGS\=-DRELEASE (the \ is not necessary)

This behaviour seem to be implemented in quote.ts:

return String(s).replace(/([A-Za-z]:)?([#!"$&'()*,:;<=>?@[\\\]^`{|}])/g, '$1\\$2');

I see on this list that other characters which have no special meaning to the shell are also quoted: '@', '^', '{' '}'

I would like to note from the manpage of bash:

Note that unlike the metacharacters ( and ), { and } are reserved words and must occur where a reserved word is permitted to be recognized.

This means the curly brackets are not metacharacters ("A character that, when unquoted, separates words. One of the following: | & ; ( ) < > space tab newline"), they need not be escaped. Example:

$ echo {this-is-fine}
{this-is-fine}

$ echo "| this ( ) is ! also & fine > really ; and 'this' ... see ?"
| this ( ) is ! also & fine > really ; and 'this' ... see ?

$ echo "<quoted means not a metacharacter>"
<quoted means not a metacharacter>

Thanks!

ljharb commented 10 months ago

This package isn't unique to bash; there's many other shells we might be used in - zsh, ksh, sh, dash, to name a few.

Given that the purpose an escaping library like this is correctness, and not readability of the output, I'm inclined to leave it as-is, since "too much escaping" is a trivial problem but breakage is a very expensive one.

drok commented 10 months ago

So is there a shell where it would be incorrect to not quote the equal sign?

The problem with "too much escaping" is that more is not necessarily better.

For example, the simple implementation at quote.js#L8-9 is wrong.

In that case, a single-quoted string is created with backslashes used to escape single quotes and backslashes. This violates POSIX.1-2017 2.2.2:

A single-quote cannot occur within single-quotes.

This case cannot occur due to the !(/'/).test(s) condition, but the replace pattern does double up the backspaces, and it shouldn't. A user supplied 3-char string \ \ becomes a 5-char string \\ \\ after this treatment, because backslashes inside single quoted strings are not escape characters, ie, they don't need escaping, and escaping them breaks the string:

$ echo '\ \'
\ \

I am not focused on correctness; I'd be happy with good looking equal signs. 😉 Correctness can be someone else's battle.

ljharb commented 10 months ago

Correctness should be everyone’s battle; it’s all that matters.

If you can provide some test cases that over-escaping breaks, I’ll be happy to fix them.

drok commented 10 months ago

Test case:

user supplied 3-char string \ \

ljharb commented 10 months ago

The output is meant to be used in double quotes, I think? echo "\\ \\" works fine.

drok commented 10 months ago

The problem is that quote(['\\ \\']), ie, the 3-char input 5c 20 5c, returns '\\ \\', ie, the 7-char sequence 27 5c 5c 20 5c 5c 27 when it should return '\ \', ie, the 5-char sequence 27 5c 20 5c 27

drok commented 10 months ago

For this case, the output "\\ \\", ie, the 7-char sequence 22 5c 5c 20 5c 5c 22 would also be correct, but less readable (a 3-char input represented in 7-char is 4 extra characters)

ljharb commented 10 months ago

right - but the output of this library needs to work in double quotes, and echo "\ \" doesn't work.

Can you elaborate on why you want readability here?

drok commented 10 months ago

why you want readability here

Eyeballs matter. Readability is useful when the output of the library is used in human-visible text like a blog, an email or a book, for instance by cut-and-pasting into a word processor.

Also, when a teacher wrote a command line on a blackboard, he would write it as echo '\ \' and not as echo "\\ \\". If the command line had any complexity, single quotes would be the tool of choice, not double quotes. A student or uninitiated reader would take the teacher's writing as authoritative over a textbook that used the more obfuscated quoting. The teacher shouldn't have to explain that some poorly written textbooks unnecessarily give the longer notation.

I use this lib to output diagnostics into a log file. Ie, when things go wrong, frustrated human eyeballs will be looking at my diagnostic output and scrutinize every byte. I don't want to add to the frustration by making that user unnecesarily interpret the more obfuscated, double-quoted notation.

If the output were only ever intended to be seen by a machine, it wouldn't matter how obfuscated it is.

ljharb commented 10 months ago

I agree that readability is very important in a general sense.

If it's for human eyeballs and not a shell parser, why run it through this library at all?

drok commented 10 months ago

I use this lib to output diagnostics into a log file

I use this lib to output diagnostics into a log file. Ie, when things go wrong, frustrated human eyeballs will be looking at my diagnostic output

ljharb commented 10 months ago

Yes, I read that, but I'm still unclear why the output of this would end up in the log file. Either you're using shell commands to put it into the log file - where the shell will remove the escaping for you - or you're not, and you can put the unescaped output directly into the log file.

(heads up that when someone asks a clarifying question, it's very rude to just repeat the things you said. if they didn't understand the first time, then you need to phrase it differently, not repeat it, for them to understand it)

drok commented 10 months ago

but I'm still unclear why the output of this would end up in the log file

The application spawns some subprocess with arguments given as an array of strings, similar to execve(executable_name, argv[]). The argv[] array in some way.

The spawning of this process is logged, and it needs to be in a format that:

  1. should be as easy as possible for the user to understand, unambiguously.
  2. should be able to be cut and pasted into a terminal window to correctly execute the same command, by first removing the escaping, then calling execve with exactly the same argv array that my application did.

Putting an unescaped command line would not allow any of these two objectives. Without the correct quoting it would be ambiguous, and could not be executed without manual tinkering.

My apologies for the need to repeat. It looked as if you had not read my entire earlier response, and asked a question that had been pre-emptively answered.

ljharb commented 10 months ago

ahh ok, thanks. so you're saying that the humans aren't looking at the log file itself, they're looking at the log of the "command that logs to the file", which has commands that need some escaping.

How would this user "remove the escaping"? You can't generally just delete all backslashes to accomplish that.

drok commented 10 months ago

How would this user "remove the escaping"? You can't generally just delete all backslashes to accomplish that.

The end user might cut and paste the logfile to stackoverflow, and hope for help from some pedantic nerd. Therefore, it must be correct, not some approximation of what was executed, to pass the scrutiny of someone skilled. Any missing or misplaced backslash will result in Ahhh... there's your problem - a well intentioned, but a wild goose chase all the same.

The end user can also attempt to re-run the command, possibly by slightly editing it, maybe to add something or remove something. Perhaps a trial-and-error process to find the exact offending argument, so they can correct the input to the application, devise some workaround, or make an informed report to the application developer. As that developer, I'm much more likely to give a swift, accurate solution when the bug reporter has already done some work to narrow down the root cause.

Just as you say, unquoting is not as simple as deleting some backslashes. On the flip side, in order to produce a working command line from an incorrect or unreadable approximation, it's not enough to add some backslashes or sprinkle some quotes. The user should not need to fix up a defective command line before cutting and pasting it into a terminal.

The end user could very well be a pedantic nerd with the command line surgery skills needed to fully diagnose and remedy the issue. Even then, I need to present in the diagnostic logfile information that is accurate and readable, to help them help themselves.

ljharb commented 10 months ago

I totally agree that for your use case, you need something that can be copypasted verbatim and run on the CLI.

Certainly, if there's any inputs we accept that when outputted and executed in double quotes, something unexpected happens, we have a bug, and that should be fixed.

If all you want is to remove some of the escapes, it seems like wrapping (not forking) this library and doing so with its output would be a pretty maintainable approach?

soletan commented 3 months ago

Correctness should be everyone’s battle; it’s all that matters.

If you can provide some test cases that over-escaping breaks, I’ll be happy to fix them.

Looks like shell-quote is causing an issue in concurrently.

Nothing's wrong with correctness, but correctness is hard to proof and shell-quote seems to more like breaking a fly on the wheel as it is rather scarce on distinguishing when to escape what. Don't get me wrong for this might sound not the way I mean it, but after checking the readme rather briefly I'm wondering if shell-quote is even meant to support cmd.exe on windows at all. Is it?

If it is, escaping at least @ this generically seems to be wrong as it is passed by cmd.exe into the invoked command e.g. when running it like this:

C:\Users\cepharum\PhpstormProjects\UID\Logistics\WebClient\Source\UiNew>cmd /s /c node log.js --tags \@foo
[
  'C:\\Program Files\\nodejs\\node.exe',
  'C:\\Users\\cepharum\\PhpstormProjects\\UID\\Logistics\\WebClient\\Source\\UiNew\\log.js',
  '--tags',
  '\\@foo'
]

with a file log.js like this:

console.dir(process.argv);
soletan commented 3 months ago

Looks like using single quotes in cmd.exe isn't working either:

C:\Users\cepharum\PhpstormProjects\UID\Logistics\WebClient\Source\UiNew>cmd /s /c @node log.js --tags \@foo
[
  'C:\\Program Files\\nodejs\\node.exe',
  'C:\\Users\\cepharum\\PhpstormProjects\\UID\\Logistics\\WebClient\\Source\\UiNew\\log.js',
  '--tags',
  '\\@foo'
]

C:\Users\cepharum\PhpstormProjects\UID\Logistics\WebClient\Source\UiNew>cmd /s /c "@node log.js --tags \@foo"
[
  'C:\\Program Files\\nodejs\\node.exe',
  'C:\\Users\\cepharum\\PhpstormProjects\\UID\\Logistics\\WebClient\\Source\\UiNew\\log.js',
  '--tags',
  '\\@foo'
]

C:\Users\cepharum\PhpstormProjects\UID\Logistics\WebClient\Source\UiNew>cmd /s /c \@node log.js --tags \@foo
Der Befehl "\@node" ist entweder falsch geschrieben oder
konnte nicht gefunden werden.

C:\Users\cepharum\PhpstormProjects\UID\Logistics\WebClient\Source\UiNew>cmd /s /c '@node log.js --tags \@foo'
Der Befehl "'@node" ist entweder falsch geschrieben oder
konnte nicht gefunden werden.

Pardon the German output but the errors are indicating that

ljharb commented 3 months ago

@soletan it does indeed seem to support windows paths (based on PRs that predate my maintainership).

If there's a bug, then a PR with a reduced test case would be most helpful, with or without a fix - it's difficult for me to test on Windows but if there's a pre-existing PR then I can fire up a VM and check it.

soletan commented 3 months ago

Thanks for the confirmation.

Since we have an existing approach to escaping special shell characters in one of our own tools which I failed to thoroughly test myself, I might spend some more time on improving on the testing part there and let you know of any results or even provide a revision.

ljharb commented 3 months ago

Thanks, that'd be great!

soletan commented 2 months ago

Okay, I've spent the weekend implementing a test trying options for escaping characters with and without wrapping them in single or double quotes. The resulting tool is generating configurations for tested shells. I've used this tool to create configurations for Windows' CMD and PowerShell. In addition, I've created a docker setup running the tool against several shells supported by Debian. As a result, there are about 10 different configuration files for as many different shells.

For testing the results, I've also implemented a slight method for quoting a single argument based on a selected configuration. As this is already working, I successfully integrated this approach with our tool I was mentioning before, too.

To conclude, tests reveal a rather diverse situation when it comes to escaping and quoting characters for use with different shells. However, I was struggling to see a way to integrate with your package

I hope you don't mind that I've decided to provide the results as a dedicated package on npm for now. Maybe, you see a way to integrate the results with your package so that other tools such as concurrently can benefit from it.

UPDATE: Maybe, as a quick example on the mentioned diversity, an argument with whitespace must be wrapped in double quotes when used with CMD while PowerShell works better there with single quotes and neither of the two supports whitespace being escaped without any quotes which in turn is fine with the Linux-based shells.

ljharb commented 2 months ago

I guess I’m a bit confused. What i was hoping for was a set of test inputs and outputs, and then results for running those outputs on lots of shells.

I don’t see a value in minmal shell-specific quoting; what I’d want is this package to quote the union of what shells need, so the output can be used everywhere.

soletan commented 2 months ago

Okay, but there is no single union option. E.g., CMD requires ^ for escaping instead of \. As mentioned before, CMD is requiring double quotes when arguments contain whitespace. It doesn't support to escape them without quotes. Powershell OTOH is benefitting more from single quotes when necessary. In addition, it is requiring single quote inside of single quoted argument to be doubled for escaping. Using backslash does not work there, too. Both Windows shells cause issues when using escape sequence on too many characters for they just keep the backslash when they don't deem it necessary. And Linux shells have a similar support for escaping, but it ain't equivalent.

So, I'm happy to see that one-for-all approach that's also correct in all possible scenarios and capable of recognizing whether some argument is safe to be passed eventually. Just have a \n in your arguments and most shells have issues with properly escaping/quoting that. In best case scenario, the \n just gets lost.

ljharb commented 2 months ago

i guess it'd be fine to pivot based on OS - but shell detection is something i'd rather avoid.

soletan commented 2 months ago

Hm, maybe there is a misunderstanding here. The package I've created does not require you to pick or even detect a shell. In fact, its default behavior is to use the shell that is Node's default according to the documentation of its spawn() function - CMD.exe on Windows and /bin/sh on Linux. It is then using the related configuration that's ready for use. So, it is in fact solely relying on the platform to choose a shell's configuration by default. And there is no need to test or detect anything other than properly processing the provided configuration.

If you don't care for the shell, all you do is ask the helper function I've created for a configuration which is a set of regular expressions and mappings of characters that you can then use to quote arguments and even detect if the provided argument is capable of being safely quoted at all. You can do this all on your own or use another helper function I've implemented.

const { getShellConfigurationSync, quote } = require( "@cepharum/quoting-db" );
const configuration = getShellConfigurationSync();

const args = [ "--opt", "some argument with space", "--filter", "!$%><" ];
const quoted = args.map( arg => quote( arg, configuration ) );