milahu / nixpkgs

Nix Packages collection

lib.escapeShellArg should return $'...' ANSI-C quoted strings #45

Closed milahu closed 2 months ago

milahu commented 2 months ago

for special characters like newlines

actual

nix-repl> lib.escapeShellArg "a\nb"
"'a\nb'"

nix-repl> lib.strings.toShellVar "x" "a\nb"
"x='a\nb'"

expected: prefix the '...' strings with $ to get $'...' strings, because in Bash 'a\nb' != $'a\nb'

$ echo 'a\nb'
a\nb

$ echo $'a\nb'
a
b

How does the leading dollar sign affect single quotes in Bash?

$' is a special syntax (fully explained in the Bash manual) which enables ANSI-C string processing.

ANSI-C Quoting (Bash manual)

3.1.2.4 ANSI-C Quoting: Character sequences of the form $'string' are treated as a special kind of single quotes. The sequence expands to string, with backslash-escaped characters in string replaced as specified by the ANSI C standard.

What does it mean to have a $"dollarsign-prefixed string" in a script?

There are two different things going on here, both documented in the `bash` manual.

`$'` Dollar-sign single quote is a special form of quoting: [ANSI C Quoting](http://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html)

> Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard.

`$"` Dollar-sign double-quote is for localization: [Locale translation](http://www.gnu.org/software/bash/manual/html_node/Locale-Translation.html)

> A double-quoted string preceded by a dollar sign (‘$’) will cause the string to be translated according to the current locale. If the current locale is C or POSIX, the dollar sign is ignored. If the string is translated and replaced, the replacement is double-quoted.
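To make the proposal concrete, here is a minimal Python sketch of what producing a `$'...'` word could look like. The helper name `ansi_c_quote` is hypothetical and not part of any library; it only handles a handful of escapes (backslash, single quote, newline, carriage return, tab), and the resulting syntax is not understood by every `sh`.

```python
def ansi_c_quote(s: str) -> str:
    """Hypothetical helper: render s as a Bash $'...' ANSI-C quoted word."""
    # Only a few escapes are covered; other control characters pass through as-is.
    escapes = {
        "\\": "\\\\",  # backslash must be doubled inside $'...'
        "'": "\\'",    # single quote is escaped with a backslash
        "\n": "\\n",
        "\r": "\\r",
        "\t": "\\t",
    }
    body = "".join(escapes.get(ch, ch) for ch in s)
    return "$'" + body + "'"


quoted_word = ansi_c_quote("a\nb")
print(quoted_word)  # a single line: $'a\nb' (backslash and n as two characters)
```

The output contains no real newline, which is the only practical difference from plain `'...'` quoting: a single-quoted word may contain a literal newline and the shell still treats it as one argument.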

similar but different issue: shellAliases are not properly escaped (also escapeShellArg does not handle newlines), nixpkgs#25143

Actually in this case they aren't newlines — they aren't handled by the shell parsing, but by the program (or builtin) the text is getting passed to, as you can see by running echo '\n'. So what's happening here is that nix is interpreting the escape sequence too early, and you want to pass the literal characters \ and n to time — so you'll want to put \\n in the nix expression.

similar but different issue: Need help understanding how to escape special characters in the list of str type

Escaping rules are described in the [Nix manual](https://nixos.org/manual/nix/stable/#ssec-values). For `''` strings:

> Since `${` and `''` have special meaning in indented strings, you need a way to quote them. `$` can be escaped by prefixing it with `''` (that is, two single quotes), i.e., `''$`. `''` can be escaped by prefixing it with `'`, i.e., `'''`. `$` removes any special meaning from the following `$`. Linefeed, carriage-return and tab characters can be written as `''\n`, `''\r`, `''\t`, and `''\` escapes any other character.

You should replace `${ref}` with `''${ref}` in your example, so that it does not try to interpolate a Nix variable `ref`. The sequence `=>` is not special.

Unfortunately the escaping rules are different from double-quote `"` strings, where `\` is used to escape `${` instead.

python shlex does it wrong, too...

$ python
>>> import shlex
>>> shlex.quote("a\nb")
"'a\nb'"
>>> shlex.join(["a\nb"])
"'a\nb'"

javascript shlex does it wrong, too... (https://github.com/rgov/node-shlex/issues/27)

$ cd $(mktemp -d)
$ npm init -y
$ npm install shlex
$ node
> const shlex = require("shlex")
> shlex.quote("a\nb")
"'a\nb'"
> shlex.join(["a\nb"])
"'a\nb'"

I'm starting to notice a pattern... :P

when all of them are doing it "wrong", then it's probably safe to always prepend a $ dollar sign to the result

but wait:

https://github.com/rgov/node-shlex

> As of version 2.0.0, Bash's [ANSI C strings](https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html) (`$'x'`) and [locale-specific translation strings](https://www.gnu.org/software/bash/manual/html_node/Locale-Translation.html) (`$"x"`) are supported. This diverges from the Python shlex behavior but makes parsing more accurate.

$ grep shlex package.json
    "shlex": "^2.1.2"

but this affects only shlex.split

> shlex.split("$'a\nb'")
[ 'a\nb' ]
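
For comparison, a small Python sketch of the divergence the README mentions: Python's `shlex.split` has no notion of `$'...'`, so the `$` stays in the word and the backslash is kept literally. A raw string is used here so that the input really contains a backslash followed by `n`; the JavaScript literal `"$'a\nb'"` above already contains a real newline before `shlex.split` sees it.

```python
import shlex

# Raw string: the input is $'a\nb' with a literal backslash and a literal n.
word = r"$'a\nb'"

# Python's shlex treats '...' as plain single quotes and $ as an ordinary character.
tokens = shlex.split(word)
print(tokens)  # ['$a\\nb']
```

So in Python the `$` prefix is not interpreted at all, and the token still holds the two characters `\` and `n` rather than a newline.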

milahu commented 2 months ago

> I'm starting to notice a pattern... :P

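aah! never mind, I'm an idiot ^^ I got confused by the repl, which prints strings in quoted, escaped form: the \n it shows is an actual newline character in the value, so the plain '...' quoting is already correct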

nix-repl> lib.escapeShellArg (builtins.concatStringsSep "\n" [ "a" "b" ]) == "'a\nb'"
true
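
The same confusion can be reproduced in Python. This is a minimal sketch using only the standard-library `shlex` module: the REPL echoes the `repr` of the string, while the value itself holds a real newline between the quotes and round-trips cleanly.

```python
import shlex

quoted = shlex.quote("a\nb")

# What a REPL echoes: the escaped representation of the string.
print(repr(quoted))  # "'a\nb'"

# What the string actually contains: quote, a, newline, b, quote.
print(len(quoted))   # 5
print(quoted)

# Parsing the quoted word back yields the original argument.
round_trip = shlex.split(quoted)
print(round_trip == ["a\nb"])  # True
```

There is no backslash anywhere in `quoted`, so prepending `$` would not change how this particular word is read, and for strings that do contain literal backslashes it would change their meaning.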