Closed andychu closed 4 years ago
Should it be off by default, but on in oil:basic (and oil:all ?)
IMHO It should be on by default (with an error message showing how to turn off) in interactive mode; and a non-intrusive mode(e.g. print a warning to stderr) in batch mode.
I've been bitten by this several times (I downloaded files with curl
with the wrong options... which led me creating a file names that looked like switches. Removing them was a disaster 😢), and I believe no one would intentionally make a file that looks like a switch (and people who do would just turn off the option).
Note from lobsters: it would also be possible to treat *
like ./*
. But I think that's a little too magic, and the user should do it themselves.
The problem with this in my opinion (IIUC that it is intended to work normally unless there's a file named -r
, when it quits), is that if I wrote a script with regular *
(as I would usually do out of [bad] habit and it being easiest), it would, for me and most users, just silently work "correctly". (Also, during reviews: "Hey, stop being picky with your `./nonsense, it just works, nobody will create file named
-r, don't be silly, we've got Business to Do."*) Then super rarely would it suddenly stop working (i.e. "break" for users), in some weird accidental/malicious case when someone created a file named
-I-like-kebabs!-or submitted a file named
-rf /through a web form (for "lulz" or vuln searching). And this code path (where the script stops) would thus actually not be very well tested, not well understood, could lead to weird hard to debug errors (esp. on remote systems where logs may be hard to access, etc.), or in fact vulns (because poorly tested path). Whereas a feature that *loudly breaks upfront* (when writing the script) — e.g. the interpreter refusing to accept undecorated
rm `, complaining that "error: it must be either rm ./*
or rm -- unsafe(*)
" — would lead to the programmer being forced to fix the code quickly and the script being hopefully more robust later. As much as I am with you on anti-clippy, if we're into adages, I do also like the one of "make correct be easy, make unsafe be tedious but possible"*...
Thanks for caring enough to actually discuss this! ❤️
I agree that breaking 1 out of 1000 times is generally bad. However, I claim it's strictly better than executing with the wrong flag 1 out of 1000 times! This breakage is alerting to you that something bad would have happened.
I'm open to alternatives.
But I don't like the implicit ./*
behavior for a few reasons:
rm -- *
to rm ./*
because it is explicit. When I read rm -r -f -- *
, I read "the stuff to the left of --
is code, and the stuff to the right is data". rm ./*
is too "clever", even when done explicitly. And done implicitly it's very much too clever.myarray=( foo bar * )
do? With the implicit ./
, I see problems with either behavior (adding it or not). I'm also not sure how much people use rm *
in shell scripts. One way to quantify it would be to go through the wild corpus:
https://www.oilshell.org/release/0.7.pre7/test/wild.wwz/index.html
I think it's more commonly an interactive command, so failing fast is good there. In my shell scripts I usually write rm -r -f $dir
. (Although of course that has the same problem if $dir
is attacker-controlled.)
Also, I think a very simple static analysis rule would suffice as well. However there is no Oil lint tool at the moment. I'm inclined to implement the safer_glob
behavior because it's easy and I think very effective in the 99% case. And then if people don't like that it blows up 1 in 1000 times, the answer is to write a lint tool with Oil's AST. (Or maybe ShellCheck does it, etc.)
I also think the rare failure isn't that bad because I would want to take an additional action: clean up the files named -rf
and *
on the file system!
To me, that's just as much of a "bug" as the shell script. How did those files get there?
I don't want them sitting around, waiting for some other script to trip over them.
I don't think this bug happens all that often. Notice that the original blog post doesn't reference a real situation -- it's something made up to show the theoretical problem.
This problem doesn't happen often unintentionally, but it's a great attack because people don't expect to be attacked by filenames.
But I think there's a simpler, straightforward rule: If a glob produces a result, every expansion that doesn't begin with "/" or "./", will have a prepended "./". Since it will always be there for relative paths, people will immediately realize it's getting added and fix errors if they make wrong assumptions. This is way better than trying to have "magic --"; not all commands support "--", and some commands have multiple "--" entries so magicking after one "--" does not do the trick (e.g., git commit does this).
Hm so are you suggesting if you have:
$ ls
foo.py bar.py
then all of these:
echo *.py
rm *.py
for x in *.py; do echo $x; done
a=(*.py)
will expand to ./foo.py
and ./bar.py
?
That seems aggressive but I just want to be clear on the suggestion.
@andychu - yes, that's exactly what I propose, and I don't think it's aggressive at all.
Here are some essays that recommend manually prepending "./" to globs:
This would simply automate best practice, so that the shell always does "the right thing" that always works. Prepending "./" to non-absolute pathnames completely eliminates the "leading dash" problem for all programs, no exceptions, even if the program doesn't implement "--" or supports multiple "--" arguments.
I don't think prefixing "./" should be considered "extra magic". If you're asking for a glob, you should get back the matching paths. If you don't escape the leading "-"s, almost every program will mis-interpret them as options.
Shellcheck already warns if you use a bare glob. SC2035 requires that you say "command -- glob" or "command ./glob". Supporting both approaches is another way to do it, though that is more magical. See here: https://github.com/koalaman/shellcheck/wiki/SC2035
It'd be nice if people could just "write the easy code" and for it to "just work".
Note: if you follow the practices recommended by others, and always write relative globs by prefixing them "./", the result would be unchanged. The difference is that you would not have to prefix them.
I think the auto-./
solution is also making some assumptions that aren't valid:
./foo.py
means the same thing to every program that foo.py
does. That's true if you pass it directly to open()
, but I can think of many cases where it isn't truesha1sum
and so forth, and this would break that.One example is something like:
$ for name in *.txt; do echo $base_url/$name; done
Now http://example.com/./foo.txt
is probably equivalent to http://example.com/foo.txt
most of the time, but it's not something I want to silently be inserted by the shell.
I thought that rsync also has funny parsing of paths, though I would have to test. It treats a trailing slash differently at least.
The rule has the benefit of being easy to implement. So I could imagine it being in Oil -- the question is if it's in one of the "suggested" option groups, e.g. shopt -s strict:all
, oil:basic
, and oil:all
(equivalent to bin/oil
).
It's also hard to think of any situation where a glob like *.py
breaks. For example:
$ touch -- -rf.py
$ rm *.py
rm: invalid option -- '.'
Try 'rm --help' for more information.
So I feel like it's a big hammer, applying to essentially all scripts to solve a problem that doesn't happen much. By default, I'd lean toward detecting the error dynamically, exactly when it happens.
It is true that the shell doesn't have knowledge of when a command accepts --
. It mostly works but there are important exceptions.
But you can say the same for assuming foo.py
means the same thing as ./foo.py
-- it mostly works but there are important exceptions.
And I'd also repeat the point is that if I have a file called -rf
on my file system, I probably want to know about it. If it's silently passed as ./-rf
to every program, I may not notice, and it will stay there for a long time.
Because every OTHER shell you use on that system can be fooled by the file. I hope Oil will be the default shell on some systems but that probably won't happen for a long time.
Actually the shellcheck page even says it:
Note that changing to ./ in GNU Tar parameters will add ./ prefix to path names in the created archive. This may cause subtle problems (eg. to search for a specific file in archive, the ./ prefix must be specified as well). So using -- * is a safer fix for GNU Tar commands.
https://github.com/koalaman/shellcheck/wiki/SC2035
I use globs with tar a lot.
On second thought there's also some subtlety to the rule:
rm *
rm *.py
rm foo/*.py # can't possibly be an option
rm ?? # can be an option
rm ?[a-z] # can be
rm [a-z]? # can't be
Doing it even for foo/*.py
seems too aggressive. I think even *.py
is too aggressive but foo/*.py
is even more aggressive because you'd be changing the output for something that will never happen. There's probably a rule you can come up with to distinguish the cases but it's probably more code than it seems at first.
I also think using ShellCheck with Oil makes sense so if it can prompt users to add ./
or --
explicitly, that's good. The dynamic check in Oil can be seen as sort of a last resort.
Although it has some similar subtleties, e.g. what if you want to do something like rm -*.txt
-- all text files starting with a dash.
That said, my broader goal is to keep people from getting hurt when some malicious person creates a filename beginning with "-". Any solution that does that would be okay, though some are (in my eyes) nicer than others.
In practice I tell people to only prefix "./" if the glob pattern starts with a wildcard e.g., *.pdf
, so only prepending "./" when it begins with *
, ?
, or a brace pattern using them would be just fine, and that'd be a very simple rule.
While I have concerns, prepending only pathnames that when expanded begin with "-" would at least protect people. You could even suppress that if the shell noticed that "--" preceded it; that's a lot of magic, but it would be practically invisible in most cases.
BTW, "rm ./-*.txt` is a perfectly fine way to remove all filenames beginning with "-" (even now).
I can imagine some logic like:
-
./
, or cause a hard failure. This can be configurableAt first, I thought that the first condition isn't trivial. It's easy to test if the word begins with *
or ?
, but [a-z-]
or [a-z\-]
and extended globs like @(?|[a-z])
are hard to parse.
"Parsing that's not correct" exactly what I'm trying to avoid in Oil, e.g. http://www.oilshell.org/blog/2020/01/history-and-completion.html
But I remembered there is a LooksLikeGlob()
function that determines whether a word is passed to glob()
. I believe most shells do this -- they parse globs approximately, and then they pass the string to libc, which parses it again.
That is annoying to me but so far it hasn't caused any problems, and it's consistent with other shells. So maybe that logic can be reused for this heuristic.
There is already a GLOBIGNORE
var, so I guess there could be
SAFEGLOB=prepend
or SAFEGLOB=fail
Or we could extend shopt
to shopt -s safeglob=prepend
or shopt -s safeglob=fail
The issue of --
is still there. I would be interested in what non-GNU utils do. Busybox seems to respect it even though it does NOT support long flags:
$ busybox grep -- -a gears.js
ctx.rotate(-a*n1/n2 + Math.PI*(0.5));
ctx.rotate(-ang);
To clarify:
foo/*.py
-- LooksLikeGlob() is true, but it will never return a string that begins with -
, so we're OK*.py
-- LooksLikeGlob() is also true, and this will return -rf.py
. However that can never cause a problem for any that doesn't take -.
as a flag (which is essentially no program AFAICT)*
-- also true, this is the problematic case. Arguably *py
and *sh
are in this category too.And
./*
or ./*.py
-- even if SAFEGLOB=fail
, this will never cause a failure. So we can just preserve the advice to prepend ./
for OTHER shells, and use ShellCheck, etc. But Oil can fail as a last resort.So actually I think the most minimally invasive thing is to have shopt -s glob_dash_fail
on by default. If you don't want random failures, then you can just follow the old advice of prepending ./
! Which works for all other shells too.
shopt -s glob_dash_fail
touch -- -rf
rm * # will fail instead of passing a bad arg to rm
rm ./* # will do the right thing
rm -- * # hm I guess I don't want this to fail, which leads us back to detecting -- again
So I guess the question is, are there any command foo
where
foo -- -v
means -v
is a flag and not an arg? I don't think that's true for git commit.
I think I said this already, but aesthetically I prefer rm -- *
to rm ./*
(and I do it in my own shell scripts)
The former means exactly what it says -- everything after --
is an arg, not a flag.
./*
causes me to "skip a beat" -- what's the intention? The connection between the syntax and the intended semantics isn't clear IMO. It's a bit clever.
Third thought, parsing --
is dangerous because:
touch -- -rf
sudo -- rm * # still deletes everything!!! -- applies to sudo and not rm
correct:
sudo -- rm -- *
sudo rm -- *
gah this issue is tough. So I guess this is an example where ./
has an advantage, because it doesn't care if a command is wrapped by sudo. However it has the big disadvantage of tar
, URLs, etc.
Testing tar:
$ tar --create --file one.tar -- -rf
$ tar --create --file two.tar -- ./-rf
$ ls
-rf one.tar two.tar
$ tar --list < one.tar
-rf
$ tar --list < two.tar
./-rf
The path name is a literal string in the tar file.
The POSIX specification (2018) - Base Definitions - section 12 - guideline 10 recommends that in general, "The first -- argument that is not an option-argument should be accepted as a delimiter indicating the end of options. Any following arguments should be treated as operands, even if they begin with the '-' character.". A few programs don't do that, most famously echo. But even programs that don't accept long options normally do accept "--" as "end of options", since this is specified in POSIX.
I think you can omit (or not add) leading "./" if only the previous argument just before the glob was "--", as long as it's immediately previous. That is, given sudo -- rm -- *
, it's okay to expand *
to -hello because there's a "--" just before the *
. This is different from sudo -- rm *
, where the argument preceding *
is rm
(which is not --
); in that case you do have a dash-expansion issue.
If the goal is to have maximally same behavior, here's one way:
The one weirdness is tar, but it is resolved if you use "--", and at least the file is archived instead of being misinterpreted as an option.
Hm I feel like this has gotten too complicated to resolve without some code... The exceptions keep growing exceptions. (e.g. the rules you propose don't seem consistent with the a=(*)
context)
I would take suggestions in the form of executable test cases... Most of them are here:
https://www.oilshell.org/release/0.8.pre1/test/spec.wwz/glob.html
related code: https://github.com/oilshell/oil/blob/master/osh/glob_.py
I would like to do something about this but I don't know what right now. (I would have done it awhile ago if it were easy.)
Fundamentally, the issue is that the shell doesn't understand the syntax of every command it executes. All of these assumptions are bad IMO:
--
is acceptedsudo rm
, chroot rm
, etc.)./foo.py
always means the same thing as foo.py
-- it doesn't in the case of tar
(the file being archived isn't my concern; it's that it's archived under a different name)bin/osh
and dash
or bash
should not produce different results)Although another dimension I considered is perhaps rm *
is different in the interactive shell. The theory being that a linter like ShellCheck can find the bad usages in shell scripts, but interactive use can be trapped by Oil. Of course that also has the downside of being complicated to explain.
Very different solution. Bash already has this var:
GLOBIGNORE A colon-separated list of patterns defining the set of filenames to be ignored by pathname expansion. If a filename matched by a pathname expansion pattern also matches one of the patterns in GLOBIGNORE, it is removed from the list of matches.
So it's possible for shopt -s oil:basic
to set GLOBIGNORE=-*
by default.
Simply ignoring the file means the rules can be simpler:
a=(*)
context, the for x in *
context, etc. --
../foo
can be wrong. This is arguable but I think omitting the file from a tarfile is more understandable than putting it under a different but almost indistinguishable name.Another possibility is GLOBFAIL=-*
, and they're not mutually exclusive of course.
UI issue: what if the user already set GLOBIGNORE? I guess that will override it, and the user can add it back like -*:.gitignore
.
There's also shopt -s dotglob
in bash ...
The bash wiki suggests that GLOBIGNORE=.:..
is an idiom (how common I don't know)
But GLOBIGNORE='-*:.:..'
is also fine (if unreadable). Actually this would be a good place to allow an array in Oil, similar to PATH (which isn't done but could be)
https://mywiki.wooledge.org/glob
The Bash variable (not shopt) GLOBIGNORE allows you to specify patterns a glob should not match. This lets you work around the infamous "I want to match all of my dot files, but not . or .." problem:
OK, thinking about this more, I have a concrete proposal. Looking at the comments in osh/glob_.py
, I was reminded that bash also has shopt -s dotglob
, which adds the dotfiles back:
$ shopt -s dotglob; echo *
.adobe .ash_history .audacity-data
...
So I propose that Oil has an option shopt -s no_dash_glob
that removes any result of a glob that starts with -
. That's it -- no conditions, just like dotglob
.
This option is part of the group shopt -s oil:basic
which is basically the your first foray into Oil. (oil:all
implements more stuff at the expense of breaking things.)
Benefits:
GLOBIGNORE
and dotglob
Thoughts?
Also: scripts which use ./
will get the ./-rf
files without problem, because the arg doesn't start with -
. So that's working as intended.
rm -- *
will omit the -
files. It can be temporarily disabled with:
shopt -u no_glob_dash {
rm -- */
}
echo * # restored
Because in Oil a block can be passed to shopt.
(Although I honestly hate the double negatives, so I'm thinking of having glob_dash
and the default is on, even though bash doesn't have that mechanism.)
Hm bash does have some on by default:
$ bash --norc -c 'shopt -p'
shopt -u autocd
shopt -u cdable_vars
shopt -u cdspell
shopt -u checkhash
shopt -u checkjobs
shopt -u checkwinsize
shopt -s cmdhist
shopt -u compat31
shopt -u compat32
shopt -u compat40
shopt -u compat41
shopt -u compat42
shopt -s complete_fullquote
So I should probably have a mechanism to flip the bit in option groups. OK.
So I propose that Oil has an option shopt -s no_dash_glob that removes any result of a glob that starts with -. That's it -- no conditions, just like dotglob.
That solves the main security problem. It's not as "nice" handling leading-dash filenames with an automatic "./", but such filenames are not nice to start with :-). And I agree that trying to handle "--" specially creates a lot of magic.
You could call the option "skip_dash_glob" or "glob_skip_dash" or something else with "skip" to avoid the double-negative in the option name.
Please post when you implement it. I will add that information to my essay on handling filenames in shell:
https://dwheeler.com/essays/filenames-in-shell.html
It seems to me that Oil / OSH has the potential of making filename handling much easier than the essay describes :-).
OK I implemented and documented it! That was easy.
I will write a blog post tagged #real-problems as mentioned in my latest post:
http://www.oilshell.org/blog/2020/02/recap.html
http://www.oilshell.org/blog/tags.html?tag=real-problems#real-problems
I'll ping this issue when published and then you can link it in your post.
By the way I linked your post in the top comment in the discussion that led to this:
https://lobste.rs/s/mm99iz/beware_shell_globs
I've seen it around many times over the years! And I'm sure it inspired Oil to some extent.
Oil is definitely meant to solve this problem "once and for all". I would love for some more feedback -- I've already gotten a lot from shell experts and am looking for more.
I haven't even documented the biggest simplification and most powerful change IMO -- shopt -s simple_word_eval
and arrays.
As we all know, shell's word evaluation algorithm is extraordinarily complicated, and POSIX doesn't even describe all the things that shells agree on.
https://github.com/oilshell/oil/wiki/OSH-Word-Evaluation-Algorithm
Oil is based on proper arrays -- when shopt -s simple_word_eval
is on, there is no field splitting or "empty elision".
All of this stuff should work "as expected", where @
is splicing of an array:
var myarray = @(xx 'with spaces' *.py '' {foo,bar}@example.com )
write -- @myarray *.sh # like echo, but accepts -- and other flags
Anyway let me know if you want to try this stuff out and give feedback. It will help me write the docs.
The goal is definitely to treat filenames as untrusted data. I wrote a bunch of "shell: the good parts" and "shell: the bad parts" posts. And I just wrote a draft of "The Worst Part of Shell Is Confusing Code and Data", and confusion over filenames is part of that. Is -rf
code or is it data?
Thanks!! This is an improvement.
I've modified my essay here: https://dwheeler.com/essays/filenames-in-shell.html to specifically recommend disabling returning filenames with leading dashes in globs, and noting how to do that in bash and oil. I already recommend using "./" prefixes for all globs; this just reduces damage when something forgets to follow that advice.
I agree with you that splitting is just a "poor man's array handler". I think replacing that with real array handling is the key to improving the shell language. I've heard of oil before, but I've started to read more seriously. I'll note that zsh has an option SH_WORD_SPLIT so you can disable word splitting there.
I'm hoping that your work will eventually lead to some set of shell options that will be widely implemented by many shells (e.g., bash, ksh, zsh, and maybe even dash). So I think your work in "trying things out" and then carefully documenting them is really important. I wish you luck.
I just renamed it to dashglob
to be more consistent with shopt -s dotglob
in dash. Bash sometimes uses names_with_underscores
but I decided that these options are so close in semantics that they should have consistent syntax.
And YES I think other shells should implement it. That's another reason I renamed it.
I also think other shells should implement shopt -s simple_word_eval
. In fact it was originally called oil_word_eval
but I changed it specifically so it could be adopted elsewhere.
I think zsh is close though I haven't looked into the details.
What that does is basically remove the "staging" of word evaluation -- e.g. splitting depends on expansion, and globbing depends on splitting.
That's basically the split+glob
thing here:
https://github.com/oilshell/oil/wiki/OSH-Word-Evaluation-Algorithm
It also removes "empty elision" which IFS=''
does not do. So yeah these are all things I would like feedback on from shell experts... I will try to document it more precisely soon.
Yes, I'm particularly interested in your work so that (1) quoting isn't needed and (2) safely handling filenames is far easier. So simple_word_eval
looks promising, but in practice that will require arrays (as you've noted). If such improvements could be widely adopted I think it'd be a big improvement.
OK great, I'll write some docs about it and then you can review it. I guess the one sentence summary is that simple_word_eval
gets rid of all the stages of expansion:
https://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_06
Instead there's only one "stage": evaluation. It works just like a Python arithmetic evaluator, but on strings. (not like a shell arithmetic evaluator, which has dynamic parsing, i.e. confusing code and data)
It's technically incompatible but I think it will break very few scripts.
Oil has a new array syntax, but it also supports a bash compatible syntax:
local myarray=(one two *.py) # bash
var myarray = @(one two *.py) # Oil style
If you haven't seen it, some color on the syntax: http://www.oilshell.org/blog/2020/01/simplest-explanation.html
Also in Oil there's no reason to use anything but "$@"
and ${array[@]}
. All the other forms like $*
can be disallowed, because if you want to join to a string, you can just write:
joined_str="$@"
Another simple way to describe it is that:
ls $x $y @ARGV @myarray *.py {a,b} ''
in Oil is just like:
ls "$x" "$y" "$@" ${myarray[@]}" *.py {a,b} ''
in bash.
The "arity" of the command is more static than dynamic, because there's no splitting. Splitting can be done with @split(mystr)
and dynamic globbing could be @glob(myglobstr)
(not implemented)
Also, the three contexts where this applies are the EvalWordSequence
ones: command, array literal, and for loop.
(I will copy all this into a doc)
Published the doc:
http://www.oilshell.org/preview/doc/simple-word-eval.html
Feedback is appreciated! I think Zulip is better than continuing this issue.
@mgree might also be interested (his paper about shell was in POPL: https://popl20.sigplan.org/details/POPL-2020-Research-Papers/29/Executable-Formal-Semantics-for-the-POSIX-Shell )
Let me know if anything is confusing and I'll update the doc.
I've posted a number of comment there. Quick highlights:
Released awhile ago: https://www.oilshell.org/release/0.8.pre2/
From discussion here:
https://lobste.rs/s/mm99iz/beware_shell_globs
-
would be disallowed--
? That might be a little too smart? Is it possible that--
is an arg and not a flag? Likemycommand --arg -- *
Maybe it has to be an unquoted
--
to turn it off? So you could domycommand --arg '--'
and that doesn't count?oil:basic
(andoil:all
?)