mvdan / sh

A shell parser, formatter, and interpreter with bash support; includes shfmt
https://pkg.go.dev/mvdan.cc/sh/v3
BSD 3-Clause "New" or "Revised" License
7.35k stars 346 forks source link

cmd/shfmt: support null byte separators in --list and --find #1096

Closed sdavids closed 1 week ago

sdavids commented 2 months ago

@msanders

https://github.com/mvdan/sh/issues/288#issuecomment-579582229

It might make sense to add a -z or -0 flag for shfmt -f to use safely with xargs -0. (See e.g. find -print0, git ls-files -z, grep --null, etc.)

How can I find and safely handle file names containing newlines, spaces or both?

Also for shfmt -l.


Adding --find0 and --list0 would not lead to ambiguous/erroneous cases like shfmt -w -0 ..

It also mirrors find's -print/-print0 and -fprint/fprint0.

mvdan commented 2 months ago

Thanks. How often does this realistically come up?

If we add this, I would rather not add new flags. We can tweak --find and --list so that they also accept an optional zero parameter like --find=0 and --list=0. I adapted an existing boolean flag to take optional string values too in a different project, you're free to copy paste the idea:

https://github.com/mvdan/xurls/blob/c5b6e994e5db07c3f691bbb8c044b34dd20720f8/cmd/xurls/main.go#L36-L44

sdavids commented 2 months ago

Thanks. How often does this realistically come up?

Every time one wants to use --find/--list with xargs or any other tool like sort.

sdavids commented 2 months ago
-l[=0], --list[=0]  list files whose formatting differs from shfmt's;
                    paths are separated by a newline or a null character if -l=0
                    (corresponding to the -0 option of xargs)
-f[=0], --find[=0]  recursively find all shell files and print the paths;
                    paths are separated by a newline or a null character if -l=0
                    (corresponding to the -0 option of xargs)

Your preference?

I am not sure, if that is more readable/discoverable than:

-l,  --list    list files whose formatting differs from shfmt's;
               paths are separated by a newline
-l0, --list0   list files whose formatting differs from shfmt's;
               paths are separated by a null character
               corresponding to the -0 option of xargs
-f, --find     recursively find all shell files and print the paths;
               paths are separated by a newline
-f0, --find0   recursively find all shell files and print the paths;
               paths are separated by a null character
               corresponding to the -0 option of xargs
-l, --list     list files whose formatting differs from shfmt's;
               paths are separated by a newline
-f, --find     recursively find all shell files and print the paths;
               paths are separated by a newline
-z             \0 line termination on -l and -f output

Precedent

https://www.man7.org/linux/man-pages/man1/find.1.html

If no expression is given, the expression -print is used (but you should probably consider using -print0 instead, anyway).

https://git-scm.com/docs/git-ls-files#Documentation/git-ls-files.txt--z

-z \0 line termination on output and do not quote filenames. See OUTPUT below for more information.

https://www.gnu.org/software/sed/manual/sed.html

-z Treat the input as a set of lines, each terminated by a zero byte (the ASCII ‘NUL’ character) instead of a newline. This option can be used with commands like ‘sort -z’ and ‘find -print0’ to process arbitrary file names.

mvdan commented 2 months ago

Every time one wants to use --find/--list with xargs or any other tool like sort.

You only need to null-separate arguments when they could contain newlines though. I have never in my life encountered any filenames containing newlines, so I would assume that shfmt -find . | xargs whatever works just fine with newline separators. I still don't oppose the feature, for the sake of better compatibility with other tools.

I am not sure, if that is more readable/discoverable than:

Yes, I would rather add the optional value to the two flags rather than duplicating the flags or adding a new flag which can only be combined with either of the other two. You could drop the mention of xargs in the help text for brevity.

Note that the -z flag you mention from other tools affects the entire tool. Here, our -z flag would not work on its own, so it would be a rather awkward flag.

sdavids commented 2 months ago

You only need to null-separate arguments when they could contain newlines though.

Maybe you wanted to say “spaces”?

I have never in my life encountered any filenames containing newlines,

That is because the file systems I know of do not allow it.


File names with spaces are quite common on the other hand.

macOS: ~/Library/Application Support

mvdan commented 2 months ago

Ah of course. For some reason I misremembered xargs to only split on newlines, but it does split on spaces too.

sdavids commented 2 months ago

As far as I can tell the flag package does not differentiate between

--flag=x and --flag x.

If you have a String value then you must have a value, i.e. --flag is not allowed anymore.

This is important because if l is String instead of Bool:

shfmt -l . is the same as shfmt -l=.


type stringFlag struct {
    set bool
    val string
}

func (sf *stringFlag) Set(s string) error {
    sf.val = s
        // TODO validate s
    sf.set = true
    return nil
}

func (sf *stringFlag) String() string {
    return sf.val
}

list        = &multiFlag[stringFlag]{"l", "list", stringFlag{false, ""}}

        case *multiFlag[stringFlag]:
            if name := f.short; name != "" {
                flag.StringVar(&f.val.val, name, f.val.val, "")
            }
            if name := f.long; name != "" {
                flag.StringVar(&f.val.val, name, f.val.val, "")
            }

With the above we would have list.val.val = "." for shfmt -l .

mvdan commented 2 months ago

See the snippet I shared earlier:

https://github.com/mvdan/xurls/blob/c5b6e994e5db07c3f691bbb8c044b34dd20720f8/cmd/xurls/main.go#L36-L44

In particular the presence of IsBoolFlag.

sdavids commented 2 months ago

Adding

func (*stringFlag) IsBoolFlag() bool { return true }

does not change a thing.

shfmt -l .list.val.val = "."

mvdan commented 2 months ago

Because you're using flag.StringVar rather than flag.Var; that takes Value, which is where IsBoolFlag may be present.

If you get stuck it's not a problem, I'll get to it at some point.

sdavids commented 2 months ago

It have it working now, but one side effect is:

./shfmt2 -l=true . is the same as ./shfmt2 -l ..

Should that be documented?

mvdan commented 2 months ago

That's already the case with all boolean flags in Go:

$ shfmt --space-redirects <<<'foo >bar'
foo > bar
$ shfmt --space-redirects=true <<<'foo >bar'
foo > bar

It's just not a very well known fact because it's not very useful in general.

sdavids commented 2 months ago

Do you want just one boolFlag or one listFlag and one findFlag.

Considering now both have the same allowed values, but that might change in the future.

func (s * boolFlag) Set(val string) error {
    if val != "true" && val != "0" {
        return fmt.Errorf("only \"0\" supported\n")
    }
    s.val = val
    s.set = true
    return nil
}

    list        = &multiFlag[boolFlag]{"l", "list", boolFlag{false, ""}}
        find        = &multiFlag[boolFlag]{"f", "find", boolFlag{false, ""}}

vs.

    list        = &multiFlag[listFlag]{"l", "list", listFlag{false, ""}}
        find        = &multiFlag[findFlag]{"f", "find", findFlag{false, ""}}
mvdan commented 2 months ago

Share the same type. FWIW a single string value is still enough as long as you set the default to "false" - then the only allowed defaults can be "false", "true", and "0". Note that -list calls Set("true") as it's a short alias for -list=true.

sdavids commented 2 months ago

I have updated the PR.