mvdan / sh

A shell parser, formatter, and interpreter with bash support; includes shfmt
https://pkg.go.dev/mvdan.cc/sh/v3
BSD 3-Clause "New" or "Revised" License
7.14k stars 338 forks source link

Separate comment from sub-shell opening parenthesis with a space #842

Closed antichris closed 2 years ago

antichris commented 2 years ago

Also applies to sub-shell functions and command substitutions (including the deprecated legacy back-tick syntax).

Fixes #839.

antichris commented 2 years ago

Frankly, even though it gets the job done, this feels to me more like a dirty hack than a hard enforcement of the rule "a comment may only be directly preceded by white-space". That, I suspect, would require more extensive changes.

mvdan commented 2 years ago

The added tests look great, thank you! I agree that this solution in terms of code is relatively hacky :) I suspect that the problem is that wantSpace is set to false by open parentheses tokens, so that we don't print $(foo) as $( foo). We don't track whether we've just written a space character already, so then the printing of comments incorrectly understands wantSpace == false to mean that we already printed a space. So the same boolean field is being used for two conflicting purposes, which was fine for many cases, but not this one.

The best option might be to replace wantSpace bool with a wantSpace wantSpaceState enum type, like:

type wantSpaceState uint8

const (
    spaceNotRequired wantSpaceState = iota
    spaceWanted // we should generally print a space or a newline next
    spaceWritten // we have just written a space or newline
)

The added third state could be used by flushComments; it would print the space unless we're in spaceWritten.

I understand if this is too much work for you, but I thought I'd explain my thinking before I dig into a solution myself, in case you were interested :)

antichris commented 2 years ago
I'm pretty sure this is not really for a this issue and/or PR, but... 1. `spacePad()` https://github.com/mvdan/sh/blob/ba7908ece540171e3bfd3a6f7f13f7fd2591afaa/syntax/printer.go#L295-L298 could probably use `p.space()` instead: ```go if p.wantSpace { // if p.wantSpace == spaceRequired { p.space() } ``` 2. `space()` itself could be implemented in terms of `spaces()`: ```diff func (p *Printer) spaces(n uint) { for i := uint(0); i < n; i++ { p.WriteByte(' ') } + p.wantSpace = false // p.wantSpace = spaceWritten } func (p *Printer) space() { - p.WriteByte(' ') - p.wantSpace = false // p.wantSpace = spaceWritten + p.spaces(1) } ``` 3. the padding code of `spacePad()` https://github.com/mvdan/sh/blob/ba7908ece540171e3bfd3a6f7f13f7fd2591afaa/syntax/printer.go#L299-L301 could also be rewritten to use `spaces()`: ```go if p.cols.column > 0 { padding := pos.Col() - uint(p.cols.column) if padding < pos.Col() { p.spaces(padding) } } ``` although this one doesn't really feel like *that* much of an improvement to me.

I'm not sure if it warrants opening an issue, but I felt like it's worth mentioning.

Also, there's something funky going on with sub-shell trailing comments. For example: ```sh ( #foo bar #qux ) #corge ``` gets turned into ```sh ( #foo bar #qux ) #corge ``` With `keepPadding` such an input passes unchanged. Almost as if the meaning of `keepPadding` is inverted for those trailing comments. I'm guessing this will probably go away once #658 gets dealt with.
mvdan commented 2 years ago

To answer in parts:

First: You're right that we could deduplicate bits of code by reusing function calls. From memory, I ended up with code like that due to performance optimizations. However, that was all the way back in 2016-2017, and it's likely that the compiler has gotten better in terms of inlining, for example.

Certainly happy to explore that, but probably in a separate PR, like you say.

Second: That sounds like a bug. Mind filing an issue so we don't forget?

As for the patch - thanks for the update! Looks like you put some serious effort into this :) I'll take a look.

mvdan commented 2 years ago

Squashed and tweaked your commit message a bit, hope that's okay.

antichris commented 2 years ago

Thanks, that's perfect — I would've suggested squashing as well. I usually try to stick to making one logical change per commit as I work, because it's tricky (when not impossible) to pry them apart later, if need be, but squashing them together when rebasing onto a main branch is always a breeze.