google / starlark-go

Starlark in Go: the Starlark configuration language, implemented in Go
BSD 3-Clause "New" or "Revised" License
2.31k stars 210 forks source link

Expose paren Position on DefStmt #504

Closed raphaelvigee closed 1 year ago

google-cla[bot] commented 1 year ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

adonovan commented 1 year ago

What's the motivation for this change? The syntax tree doesn't expose the positions of most tokens, as they are redundant given the formatting algorithm. Why are these token positions necessary, but not all the other ones?

raphaelvigee commented 1 year ago

The L/R paren are generally exposed (see Dict, List, Tuple) I'm writing a formatter, and needed access to those position to make certain decisions

adonovan commented 1 year ago

The L/R paren are generally exposed (see Dict, List, Tuple)

That's not really an accurate characterization of the current design. The syntax tree records positions only for tokens that are required to accurately compute the span (start and end positions) of the tree, as in your examples of Dict, List, and Tuple. It also records the positions of interior tokens that are semantically important, such as the positions of + (add) and ( (call) operators, which are needed in error messages. But it doesn't record positions of interior tokens that are trivial and can be reconstructed by a canonical formatter, such as commas. The paren of a DefStmt seems to fall into this category, as DefStmt already has two tokens (def, name) that suffice for error messages, and the paren is an interior token that a formatter canonically places immediately after the function name.

I'm writing a formatter, and needed access to those position to make certain decisions

For what decisions does your formatter need this information?

raphaelvigee commented 1 year ago

The syntax tree records positions only for tokens that are required to accurately compute the span

I would argue that the position of ) is required to correctly compute the span of the parameters, which is what i m interested in here

the paren is an interior token that a formatter canonically places immediately after the function name.

I agree with you for the (, but the ) can be on the same line as (, or a couple lines down, depending on the paremeters whether they are layed down in one line, or multiple.

For what decisions does your formatter need this information?

We gonna go a bit deep here:

Consider the following pieces of code:

def blah(
    arg1 = []
    # some comment
):
    pass

When parsing that code, the # some comment gets attached as a Before to pass, so i use the position of () and the position of the comment to hoist the comment back into the correct developer-intended place. The logic being that if a comment.Line is > (.Line and < ).Line, then its the last "arg" of the parameters

adonovan commented 1 year ago

Thanks for illuminating. I think you are pointing out a general deficiency of the current syntax tree, which is that it doesn't contain sufficient information to preserve the relative order of tokens and comments. (Go's go/ast has a similar problem, notoriously; see golang/go#20744). If we fixed it in this instance, I suspect you would still encounter similar problems elsewhere in the tree.

How many other places would we need to record token positions in order to preserve token/comment order in all "reasonable" cases? (An unreasonable comment would be one before a comma, say.)

raphaelvigee commented 1 year ago

Understood. I wrote a formatter that seems to work for all the cases of comments i can imagine (although i probably missed a bunch), the issue highlighted above happens for all L/R list-ish nodes (Dict, List, Tuple, Call), they all expose the paren, so i could implement the algorithm described above, only DefStmt didnt expose it. If you are curious to see that contraption

raphaelvigee commented 1 year ago

But I would say that fixing those deficiencies would be pretty straightforward: add to syntax.Comment another field: Last // used in Dict, List, Tuple, Call, stores arguments before the closing paren

adonovan commented 1 year ago

Alright, let's add these two tokens. I noticed that the most popular Python formatters appear to preserve the position of comments after the last parameter.

laurentlb commented 1 year ago

I'm writing a formatter, and needed access to those position to make certain decisions

Is there any reason for not using Buildifier? https://github.com/bazelbuild/buildtools/tree/master/buildifier

raphaelvigee commented 1 year ago

Mostly wanted single tool, also it was pretty fun to write (except the part where the comments end up in random places)