swiftlang / swift-syntax

A set of Swift libraries for parsing, inspecting, generating, and transforming Swift source code.
Apache License 2.0
3.13k stars 393 forks source link

[SwiftLexicalScopes][GSoC] Add simple scope queries and initial name lookup API structure #2696

Closed MAJKFL closed 2 weeks ago

MAJKFL commented 1 month ago

This is an initial PR for the SwiftLexicalScopes, developed as one of the GSoC 2024 projects.

In this PR:

This is my first contribution to the project, so I would greatly appreciate if someone could double-check everything. Thank you!

CodaFi commented 1 month ago

I think that this API would emphasize syntax over scope and add entrypoints as such.

My suggestion would be

protocol ScopeSyntax: SyntaxProtocol {
  func parent() -> (any ScopeSyntax)?
  func lookup(_ name: String) -> [DeclSyntax]
}

extension SourceFileSyntax: ScopeSyntax { /**/ }
extension CodeBlockSyntax: ScopeSyntax { /**/ }
/*etc*/

Internally, we can project a Scope tree over the syntax node and perform the lookups against that tree. But I'm not sure it's a good idea to expose the scope tree to clients.

MAJKFL commented 1 month ago

I think that this API would emphasize syntax over scope and add entrypoints as such.

My suggestion would be

protocol ScopeSyntax: SyntaxProtocol {
  func parent() -> (any ScopeSyntax)?
  func lookup(_ name: String) -> [DeclSyntax]
}

extension SourceFileSyntax: ScopeSyntax { /**/ }
extension CodeBlockSyntax: ScopeSyntax { /**/ }
/*etc*/

Internally, we can project a Scope tree over the syntax node and perform the lookups against that tree. But I'm not sure it's a good idea to expose the scope tree to clients.

I think it would be a great idea to make syntax in the center of those APIs, but I’m not entirely sure how would this work with more complex scopes like with function declarations. If we’d like to project a scope onto the FunctionDeclSyntax we’d have to include some more complex logic handling name lookup for e.g. generic parameters.

Compiler right now for this invalid snippet:

func foo<T1, T2: T1>(a: T1, b: T2) {
    print(a, b)
}

Produces this scope map:

***Complete scope map***
ASTSourceFileScope 0x14d13d000, [1:1 - 4:1] 'test.swift'
`-AbstractFunctionDeclScope 0x14d170020, [1:1 - 3:1] 'foo(a:b:)'
  `-GenericParamScope 0x14d139378, [1:1 - 3:1] param 0 'T1'
    `-GenericParamScope 0x14d139378, [1:1 - 3:1] param 1 'T2 : T1'
      |-ParameterListScope 0x14d139580, [1:21 - 1:34]
      `-FunctionBodyScope 0x14d170020, [1:36 - 3:1]
        `-BraceStmtScope 0x14d1701e8, [1:36 - 3:1]

The parameter T1 should be recognized during lookup as the first generic parameter.

I assume that’s why there are separate AbstractFunctionDeclScope and FunctionBodyScope so there’s a clear path for recursive lookup i.e. BraceStmtScope -> FunctionBodyScope -> GenericParamScope -> GenericParamScope -> AbstractFunctionScope -> …. If we’d consolidate this functionality into one SyntaxScope this would require some complex logic for handling lookup at generic parameters differently than in e.g. function body. I started prototyping how a structure resembling the compiler implementation could work in this case with inner (FunctionBodyScope) and outer (AbstractFunctionDeclScope) scopes both associated to FunctionDeclSyntax here.

What about a hybrid approach? We could still use the custom structure mapped over the existing syntax nodes internally and additionally introduce a new ScopeSyntax protocol that would add entry points for the API at the nodes themselves (that is move the functionality of LexicalScopes into the syntax nodes). We would retain the flexible scope structure and still emphasize syntax nodes as the entry points for the API.

It’s possible I’m wrong and I missed some detail so I’d appreciate to hear your take on this. I’d also like to hear from @DougGregor what he thinks about it.

CodaFi commented 3 weeks ago

I assume that’s why there are separate AbstractFunctionDeclScope and FunctionBodyScope so there’s a clear path for recursive lookup i.e. BraceStmtScope -> FunctionBodyScope -> GenericParamScope -> GenericParamScope -> AbstractFunctionScope ->

This is rather because ASTScope is designed in a decidedly much more object-oriented fashion than the rest of the compiler. I consider this to be an abstraction leak of the tree, but the structure of the model is nonetheless sound to your point. My preference is still for a Syntax-oriented API.*

What about a hybrid approach?

I agree! Consider the following:

struct Scope {
  var parent: Scope?
  /* etc. */
}

protocol ScopeSyntax: SyntaxProtocol {
  var scope: Scope { get } // project this node into a `Scope`
  func lookup(_ name: String, in scope: Scope) -> [DeclSyntax]
}

extension ScopeSyntax {
  func lookup(_ name: String) -> [DeclSyntax] {
    self.lookup(name, in: self.scope)
  }
}

That would handle a client that e.g. wants to perform unqualified lookup for a generic parameter by

  1. Walking from the FunctionDeclSyntax node to the GenericParameterListSyntax node
  2. Projecting its generic parameter list
  3. Calling functionDecl.lookup("x", in: genericParameterList.scope)

As for capturing lookup from a particular part of the generic parameter list, you could implement the projection of the syntax for a generic parameter to a Scope by storing an index into the enclosing generic parameter list and the generic parameter list itself.

*To recover the behavioral dispatch points that the object orientation is really needed for, we just need to have Scopes carry a Syntax node. Then you can branch on the kind. If you want to get fancy about it, you can create a sub-enum capturing all of the scope-aware syntax nodes, store that as a field, and branch on that.

CodaFi commented 3 weeks ago

Just a note while this is on my mind: We also have to be careful here to look at ASTScope as a model because its projection function is top-down (root to leaf), where as a Scope is (presumably) projected bottom-up (leaf to root). There are interesting things to consider in the bottom-up case like

guard let x = foo() else {
  return bar
}

Projecting the scope of the else brace statement has to be careful to walk through the scopes defined in its binding condition list and demand the scope of the next-most syntactic enclosing scope. Another way to put it - you could implement bottom-up scope projection by linking the else branch to its binders and marking them as "holes" that a recursive lookup procedure should merely traverse but not collect declarations from.

CodaFi commented 3 weeks ago

Another passing thought: If a Scope is sufficiently opaque, you can avoid allocating a Scope tree at all/modeling it with separate class references by having the parent node point to a ScopeSyntax. A Scope would then just be a value-typed handle returned as the result of projecting up the ScopeSyntax parts of one particular branching path of the syntax tree. Heck, you could even have the Scope value carry the set of local declarations along with its parent pointer. The implementation of lookup falls out neatly as a tree walk that joins results from each Scope it encounters.

MAJKFL commented 3 weeks ago

Another passing thought: If a Scope is sufficiently opaque, you can avoid allocating a Scope tree at all/modeling it with separate class references by having the parent node point to a ScopeSyntax.

I tried remodeling the name lookup with shadowing for function declaration with just ScopeSyntax nodes and I think it could be a good idea to follow this lead. Specifying lookup caller fixes the problem with name lookup at generic parameters and function parameter list. Additionally, it could also potentially serve as a good starting point for lookup in CodeBlockSyntax to filter the declarations according to their relative position.

As for capturing lookup from a particular part of the generic parameter list, you could implement the projection of the syntax for a generic parameter to a Scope by storing an index into the enclosing generic parameter list and the generic parameter list itself.

I tried a bit different approach with generic parameter scopes. I followed the hierarchical structure of ASTScope to make the API more uniform across the lookup and not worry about keeping the indexes of generic parameters. This way, each GenericParameterSyntax implements ScopeSyntax. On the other hand, it complicates a bit more the lookup on the FunctionDeclSyntax side, but I think it’s a great tradeoff.

Also, as for now, I’m just returning the values as TokenSyntax nodes. Since the lookup could also result in variable declarations, function parameters, function name, generic parameters etc. Is there a way to represent them uniformly as DeclSyntax as you suggested?

I’d love to hear your feedback @CodaFi about the adjusted structure. I tried to follow your suggestions with these two adjustments, so I hope I haven’t missed anything.

DougGregor commented 3 weeks ago

@MAJKFL looks like Package.swift has a conflict. If you're able to resolve that and run formatting locally (if you haven't already), I think we're ready to kick off testing

DougGregor commented 3 weeks ago

@swift-ci please test

DougGregor commented 3 weeks ago

@swift-ci please test

DougGregor commented 3 weeks ago

@swift-ci please test

DougGregor commented 3 weeks ago

@swift-ci please test Windows

DougGregor commented 3 weeks ago

macOS failure is this:

Error: .spi.yml did not contain the same libraries as Package.swift:
 - Missing in .spi.yml: SwiftLexicalLookup
DougGregor commented 3 weeks ago

@swift-ci please test

MAJKFL commented 3 weeks ago

macOS failure is this:

Error: .spi.yml did not contain the same libraries as Package.swift:
 - Missing in .spi.yml: SwiftLexicalLookup

I added SwiftLexicalLookup to the .spi.yml file. I hope it's fixed now.

DougGregor commented 3 weeks ago

@swift-ci please test

DougGregor commented 3 weeks ago

That looks right to me. Kicked off CI and let's hope for the green checks

DougGregor commented 3 weeks ago

Feel free to merge if it passes

DougGregor commented 3 weeks ago

@swift-ci please test

DougGregor commented 3 weeks ago

@swift-ci please test Windows

MAJKFL commented 2 weeks ago

Is it ok if we postpone the formatting changes to the next PR to not trigger CI for a couple of comments? I'll remember to include them. If you think the PR is good enough, could you please merge it? I see I would need write access for this repo to do so.

DougGregor commented 2 weeks ago

I'll merge and we can pick up the formatting tweaks in another PR