partiql / partiql-ir-generator

PartiQL I.R. Generator (P.I.G.)
Apache License 2.0
24 stars 7 forks source link

Parameterized return type in generated visitors #123

Open RCHowell opened 2 years ago

RCHowell commented 2 years ago

For transformations, it is useful to have a visitors return nodes. This issue is a feature request to add a parameterized return type to the generated visitors. Below is what the VisitorFold might look like. Fold naming is a bit confusing vs like ContextualVisitor or VisitorWithContext

public abstract class Visitor<R, C> {

    public R visit(Node node, C context): R? { }

}
dlurton commented 2 years ago

We already have Converter<T> interfaces. Every type domain gets one. It's not exactly the same since it doesn't have a context parameter on its convert methods, but is otherwise functionally the same as this. Example: ToyLang's Converter<T>

A field in the class implementing a Converter<T> can be utilized instead of a context parameter. The field can be mutable to change the context or you can new a new instance of the Converter<T> implementation with a different context and recurse into that. Altho, I can see the value of passing the context as a separate parameter, perhaps a variation of Converter<T> could be added, ContextualConverter<R, C> or some such.

RCHowell commented 2 years ago

Edit: A Converter is not a Visitor; A Converter is generated for each Sum type, so it does not serve the same purpose as Visitor

dlurton commented 2 years ago

The original visitors predate the Converter<T> interfaces and were intended to perform semantic checks only. Additionally, with Conveter<T>, more work is required on behalf of the implementer because it doesn't traverse child nodes for you, but not so with the Vistior (see the related Walker classes). So, given this, it is not true to say that a PIG visitor is the same as Visitor<Unit>.

dlurton commented 2 years ago

Although, you could I guess create a Walker that traverses a Converter<Unit>. Perhaps a task should be undertaken to do this.

dlurton commented 2 years ago

Potentially, also terminology could be revisited. i.e. drop the existing Visitor classes and rename/refactor Converter<T> to Visitor<R, C>. I am not opposed to breaking API changes myself if they need to happen.

RCHowell commented 2 years ago

Are you interested in keep all three "visitor" types (Visitor, VisitorFold, Converter)?

As a new person looking in, I see that all three of these are covered by Visitor<R, C>

RCHowell commented 2 years ago

I see how Converter is an interface, so you have to implement them all. Was that intentional to enforce? If I were writing a converter for say "uppercase all identifiers", then I wouldn't want to implement everything rather just override some defaultVisit identity function. For example,

public abstract class Visitor<R, C> {

    public R visit(Node node, C context): R? = defaultVisit(node, context)

    // ........

    public R defaultVisit(Node node, C context): R? = visitChildren(node, context)

    public R visitChildren(Node node, C context): R? {
      for (Node child: node.children) {
          child.accept(this, context)
      }
      return null
    }

}

public Uppercaser : Visitor<Node, Unit> {

   @override
    public visit(IdentifierNode node, Unit context): Node? = IdentifierNode(
      identifier = node.identifier.uppercase()
    )

   @override
   public defaultVIsit(Node node, Unit context): Node? = node

}
dlurton commented 2 years ago

In principle, I would not be opposed to combining Visitor, VisitorFold and Converter. You will have to weigh the cost of breaking API changes and updating the PartiQL codebase and customers, and that is not up to me.

There is a 4th kind--VisitorTransform which you should be aware of which is not directly replaceable by a Visitor<R, C>. Terminology perhaps could be revised here and VisitorTransforms could be made to implement a Visitor<R, C>. VisitorTransform contains default implementations of the visit* methods which traverse & clone the current node only if a child has been changed.

dlurton commented 2 years ago

I see how Converter is an interface, so you have to implement them all. Was that intentional to enforce?

Yes. This works well in a number of scenarios. For example, converting partiql_ast back to parsable SQL. The idea is to make sure the implementer has covered all nodes, and to enlist the compiler's help in enforcing that. This avoids bugs due to inadvertent omission, or if new nodes are added.

The style of visitor you list there requires that all child nodes are contained within a children collection, which PIG does not generate. Instead, each child node is a discrete property of the class for the node, which has the advantage that access of the child nodes is strongly typed and IDE assisted (code-completion, etc), but has the disadvantage that traversal is more complicated. PIG being a code generator was designed to mitigate that, in part by generating these visitor classes.

Your visitor there is a use case for VisitorTransform, i.e.

class UppercaseTransform : FooLang.VisitorTransform() {
    fun transformExprIdentifier(node: FooLang.Expr.Identifier) = node.copy(name = node.name.toUppercase())
}

Invoking this visitor returns a new tree, with nodes from the original tree re-used if there were no changes in them. i.e. only the nodes which are changed, and their parent nodes are cloned.

dlurton commented 2 years ago

@RCHowell If you do make changes to these generated visitors, please make sure to include me in any design doc reviews / PRs, etc. I am the original author of PIG but I am now its customer and much of the work I am doing now with PartiQL's query planner is heavily dependent on the VisitorTransforms.

RCHowell commented 2 years ago

Understood. I am new to the team and was taking a look at this package and saw some potential to consolidate classes and logic. It's not prioritized now.

RCHowell commented 2 years ago

The style of visitor you list there requires that all child nodes are contained within a children collection, which PIG does not generate. Instead, each child node is a discrete property of the class for the node, which has the advantage that access of the child nodes is strongly typed and IDE assisted (code-completion, etc), but has the disadvantage that traversal is more complicated.

You can do both, but since accept was missed, then the traversal became complex. Child nodes remain fields of the classes, but the children collective still exists. Then traversal is trivial with an accept.

interface BaseDomainNode {
  fun accept(visitor: BaseVisitor)
}

/ / BaseVisitor uses `walk` as the defaultVisit method
fun walk(node: BaseDomainNode) {
  node.children.forEach { it.accept(this) }
}