gracelang / language

Design of the Grace language and its libraries
GNU General Public License v2.0

Dialects should build their own AST #133

Open kjx opened 6 years ago

kjx commented 6 years ago

Where should the AST used by dialect checkers come from?
What if some dialects (a static type checker) need more information or methods on AST objects, others (findbugs?) need less, and some (a static checker?) would need a crazy amount?

My suggestion is that a dialect should be (or should offer) an AST factory (a.k.a. an Object Algebra): a suite of methods for building an AST. When a module using a checker dialect is instantiated, that factory is used to build an AST carrying exactly the methods that that particular checker needs. Something like:

type astAlgebra = interface {
    numberLiteral(n) at(position)
    stringLiteral(s) at(position)
    blockLiteral(parameters) over(body) returning(type) at(position)
    receiver(rcvr) request(name) args(args) at(position)
    implicitRequest(name) args(args) at(position)
}

The compiler can then walk its internal AST and call out to the classes in the factory.
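
As a very rough sketch of the compiler side (the replay()through() method and the internal node protocol -- kind, value, range, receiver, arguments -- are invented here, and the syntax is only approximate minigrace-style Grace):

method replay(node) through(algebra) {
    // Hypothetical: replay the compiler's internal nodes through whatever
    // algebra the dialect supplies, one algebra request per node kind.
    match (node.kind)
        case { "number" -> algebra.numberLiteral(node.value) at(node.range) }
        case { "string" -> algebra.stringLiteral(node.value) at(node.range) }
        case { "request" ->
            def newArgs = node.arguments.map { a -> replay(a) through(algebra) }
            algebra.receiver(replay(node.receiver) through(algebra)) request(node.name) args(newArgs) at(node.range)
        }
}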

apblack commented 6 years ago

We do need to do something about pinning down how dialects work in an implementation-independent manner. Thanks for opening this issue.

The problem that you identify is that dialects are quite diverse. Some need a parse tree, not an AST, because they are restricting the language and need to produce precise syntax error messages. Others need, for example, the full symbol table information, including where each identifier is declared and its type.

Your proposal seems to be suggesting that we settle on a lowest-common-denominator parse tree that is just a representation of a parse of the source, with no inferred information. That is adequate, but makes building dialects much more difficult, because all of the work done by the compiler in, e.g., resolving applied occurrences of identifiers back to their defining occurrences, and in supplying omitted selfs and outers, will have to be repeated by the dialect writer. That's an unreasonable burden.

The key issue, to me, is deciding on what information the dialect writer will be able to request from the tree. For example, can the dialect ask if a method is public, or can it only ask what annotations are available on the declaration and then have to implement the visibility rules itself? Can it ask the type of a variable application, or does it have to ask for the type annotation of the declaration, or does it have to find the declaration?
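
To make the contrast concrete, here is a purely hypothetical sketch (isPublic, annotations, name, and checkVisibility are invented names, not an agreed interface):

// Maximal API: the tree can answer the question directly.
method checkVisibility(methodNode) {
    if (methodNode.isPublic) then { print "public method: {methodNode.name}" }
}

// Minimal API: the tree exposes only the raw annotations, and the dialect
// re-implements the visibility rule itself.
method isPublic(methodNode) {
    // assuming a method is public unless annotated "confidential"
    methodNode.annotations.contains("confidential").not
}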

I don't see how switching from a visitor of AST nodes to a constructor of AST nodes helps us to answer these questions. Unless you are going to say: all the dialect gets is raw syntax, and the dialect writer is on her own in performing all analyses. That puts too much work on the path of writing a dialect. One of the biggest wins of Grace to me as an instructor is that writing a fairly sophisticated dialect is easy.

kjx commented 6 years ago

We do need to do something about pinning down how dialects work in an implementation independent manner. Thanks for opening this issue.

Well yeah, at least down to a canonical AST/parse tree/etc.

The problem that you identify is that dialects are quite diverse. Some need a parse tree, not an AST, because they are restricting the language and need to produce precise syntax error messages. Others need, for example, the full symbol table information, including where each identifier is declared and its type.

sort of.

Your proposal seems to be suggesting that we settle on a lowest-common-denominator parse tree that is just a representation of a parse of the source, with no inferred information.

not quite: more a lowest-common-denominator interface between the parser and the dialect.

That's an unreasonable burden.

dunno. how tightly do you want to couple things? a lowest-common-denominator syntax can always have libraries to do the heavy lifting.

The key issue, to me, is deciding on what information the dialect writer will be able to request from the tree...

Part of this is just API design, and it's really not obvious whether a minimalist or a maximalist approach is more useful. The more detailed and maximalist the API of the AST, the more we end up with only one implementation. Letting each dialect build its own AST from the parser means we can get a common interface without necessarily having to rewrite existing parsers, translators, and interpreters to conform to One True AST.

J

KimBruce commented 6 years ago

I agree this is a critical issue. We need dialects to work with all “compliant” implementations (for some definition of compliant). Right now, what we need is a common interface to the parse-tree and AST (it does seem helpful to be able to access both). In the unlikely case that we actually start caring about execution speed, we might also need to allow dialects to actually modify these trees (e.g., remove dynamic checks when the dialect verifies they are not needed).

Ideally, an implementation would not have to completely rebuild an AST for the checker, but would instead just provide an interface that supplies the information needed (inexpensively) in an implementation-independent way.

Kim

kjx commented 6 years ago

On 22/11/2017, at 7:52AM, Kim Bruce notifications@github.com wrote:

I agree this is a critical issue. We need dialects to work with all “compliant” implementations (for some definition of compliant).

The underlying question is, actually, whether we "really" want to have more than one implementation.

Right now, what we need is a common interface to the parse-tree

Or at least a common interface that lets dialects build their own versions.

The advantage of the common factory interface is that the interface is smaller.

and AST (it does seem helpful to be able to access both).

Hmm. I think we should avoid the parse tree if at all possible...

In the unlikely case that we actually start caring about execution speed,

run on Moth.

we might also need to allow dialects to actually modify these trees (e.g., remove dynamic checks when the dialect verifies they are not needed).

edits lead to Macros. Macros lead to LISP. LISP leads to the Dark Side.

At that point, the correctness of the implementation depends on the dialects. In fact, if the actual AST used by the compiler is handed to the dialect and is mutable, then dialects can break the compiler even if we don't want them to.

Ideally, implementation would not have to completely rebuild an AST for the checker, but instead would just have an interface that provides the information needed (inexpensively) in an implementation-independent way.

Well, that's what I thought at first, but that means we have to standardise on the entire AST: not just what the nodes are (and how code is transformed into the AST) but also the interfaces of each of the AST nodes.

Given that the AST is pretty much the core of the compiler, that comes down to just having one implementation. Going with a factory (basically an algebraic data type) reduces that interface down to just the node constructors.

James

apblack commented 6 years ago

Thinking (because of SmallGrace) about how a compiler written in a different language would interface to a dialect written in Grace, it seems that some sort of object algebra interface is the right solution. Indeed, it may be the only solution, if by “object algebra” we mean some sort of procedural interface. There would then be a Grace module that provides, in Grace, an implementation of the AST, but dialect writers would not be required to use this if they didn’t want or need to examine the whole AST.
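
For instance, that optional module could simply be one more implementation of the algebra, whose factory methods build ordinary node objects (a sketch; defaultAST, accept, and visitNumberLiteral are invented names):

// Sketch: a default implementation of the algebra whose factory methods
// build real AST node objects that a dialect can keep and walk later.
def defaultAST = object {
    method numberLiteral(n) at(position) {
        object {
            def value is public = n
            def range is public = position
            method accept(visitor) { visitor.visitNumberLiteral(self) }
        }
    }
    // ... one such factory method for each constructor in the algebra
}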

For example, consider my dialect that enforces certain restrictions in method comments. It doesn’t need an AST; all it needs to do is provide empty implementations for all of the AST-building methods except for methodHeader; inside that method it performs its checks.
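
Something like the following sketch (the parts of methodHeader and the reportError()at() request are assumptions, not an agreed interface):

// Sketch of the comment-checking dialect: every factory method is a
// no-op except methodHeader, which inspects the attached comment.
def commentChecker = object {
    method numberLiteral(n) at(position) { done }
    method stringLiteral(s) at(position) { done }
    // ... empty implementations of the remaining constructors ...
    method methodHeader(signature) comment(commentText) at(position) {
        if (commentText.size == 0) then {
            // reportError()at() stands in for whatever error hook dialects get
            reportError("method {signature} has no comment") at(position)
        }
    }
}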

The second question that James was raising is what the interface to these AST-building methods should be: minimal, or maximal, or somewhere in between? It seems clear to me that we are not going to resolve that question. Since we do have multiple implementations — that’s not a question, but a fact — they are going to have different amounts of information available. The way forward, it seems to me, is to allow the compiler to provide what it has, and to provide an unavailable object otherwise. As James said, we would then have a Grace library available to the dialect writer that would compute the stuff that was unavailable, on demand.
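
One hypothetical shape for that "unavailable" object (the exception name and the declaredType request are invented): a placeholder whose requests raise a catchable exception, so a library can step in and compute the answer from the raw tree on demand.

// Sketch: a placeholder for information this implementation cannot supply.
def InformationUnavailable = Exception.refine "InformationUnavailable"

def unavailable = object {
    method asString { "unavailable" }
    // a full version would answer every request this way, which probably
    // needs support from the implementation itself
    method declaredType { InformationUnavailable.raise "no type information here" }
}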

I’m not quite sure how to implement all of this — I guess that the details will not become clear until it’s done. We would also need object-algebra interfaces to everything that hangs from the AST, such as symbol tables and source-code ranges. A dialect also needs a way of raising errors.