wasp-lang / wasp

The fastest way to develop full-stack web apps with React & Node.js.
https://wasp-lang.dev
MIT License

Make Wasp a proper language #109

Closed Martinsos closed 2 years ago

Martinsos commented 3 years ago

Right now, Wasp syntax is pretty ad-hoc -> we are using Parsec to parse each declaration in a custom way, and we add stuff as we need it. We don't have separate lexical and semantic analysis; even token parsing is mixed in with it all.

This ad-hoc nature of the language makes it harder to reason about and to work with -> for example, writing a syntax highlighter is tricky, because highlighters normally expect rules, and our rules are more complicated than usual. Also, it is harder for users to reason about the language when too much stuff is custom -> it is easier to have a set of rules that you then build upon than to have a ton of rules which you can't remember.

I think this ad-hoc approach makes sense for now, since we are developing the language and experimenting, but ideally we will want to switch to a formal grammar. For example, we could define something called a "declaration" and then all of them (page, action, query, ...) would have to follow that. We should also have some basic expressions and control statements, to enable certain reuse of code and flexibility. Then there are the properties in each declaration -> right now we parse those in a custom manner, but instead we will probably want to define "object" and "primitive values" as concepts and then have something similar to JSON that we can expand further if needed (we can call it WSON or something). Then, we would parse those, and later in the semantic step we would figure out if properties are of the correct type and similar.
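Just to illustrate what I mean (this is a hypothetical sketch, not a concrete proposal; all the rule names are made up), a formal grammar along these lines could look something like:

```
<stmt>     ::= <decl>
<decl>     ::= <declType> <ident> <value>          ; e.g. page Main { ... }
<declType> ::= 'page' | 'query' | 'action' | ...
<value>    ::= <string> | <num> | <bool> | <object> | <list>
<object>   ::= '{' ( <ident> ':' <value> )* '}'    ; the JSON-like "WSON" part
<list>     ::= '[' <value>* ']'
```

The semantic step would then only have to check, per declaration type, that the parsed object has properties of the expected types.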

I believe this will make it much easier to add new features / new declarations to the language, and it will also be easier to write additional tooling for it, like syntax highlighting.

One language we might want to look at is Meson -> they have a declarative DSL with expressions, immutable values, control statements and declarations, actually very similar to Wasp. It is Python-like; we will probably want to go with a JS-like (C-like) style instead, since that is closer to web devs.

If we are going to do this, I think we should start by ditching Parsec and using Happy. Or, we could use Parsec (Megaparsec?), but have a cleaner separation of lexical and semantic analysis, maybe even token analysis (although that might be handled by Parsec?).
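To sketch what the Megaparsec option could look like (hypothetical code, just showing the shape of the separation, not actual Wasp parser code):

```haskell
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Lexical layer: whitespace handling and token parsers, kept in one place.
sc :: Parser ()
sc = L.space space1 (L.skipLineComment "//") (L.skipBlockComment "/*" "*/")

symbol :: String -> Parser String
symbol = L.symbol sc

identifier :: Parser String
identifier = L.lexeme sc ((:) <$> letterChar <*> many alphaNumChar)

-- Grammar layer: rules written purely in terms of the tokens above,
-- e.g. the header of a declaration like "page Main {".
declHeader :: Parser (String, String)
declHeader = (,) <$> identifier <*> identifier <* symbol "{"
```

The point is only that grammar rules never touch characters or whitespace directly, which is the separation we are currently missing.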

Martinsos commented 3 years ago

Nice article about configuration languages and the difference between simple configuration languages (JSON, YAML), Turing-complete languages, and languages specifically designed for configuration (Skylark, Meson, Nix, Dhall): https://beepb00p.xyz/configs-suck.html .

Wasp is moving in the direction of the last group right now. We could, in theory, instead of implementing Wasp as a standalone language, use one of these (e.g. Dhall), or even say: hey, build your JSON in JS and give us that. But let's see, I don't feel like giving up on creating our own language yet; it is harder but it could be really cool.

Martinsos commented 3 years ago

Well, this idea has just come one big step closer to reality with the latest effort from @craigmc08!!

Quoting his post from our Discord server:

I've been working on a new design for the wasp config language, and it's in a place where I'm ready for others to look at it and give some feedback. I wrote it for issue #109, but it has some thoughts about issue #227 (about modules/plugins). Let me know what you think!

Here is the proposal for the new design that he attached to his Discord post: wasplang.pdf .

To summarize, he wrote a formal grammar for a new version of the Wasp DSL that is very similar to the current one but more extensible, which should allow us to more easily extend and modify it in the future (as was the purpose of this GitHub issue).

Further steps: @matijaSos and I will look into it and will continue the discussion both there and on Discord!

Super excited about this :).

Martinsos commented 3 years ago

@craigmc08 , I took a detailed look at the proposal, awesome work :)! Below are my initial thoughts, and I would love to discuss them in order to further refine the proposal, to the level where we are ready to start implementing it. I enumerated all my comments to hopefully make it easier to follow the discussion later.

NOTE: There are parts of the proposal where I lack knowledge, specifically type inference rules and Haskell generics, and where I will need more of your guidance. I would love to learn them and will put effort into it, so if you have any resources to recommend (especially on type inference rules), please do. I found https://www.cs.utexas.edu/~tdillig/cs345H/ and I see it has a portion on type inference, so I will probably go with that if I don't find anything better.

2.1 Syntax

  1. Is there an official name for the syntax you used to describe the formal grammar? I think it is very intuitive, but was just wondering if that is something official or not.
  2. So we have declaration (decl), call (call) and definition (defn) as possible statements. 2.1. I think definition is a new concept, right? We didn't have that in old Wasp, if I am correct? It is there to help with removing duplication? 2.2. declaration makes sense as a concept; it was the dominant concept in the old Wasp syntax. However, how do we think about call? My first intuition was to call it an "unnamed declaration". The situations where we use call are auth {...} and dependencies {...}, because those are really "singletons", so it didn't make much sense to name them. In theory they could even be made parts of the app MyApp {...} declaration as properties! Calling them call makes me think they could be called multiple times, but it doesn't make sense to have auth or dependencies multiple times in the codebase. Also, call feels like some kind of execution/evaluation is happening. So I am wondering: should we call them "unnamed declarations" instead? Or just "declaration", and the named ones we call "named declaration"? What is your reasoning on this?
  3. Why is <dictval> described as <ident> : <literal> and not <ident> : <expr>? Btw, maybe we could name it dictentry instead of dictval, since "value" makes me think of the "value" part of the (key, value) pair/entry.
  4. <import> -> this is interesting; this import is actually a subset of the JS import, and it is a point where Wasp interops with JavaScript. In the future, the idea is that Wasp might support multiple languages, e.g. Go, Python, and then the imports would be specific to Go / Python. I am not sure how to best go about it language-wise, but one way I see it going is having specific imports: <jsimport>, <goimport>, <pyimport> and so on. Uniting them all as <import> = <jsimport> | <goimport> | <pyimport> sounds nice, but those are all different types then, hm; we have no subtyping or polymorphism (am I talking rubbish here?) so that would complicate things -> we can't accept any import as a type somewhere, we always have to be specific about which one. Which is maybe ok; we could change the query type to have a couple of optional fields, where one is for jsimport, one for goimport and so on. I got a bit tangled, but it seems to me the best way for now is to do as you did; I would just rename it to <jsimport>, and then we will figure out later how to expand on this -> what do you think?
  5. Why are we going with include as a keyword, and not import, for Wasp includes/imports (<include>)? Both Haskell, which is the language we use, and JavaScript, which is the language of web dev, use import as a keyword, so wouldn't that be more familiar to both contributors and users of Wasp? include usually indicates copy/pasting an external file into the current file, kind of like C does. import usually indicates exposing a whole module or parts of that module as an identifier in the current namespace, so it feels like a more high-level operation. Since Wasp is all about global declarations for now, which get directly "included", it might make sense to go with include. But Haskell also has global declarations (type classes) and it still uses import. So I am rambling now, but I still feel more inclined to go with import, since it is more modern, and if we start doing something smarter with Wasp modules, import will still feel ok while include might start feeling old.
  6. You use <literal> on the right side of the <dictval>, but I don't see <literal> rule anywhere. I assume it should be smth like <literal> ::= <string> | <num>? On the other hand, you do use <string> and <num> directly in the rule for <expr> -> why not use <literal> there to capture those two?
  7. <eol> as delimiter -> do we need it, can't it be just whitespace? I actually think we are not using <eol> as a delimiter right now in Wasp; instead it is just whitespace, but that was because it was all together pretty simple, so it could work without it. But I guess introducing <eol> as a delimiter from the beginning, even though we might not really need it, is smart, as we avoid getting into ambiguous situations as the syntax gets more complicated?
  8. Just to confirm: this is the first part of the whole system, it could be done with a parser generator like Happy, input would be strings/files and output would be AST. This would encompass lexical and syntax analysis.

2.2 Type System

Right, so this is the part where I lack some theoretical knowledge - I will try to ramp it up quickly in the following days so we can have a deeper conversation here. Below are some "wild" questions in the meantime.

  1. I get the basic idea that we have a bunch of rules that allow us to infer the types of specific terms based on other terms, so we can detect if the expected/inferred type matches the user-specified type (e.g. when the user declares something as a page), or we can detect a situation where the inference rules are in conflict (they can't agree on the type) (e.g. if a string literal is provided where a number is expected). Or maybe those are the same cases?
  2. How is this normally implemented, is there a library/framework for this or is it coded manually? How complex is this?
  3. So this allows for custom types, which are Enums and Dicts/Data, which we can then provide as we see fit, and that is what we do in the stdlib? In Wasp, users can't (for now) define their own types, but we can. Is this somewhat correct? Also, what is the relationship between Dict and Data -> is that the same thing in the end?
  4. So this is the second part of the whole system. What is the output of this stage -> type errors + AST enriched with type information? Maybe it is a completely different AST from the one from the previous stage?

2.3 Evaluation

  1. You said the evaluation order of declarations and calls is left unspecified. Does it even make sense to say that they are evaluated, ever? I am probably not familiar with the terms, but if I had to describe how I think about it, I would say that expressions are "reduced" and that is it; we are left with declarations, and those declarations are the output (used by the code generator, which really does the "evaluation" in the sense of generating the JS codebase).
  2. What do you mean by "Only the values of declarations and calls are visible to the caller of the wasp interpreter"? Does that mean that the output of this stage will be only declarations and calls? I guess I am confused about the terms in "calling the wasp interpreter" -> so who is the caller? What is the interpreter exactly, is that the part of the compiler up to the code generator? And the code generator is the caller in that instance?
  3. So when we do "include", we overwrite existing bindings to the same name. What are "bindings" in this context -> declarations, functions and definitions? Since includes are at the start of the file, this means that if two includes have declarations/functions/definitions with the same names, the later ones are used and the previous ones are disregarded? Isn't this somewhat aggressive -> wouldn't it be better to report an error in such a case?
  4. include "other.wasp" as Q binds the variables and declarations from other.wasp to a variable Q in the current file. -> Aha hm, ok that makes sense. What if we don't have the as part and we still want to refer to bindings from the included file -> no problem, because in that case they are added directly to the current namespace, right? Here are some thoughts: what if "include" always had to have the "as" part, and declarations and calls were always global in the sense that they are accessible to the caller of the interpreter and have effect in that sense, but they are only referencable via the name that was used with "as" to include/import the file? I am suggesting this because I am not a fan of stuff being imported into the namespace directly. What are the implications / challenges of the different approaches here?
  5. So this is the third part of the whole system. The output of this stage is ... -> a reduced AST? It should really be just a list of declarations and calls, right?

2.4 Standard library

I have never implemented a standard library so I am not completely sure how to think about it, but if I am right, a standard library is just reusable code provided by the runtime itself, right? And as such, it has some more flexibility as to how it is defined, but in the end it has to follow the rules of the language?

In our case, the standard library contains enums, data, and functions, which are all really just type definitions, right?

So we have types as a concept in Wasp, but the developer can't create any on their own because they are not part of the syntax? But we can inject them directly into the interpreter, and so will future writers of plugins. Is this correct? I guess I am somewhat confused by this because normally types are something the user can create, but here they are not - that is atypical? But I guess it makes sense.

4.2 Type checking

  1. "omitting language extension and imports" -> I didn't understand the meaning of this?

4.3 Evaluation

  1. I agree that we should not allow circular inclusions.
  2. Mutually recursive declarations / calls -> is that the situation where, for example, a query references specific entities, and if they also referenced that query, that would be a problem? We could maybe just introduce the concept of "references" and that is it -> we don't evaluate them in any way, they are symbolic? Anyway, this might be a better fit for what we need, hm, having references of some kind. I am probably confused about this, hm. I guess the question is: when in route we have a page as one of the properties -> what is that resolved to in the AST? Is it replaced with the value of that page, or is it kept as a symbol / reference to that page? Does it matter, hm?
  3. I don't know much about Data.Data yet, but I think I get what you are aiming for -> final AST would be composed of the declarations/calls that we registered, right? That makes sense.

General questions

  1. How much does the AST change from phase to phase, and do we use one AST into which we cram all the stuff, or do we use a different AST for each stage?
  2. We are considering the idea of "inline native code" for the future -> this means that you would be able to write JS or Python etc. directly in the Wasp file, instead of referencing an external file (for example for queries and actions). It is still pretty unclear exactly how this would be done and there are multiple challenges here, but I am wondering, is there any obvious problem this proposal could pose if we go in such a direction, something we should consider right now?
craigmc08 commented 3 years ago

@Martinsos I'll respond to each part in the same format you gave your comments in.

2.1 Syntax

  1. It's Backus-Naur form (BNF), extended with * meaning the last token, <symbol>, or (group) can appear 0 or more times. Additionally, I enclosed all terminals in single quotes.

  2. 2.1. Yes, definition is a new concept. 2.2. I don't like the name call either; it was the closest word I could think of to represent it. I like the declaration and named declaration distinction; however, it might be better to remove this type of statement and just have auth and dependencies as fields in app.

  3. I had <literal> and <expr> at some point, but removed <literal> because it wasn't useful. Apparently I forgot to update every symbol. It should say <expr>. dictentry definitely makes more sense than dictval for the name.

  4. I think having a separate import syntax for each language isn't the correct route. Perhaps the wasp import literal would be language agnostic and map to different text in different languages? In js, it's 1-to-1 right now, but in python import { login } from "@ext/LoginPage.js" would map to something like from ext.LoginPage import login.

  5. Changing include to import is fine by me, but we should be careful with documentation, making sure the difference between the two uses of the import keyword is clear. I'm still going to call them includes for the rest of this comment.

  6. Same as (3).

  7. Yeah, we don't need <eol>. That's an artifact that was needed in an earlier version of the syntax that I forgot to remove.

  8. Correct, this whole part could be done with a parser generator.

2.2 Type System

re: resources for type inference rules. The way I introduced myself to these types of rules was, I think, with this article on the Hindley-Milner type system: https://legacy-blog.akgupta.ca/blog/2013/05/14/so-you-still-dont-understand-hindley-milner/. The one non-standard part is how I denoted the Incl, InclQ, Call, and Decl rules. They all use the same convention: an entire monospaced statement appearing in the conclusion of a rule means that the statement is well-typed.

  1. I think you have the basic idea. The key part is that the rules are applied bottom-up, starting at the leaves of the AST (strings, numbers, etc.) and the types of more complex sub-trees are inferred using the rules. An expression is ill-typed if no rule can be selected whose premise is true.

  2. I've only implemented very simple type systems, but I haven't heard of libraries for type checking, and I think a type checker becomes too specialized to the language it is used for to be a library.

  3. 3.1 Yes, wasp users can't define their own types. They can create dictionaries and lists of any shape though. 3.2 If you mean wasp Dict vs Haskell Data, they aren't quite the same: a wasp Dict is equivalent to a Haskell Data with one constructor using record syntax. Wasp Dicts are a subset of Haskell Datas. Also, Dicts support optional entries (represented in Haskell with a Maybe).

  4. The output is AST + type information. It should be a different AST type since we don't want to be able to pass untyped ASTs to future steps.
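To illustrate 3.2 with a sketch (the type and field names here are made up, not from the proposal): a wasp Dict with an optional entry would correspond to a single-constructor Haskell record where the optional entry becomes a Maybe field:

```haskell
-- Hypothetical Haskell counterpart of a wasp dict value like
--   { component: <some external import>, authRequired: true }
data ExtImport = ExtImport { importName :: String, importPath :: String }

data PageConfig = PageConfig
  { component    :: ExtImport   -- required entry
  , authRequired :: Maybe Bool  -- optional entry; absent in wasp => Nothing
  }
```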

2.3 Evaluation

  1. The evaluation is turning the decls and calls into Haskell objects. Where this makes a difference is if a list is outputted, the order is not defined.

  2. The caller of the wasp interpreter is wherever the current Parser.parseWasp call is - in Lib.compile or anywhere else. So when you run the interpreter on a file, what is returned is the declarations and the calls. You don't get the definitions.

  3. I agree. That should be a type error (if we keep unqualified includes).

  4. Implementing just qualified imports (import "other.wasp" as Q) would be easier than importing both types, and I agree with just having the qualified version.

  5. Output of this stage would be a list of declarations and calls.

2.4 Standard library

Yes, the standard library is just a bunch of type definitions. And the hope is future plugin writers will also be able to add more types.

Not allowing the user to define types in a statically-typed programming language is not something I've ever seen, but I think it's reasonable in a configuration language. And with no limit on the dictionary and list types the user can create, custom types can be simulated.
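For example (hypothetical snippet, the meta field is invented for illustration), a user could simulate a "custom type" just by consistently using a dict of an agreed-upon shape:

```
page Main {
  component: import Main from "@ext/MainPage.js",
  meta: { title: "Home", description: "Landing page" }
}
```

Here meta is not a built-in type, just a dict whose shape the user keeps consistent across pages.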

4.2 Type checking

  1. I think the page break might have caused confusion, that's talking about the code snippet after that. There are some language extensions and imports that would be needed to make that code compile.

4.3 Evaluation

  1. I think using references would feel a little clumsy when using the output of the interpreter in Haskell. As I mentioned in the proposal, using time travel is the only solution I've thought of to make recursive declarations possible.

  2. That's correct.

General questions - answered

  1. Different AST for each phase would be preferable. If we used the same one, we risk passing data to a phase that it can't use (such as passing an untyped AST to the evaluation stage).

  2. I think the <quoter> literal and type rules make inlining code from other languages simple and consistent (json and psl are both other languages that are inlined already). But we should keep this in mind when implementing code for the quoter, making sure it can be extended easily in the future with more languages.

Martinsos commented 3 years ago

2.1 Syntax

  1. -> 2.2. Agreed, let's make auth and dependencies fields in app for now. definition will allow for locating them further away from app in the code if that turns out to be interesting. If need be, we can consider introducing unnamed declarations in the future.

  2. Sounds good!

  3. I see what you mean. The thing is, I don't yet know what the proper representation of import that could be applied to multiple languages would be. But since we don't have much info, let's not complicate things -> let's do as you said. So while it looks like a JS import, we will actually treat it as a general concept of importing something, and for now we only need to map it to JS. Sounds good. We can adapt as needed in the future. Btw I would call it <extimport>, to indicate it is importing "external" code (non-Wasp code). Btw if you have better ideas on how to name code that is not Wasp code, let me know. We were playing with "native", but when you think about it, it is not really semantically reasonable. Maybe "foreign"?

  4. Ok, let's name it import then, and the other will be called external import.

2.2 Type System

Thanks I will check that resource! Btw do you have any other resources you would recommend regarding type systems, compilers?

  1. -> 3.2. Actually I wasn't talking about the data from Haskell, I meant the data that you mention in 2.4, next to enum and func, and also in 2.2.1 (Type Rules). I understood that here data, enum and func are concepts of Wasp that, while not exposed to the user to define/declare their own, are still part of the language, and we use them in the stdlib. enum and func are what the names say, and I understood that data is a way to declare a composite record-like type that really describes the shape of dictionaries -> so if dict is a value, data is a way to describe/declare a shape/type. Finally, as you described, the data/newtype from Haskell is just a way to describe the stdlib, and from those we will create enum, data and func to use in our compiler. So data in Wasp just by accident has the same name as data in Haskell. Maybe data is not the best name? Maybe it should be named dict? But that sounds too specific.

2.3 Evaluation

  1. Cool, then let's have only qualified imports! Is it ok to say that, while declarations and definitions in them will be bound to Q, therefore making them "localized", any declarations will also be included in the final result of the compiler, therefore being "global" in a way? If not, we need a mechanism for the "importer" to "execute/call/evaluate" the declarations they want.

4.2 Type checking

  1. Ah of course :D, sorry, for some reason I thought it refers to language extensions and imports in wasp :D. Ok ok.

4.3 Evaluation

  1. Ok, I thought about it more and I agree! One important thing will be to make sure that the name of the declaration is also included in the declaration object that is the result of the interpreter. Recursive declarations -> well, they might not be a problem if the "consumer" does not care about consuming them in a circular fashion, right? Anyway, we don't even have them for now, so it is ok.

General questions

  1. Ok sure, we want to use the type system and make sure that it ensures intermediate formats are as valid as they can be.
  2. Certainly! I think there will be more developments there, e.g. we might at some point look into allowing writing external code at top level, next to declarations, but ok, I think there won't be big problems expanding in that direction. Great!

Going Forward (THIS IS NEW)

It might also be smart to define a bare-bones version of the interpreter that we will get working first, and then we add the rest of the features. I will give it a quick go right now: for the very first version we could kick out wasp imports and definitions. Well, that brings us to the version of Wasp we have right now. If we kick out more it doesn't sound like there will be much left heh :D.

craigmc08 commented 3 years ago

2.1 Syntax

  1. I think foreign code makes more sense than native code.

2.2 Type System

  1. Ah, wasp data, not Haskell data. I had not considered the difference between data and dict that much, but in the current form, the differences I can think of are:

    • You can't access properties of a data, they are opaque and their values are inaccessible.
    • The type system doesn't require a data to have a dictionary as its argument, i.e. contentTypes jsonOnly ['application/json'] is valid. You could even have a list of dictionaries as the argument.

    These differences can be changed or removed if they are not desired. If neither are desired, the data vs dict distinction could be removed. But this would allow code like this:

page Main {
  component: import Main from "@ext/MainPage.js",
  authRequired: true
}

page AlsoMain Main

To me, it looks a little strange, even if it would be syntactically valid and type check.

2.3 Evaluation

  1. Good point about "re-exporting" imported declarations. I think having implicit re-exporting would feel weird with a qualified import. Perhaps a new export statement? I can come up with 2 ways this could look:
-- (1)
import "other.wasp" as Q

export Q.something

-- (2)
export something, somethingElse from "other.wasp"

I think (1) isn't as great, because it makes it look like export 7 could be valid, but it isn't. And if you want to succinctly re-export everything from "other.wasp", (2) offers export "other.wasp", but I can't come up with a way for (1) that makes sense.

4.3

  1. Yes, name definitely needs to be in the output of the interpreter.

Going Forward

I agree, for the first pass at this we can skip wasp imports and definitions, and there isn't much else to remove. Adding those in later shouldn't require any major reworks of this system: a map from names -> values will already exist for declarations referring to each other. We can also skip the dot accessor (dict.key) and add it along with wasp imports or definitions, since it won't be usable until one of those is implemented.

Martinsos commented 3 years ago

2.1 Syntax

  1. And "external" vs "foreign"? "external" is the term we are using right now, but I am not 100% in love with it. Although "foreign" also does not feel like a 100% match, hm, it feels semantically more correct.

2.2 Type System

  1. Thanks for explanation -> I will need to dive deeper into the type system and inference rules to be able to properly respond to this - I will get back to you soon regarding this.

2.3 Evaluation

  1. I agree, I also like (2) better! I agree that we should have this, although I actually wasn't talking about this; what I meant is the following: Imagine you have two files, main.wasp and utils.wasp. main.wasp declares page Main, and utils.wasp declares page Utils. If main.wasp imports utils.wasp and does nothing else, doesn't refer in any way to page Utils from utils.wasp (or re-export it), what is the result of the interpreter -> is it just the Main page declaration, or both the Main and Utils page declarations? One way is to say that all declarations are global, therefore as soon as a file is imported they become part of the importer. Another way is to go with explicit exports, as you described. Now that I am talking about it, I realize this is exactly what you are talking about, and you suggested the approach with explicit re-exports. I like the simplicity of the implicit re-exports approach, but it makes it hard (impossible?) to choose which declarations to use and which not, in the case when there are multiple choices in one module/file. Is this correct? If so, ok, probably better to go with explicit re-exports.

Going Forward

  1. I would rather we do it all directly in this repo. We can start implementing the new interpreter directly next to the existing one and use feature flags and CLI flags in case we need to hide something from the released code. This way we will not have to bother later with merging the proposal into the main project. So https://trunkbaseddevelopment.com/ . What do you think about that, any cons? Docs -> sure, let's keep the parser code as documentation of the grammar, and we can have a LaTeX file in the repo documenting the type system.
  2. I managed to do some reading over the weekend -> a couple more days and I think I will be good to go.
  3. Sounds good! Oh, one question -> you are calling it "interpreter" -> I don't have anything against that, but is that the best name? How do you feel about it, and are there any other alternatives? When I hear "interpreter" I imagine code being executed as it is read, line by line. Would "compiler" be a better term? Or "analyzer"? In our current codebase we have Parser + Generator = Compiler. We are still going to have the Generator, but the other parts we could call differently if we have better names. Should it be Interpreter + Generator = Compiler? Or Compiler + Generator = Something? Or Analyzer + Generator = Compiler? Or do we just have Compiler/Interpreter and Generator and no name for both of them together?

Removing dot accessor at start -> sounds good.

Martinsos commented 3 years ago

I finalized my learning/reading about type inference and type rules, here are my thoughts on the Type Rules part:

Decl: In the conclusion, should it be E = E, x : t instead of E = E, x : typeName?

DictNone: I get the idea, but I am not sure how to interpret j : sigma? -> I have never seen that notation before; how would you read it? Just to clarify, the idea is that if e is of type dict1, it is also of type dict2, as long as dict1 contains all the same fields as dict2 except for one optional field, and this can then be applied recursively to allow for any number of optional fields? That means it has multiple types, hm, that is interesting - we have some kind of subtyping? If we have two types, where T1 is {a: Int, b: Maybe Int} and T2 is {a: Int}, and we have the value {a: 3}, which type does it have, and can it be used at the same time for one T1 declaration and one T2 declaration? Actually, we have some kind of polymorphism, and not subtyping, is that correct? Because you say, with DictNone and DictSome, that one expression can have multiple types -> if it has type t1 and t1 is in the described relationship with type t2, it also has type t2. Although subtyping sounds more practical in the implementation.

DictInst: I don't like the idea of DictInst, because it will not report a type error in the situation where somebody misspells a key in a dict. Can we describe Dot in some other way? Maybe E |- e : {k : t, ...} ==> E |- e.k : t, where ==> is the horizontal line (the separation between premises and conclusion)? Looks a bit like cheating, but should be ok implementation-wise? Or not, because we can't infer the type of e, hm. Well, this is tricky. Ah, I see it now. Ay yay yay.

x = [{a: 1}, {}] -> interesting! For practical reasons, we probably want to go with [{a: number?}]? What do we even consider to be the type of x = {a: 1} -> is it {a: Int} or {a: Int?}? Maybe the second one, since it is more general? I wonder what happens in our type checker if the expected type is {a: number} and the actual type is {a: number?} -> that should check out? But which mechanism is that?

I think the tricky part is dictionaries and their ability to have optional fields -> we should figure out the details of that.

Martinsos commented 3 years ago

@craigmc08 here is a provocative thought: do we even need static typing in Wasp? When I think about it, in Wasp compile time == runtime -> all the code is executed during analysis (since there is no input and there is no branching)! If we added constructs like conditionals that would stop being true, but I don't know if we will ever add them + even if we do, the logic will probably not be very complex + if needed we can maybe add static typing later? I was motivated to start thinking about this because of the complexity with dictionaries and optional fields that I talked about above. So, to conclude: does it even make sense to have static typing in Wasp? Maybe instead we should do all the checks at runtime (while evaluating and generating the final AST), and that is it? The more I think about it, the more it sounds like the correct approach. Let me know what you think!

craigmc08 commented 3 years ago

2.3

  1. With implicit re-exports, you are correct that you wouldn't be able to limit what was exported. And I think having explicit exports feels more consistent with requiring qualified imports (with respect to not polluting namespaces automatically).

Going Forward

  1. I'm not sure what I think about using trunk based development. I've never used it, and to me it sounds odd to bring frequent changes directly to master. I also don't think having a long-lived feature branch for the new interpreter will cause many (or any) merge conflicts in the end. In my mind, the first pass version of the new interpreter would have an identical API to the current one, and not require any changes anywhere else. But I'm open to trying trunk based development for this, I'll leave it up to you.

  2. Interpreter might not be the best name. As far as I know, the line between interpreter and compiler is pretty fuzzy. The arbitrary place I draw the line is that compilers produce some artifact to be run by an interpreter (counting the hardware as an interpreter) or maybe compiled by another compiler (e.g. gcc produces an executable, javac produces .class files run by the JVM). So Interpreter + Generator = Compiler. But since the "interpreter" isn't actually running anything, it's not a good name. Analyzer sounds good to me.

Type System Questions

Decl

E = E, x : typeName is not a mistake. For example, in

page Home {
  component: import { Home } from '@ext/HomePage.js',
}

Home is given the type page. In the type rule, t is the type of the argument: { component: import, authRequired: bool?, params: [...]? }

Pros:

Cons:

To me, the pro is much more important than the cons since without it, the generator could produce incorrect code.

DictNone

j: sigma? means that the key j is optional, but contains type sigma if it exists. sigma? would map to Maybe sigma in Haskell. And yes, your understanding of the rule is correct.

I think the term for the polymorphism in this design is subtype polymorphism. It won't be necessary to assign multiple types to one expression. When defining a dictionary, the "smallest" type is always given to it, as in it has no optional properties.
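The DictNone/DictSome relationship can be sketched as a single "usable as" check. Here is a minimal, hypothetical TypeScript sketch (all names, including `isUsableAs`, are invented for illustration and are not part of the proposal): a dict type t1 is usable where t2 is expected if t1 has no keys unknown to t2, every extra field of t2 is optional, and shared fields have compatible types.

```typescript
// A dict type maps each field name to its type plus an "optional" flag.
type Ty =
  | { kind: "string" }
  | { kind: "bool" }
  | { kind: "dict"; fields: Record<string, Field> };

interface Field {
  ty: Ty;
  optional: boolean;
}

// Can a value of type t1 be used where type t2 is expected?
function isUsableAs(t1: Ty, t2: Ty): boolean {
  if (t1.kind === "dict" && t2.kind === "dict") {
    // No unknown keys: a misspelled key is rejected instead of ignored.
    for (const k of Object.keys(t1.fields)) {
      if (!(k in t2.fields)) return false;
    }
    // Each expected field is either absent and optional (DictNone), or
    // present with a compatible type; a required field may satisfy an
    // optional one (DictSome), but not the other way around.
    return Object.entries(t2.fields).every(([k, f2]) => {
      const f1 = t1.fields[k];
      if (f1 === undefined) return f2.optional;
      return (f2.optional || !f1.optional) && isUsableAs(f1.ty, f2.ty);
    });
  }
  return t1.kind === t2.kind;
}
```

Under this sketch, a dict with a required field is accepted where the field is optional, an optional field can never satisfy a required one, and a misspelled key fails the unknown-key check.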

DictInst

I see what you mean: a situation such as

page Home {
  component: ...,
  authReqiured: true -- Note typo
}

should report an error. Your rule looks good to me. Inferring the type of e shouldn't be a problem. It doesn't look like cheating, and it should definitely be fine to implement.

x = [{a: 1}, {}]

I agree with [{a: number?}]. For {a: 1}, choosing {a: number?} would discard the information that a exists, which could not be recovered later. I say it should start as {a: number} (see type rule Dict).
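One hedged way this could be implemented (hypothetical `joinTy` helper, invented names, not the proposal's wording): give each element its "smallest" type per the Dict rule, then join the element types pairwise, marking a field optional as soon as one element lacks it.

```typescript
type Ty =
  | { kind: "number" }
  | { kind: "dict"; fields: Record<string, Field> };

interface Field {
  ty: Ty;
  optional: boolean;
}

// Join two types into the most specific type both values inhabit,
// or null if they are incompatible.
function joinTy(t1: Ty, t2: Ty): Ty | null {
  if (t1.kind === "dict" && t2.kind === "dict") {
    const fields: Record<string, Field> = {};
    for (const k of Object.keys({ ...t1.fields, ...t2.fields })) {
      const f1 = t1.fields[k];
      const f2 = t2.fields[k];
      if (f1 && f2) {
        // Shared field: types must agree; optional if optional on either side.
        const ty = joinTy(f1.ty, f2.ty);
        if (ty === null) return null; // field types disagree
        fields[k] = { ty, optional: f1.optional || f2.optional };
      } else {
        // Present on only one side: keep the field, but make it optional.
        const f = (f1 ?? f2)!;
        fields[k] = { ty: f.ty, optional: true };
      }
    }
    return { kind: "dict", fields };
  }
  return t1.kind === t2.kind ? t1 : null;
}
```

Folding `joinTy` over the element types of x = [{a: 1}, {}], with each element starting at its "smallest" type, yields {a: number?}, so x gets the type [{a: number?}].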

Do we need static typing?

Perhaps static typing isn't the best term. There is no evaluation time in Wasp, so what would the distinction between static and dynamic be? Type analysis may be a better term. Anyway, the runtime checks would end up being equivalent to a type system. I think the question boils down to: when should type analysis happen? These are the possible answers I came up with:

  1. During parsing (like wasp does currently)
  2. In its own pass, between parsing and translation to the final Haskell types (current proposal)
  3. During translation to the final Haskell types (your suggestion)
  4. In the Generator code

I rule out (4) completely, for hopefully obvious reasons. (1) hides the things in common between types and allows syntax inconsistencies to creep in. Between (2) and (3), I'd choose (2) since maintaining more small passes with a single task sounds easier to me.

In a sentence: I think type checking should happen in its own pass per the proposal, separate from generating the Haskell types.

Martinsos commented 3 years ago

Trunk based development - a couple of years ago I was a big proponent of longer-living feature branches, but then I ended up working in a team where the lead dev suggested we try trunk based development, and I have to say I found it to work better. It avoids big drifts that result in one complicated merge or many smaller merges, and it forces iterative design with small iterations. It might not be that important here due to the clear APIs, as you said, but still, imagine that you are developing this new interpreter and I am adding a new feature to the language -> access control on the backend, for example. Instead of me doing that addition in master and then you rebasing/merging later and adding the same support in the new interpreter (and what if you miss some parts of it?), I can do it immediately for both the old and the new parser while it is still fresh for me. I would give it a try and see how it goes - you will try something new, so that is a plus :D, and if it proves to be tedious we can switch to a long-lived feature branch.

I like the Analyzer + Generator = Compiler terminology.

Type System

Decl

Got it, makes sense!

DictNone

Ok, what you say sounds good. I am still not sure, though, how exactly these types are handled. So, let's say I have x = { name = "Foo", auth = Google }. I also have app MyApp x, where app is of type { name: String, auth: AuthProvider? }. The inferred type of x is { name: String, auth: AuthProvider }. Does app MyApp x typecheck? Or what if I have [{ name: "Test" }, {}] -> if the first member of this array is accessed (we don't have array element access yet, but let's say we do) and provided as the body for the app declaration, would that type check, taking into account that this member is inferred as { name: String? }? Aha hm hm hm, ok I seeee :D - in this case it wouldn't type check, and that is fine, because the elements of the array are not guaranteed to have that field. Makes sense.

DictInst

Sounds good.

Static Typing

I like how you divided this into 4 steps, that is a great way to look at it!

I agree that steps (1) and (4) are not good places to do the type analysis -> (1) is syntax analysis and should do just that to keep things simple, while (4) is well, not even part of the analyzer, so it would make no sense to try and do it there.

It comes down to (2) and (3) as you said, and when I was talking about static typing I meant (2), while by dynamic typing I meant (3). But there is a difference between (2) and (3). (2) is going to be more formal (type inference of a kind) and will follow a set of rules that we have to define and make sure are correct/sound. On the other hand, (3) should be simpler to both design and implement -> we just embed the "common sense" checks while manipulating the data (e.g. while constructing the final Haskell types). No need for type inference and type rules. But then it might be hard to recognize some situations, e.g. entities: [{ ... }] might not be detected as an issue if the dict passes the duck-typing (well, it depends on the implementation)?

The main reason why I started thinking about this was that I thought it would be hard to define simple rules for dicts that play well with optional fields, but it seems to me now that this will not be a problem - I got confused earlier. What I was also afraid of was that, if we come to the point where we have more serious polymorphism, it might result in pretty complex rules compared to the easy checks in (3) that we could do instead.

What is your take on this, pros and cons? I understand the pro of separate small tasks, but how much more complex are those separate small tasks in (2) compared to (3), especially as the language evolves, vs the value they bring? Btw, right now I feel I am in favor of (2), but I would still like to think this through once more before we make a decision.

craigmc08 commented 3 years ago

Trunk based development

Ok, trunk based development it is then!

Type System

DictNone

In the first example, app MyApp x does type check. The definition of x constructs a type environment E = {x : { name: String, auth: AuthProvider }}. A proof of app MyApp x could be:

1. Let t = { name: String, auth: AuthProvider }, s = { name: String, auth: AuthProvider? }
2. x : t is in E, so by [Var], E |- x : t
3. Then by [DictSome], E |- x : s
4. And since `data app = s`, [Decl] shows that `app MyApp x` is valid and updates E to E, MyApp : app

Note that this proof does not update the type of x in E to be s; it remains t, so x.auth would still type check in some other place.

You're correct about the second example. You can not access the value of an optional property.

Static Typing

If we go ahead with the implementation I suggested in sections 4.2 and 4.3 using Haskell generics, then whichever option we implement, the type system/checks must be sound, or we could end up with runtime errors from trying to construct values. Because of this, the amount of checking in (2) and (3) would likely be similar. If we do (2), there will likely be more lines of code in total.

In (3), we will have type checking and manipulation of Data.Data types (which looks like it might be tricky and dense code) mixed together. Since both of those need to be carefully written in order to avoid runtime errors, I see it as a con to mix them and have to reason about both of them simultaneously.

Another con for (2) is that its performance may be slightly worse, since it adds another walk of the AST and builds a new data structure. However, I think this is not something to worry about, since IO in the Generator probably takes orders of magnitude more time than the Analyzer does.

To summarize:

There are certainly other pros/cons I'm missing, and I am definitely biased towards (2).

Martinsos commented 3 years ago

DictNone -> ok, all clear. Sounds like a good way to go about it, actually; I like it now that I understand it :D. How do you feel about it - anything that you think is not great?

Static Typing -> I think slightly lower performance is not an issue, and more code is better than more complexity. So, taking that into account, and taking into account that you know this part better than me, I agree on going with your bias/intuition/arguments, (2). If need really be, we can always change the course later.

Well, I think we covered most of the things!

What is the next step? A new proposal, possibly in the repo as LaTeX so we can edit it, and then we can start with some initial, dummy code?

Martinsos commented 2 years ago

Done!