roc-lang / roc

A fast, friendly, functional language.
https://roc-lang.org
Universal Permissive License v1.0
4.4k stars 309 forks source link

Aliases and opaques should not escape their declaration scopes #2555

Open ayazhafiz opened 2 years ago

ayazhafiz commented 2 years ago

At the time of writing (ea4e2a706d9330404926562d4ae32471cb79af34), the following reproduces (via repl):

» f =
…     Age : U8
…
…     dev : Age
…     dev = 21
…     dev
…
… f

21 : Age

In this example, Age has escaped the scope it was declared in! We should not allow this to happen, because a user of f is never able to know what Age is without looking inside f itself - worse yet, a user of f cannot use the Age alias, though they can use f.

The same goes for opaque types - once they land (#2214), the following should also not be admitted:

» f =
…     Age := U8
…
…     dev : Age
…     dev = @Age 21
…     dev
…
… f

21 : Age

Some implementation notes:

rtfeldman commented 2 years ago

Interesting! I'm actually not sure if we should change the current behavior. 🤔

Consider each of these cases in the context of a module export.

Should I be able to export f : Age outside the module where an Age opaque type is defined? Of course - that's the whole point of opaque types! And what type should we display when it gets imported in other modules without Age having been imported? I think OtherModule.Age is probably still the most helpful, even if it's not in scope.

Similarly, should I be able to export f : Age outside the module where an Age type alias is defined? Of course! And what type should we display when it gets imported in other modules without Age having been imported? Again I think OtherModule.Age is probably still the most helpful, even if it's not in scope. We could instead deconstruct the alias and display that, but that could lead to some enormous and confusing types being printed when aliasing a huge record or something. (In fact, there are plenty of times when the whole purpose of an alias is to make a giant type easier to read!)

So if that's how they should both work in the context of modules, should they work any differently in the context of expressions?

The only argument I can see for having them work differently for expressions is: what if I define an Age alias in one expression, and then a different Age alias in a different expression within the same module, and then both aliases "escape" their scopes like this, and then it somehow becomes unclear which is being referred to?

I think it's difficult to imagine that scenario would actually come up often enough (ever?) in practice that it's worth considering a significant drawback. 😄

ayazhafiz commented 2 years ago

The use case I would be most worried about is someone wanting to use the alias in a type annotation, either for readability or another use case. For example, in another module Staff, suppose I have the following code:

interface Staff exports [ allStaff ] imports [ Age.{ ageFromNumber } ]

eliza = { age : ageFromNumber 31, email: ..., tenure: ... }
fernando = { age : ageFromNumber ..., email: ..., tenure: ... }
grace = { age : ageFromNumber ..., email: ..., tenure: ... }

allStaff = [ eliza, fernando, grace ]

Now let's say I want to define a Person alias in this module:

Person : { age : ???, email: Str, tenure: U32 }

...

allStaff : List Person
allStaff = ...

I run into some trouble - what do I put for the type of age? When Age is an alias I can use the real structural type (to an extent, up to the limitations/downsides you've described), but if Age is an opaque type, I can't find a type to give age, if it's not exported.

What do you think?

rtfeldman commented 2 years ago

Ahh interesting! So basically a warning instead of an error, because doing this works but can be inconvenient for others later.

With that in mind, I think there are two cases:

Assuming that sounds like the way to go, one thought on how to avoid traversals would be to write down the definition sites for later checking.

For example, if I'm defining a type alias or opaque type in an Expr::Def, then if that Def returns a value which uses one of those type aliases or opaque types, then we want to give a warning. So what if we write down somewhere "check this Def return later for the presence of these type aliases and/or opaque types" for later? Similarly, whenever we define an alias or opaque type in a top-level module declaration, we can check to see if it's exposed; if it's not, we can add it to a list of "check all the exports for the presence of this type" list for later.

Then after type-checking is complete, we can go back and run through the list of checks to see if there were any problems.

Unless I'm missing something, that would save us from having to do another traversal of the AST!

ayazhafiz commented 2 years ago

Yep! Exactly. this would never have to be a hard error, because of course we can continue to type check etc. correctly, since at the end of the day all the compiler cares about is the underlying structural type.

With regards to the implementation - to avoid an extra pass, I think we can do this entirely inline during monomorphization. During canonicalization we would have already collected all the type aliases/opaque names in a scope, so we just need to pass that down to the mono IR pass. Then when we convert Expr::Defs (or their equivalent in the can AST, Expr::Let(Non)Rec) to the IR we can do the "exposed" check you mentioned. This way we don't even have to walk the list of Defs twice, since we already plan to walk them during mono.

rtfeldman commented 2 years ago

to avoid an extra pass, I think we can do this entirely inline during monomorphization

This is interesting - currently we actually do exhaustiveness checking during monormorphization too.

One downside of this is that it means in order to show these errors in the editor, or via roc check, we have to do all the work of monomorphization even though we're not going to generate any code. 😅

However, I didn't consider that there's a performance benefit to doing it there when we are generating code!

🤔 I wonder if someday we should actually implement it in two ways: once in monomorphization, for the fastest possible "build + run" cycle, and then have a separate implementation that's decoupled from monomorphization, for roc check and especially the editor (where we'd want these errors to appear in realtime!) that could run much faster if we aren't needing to generate code. Maybe the two implementations could even share code somehow!

ayazhafiz commented 2 years ago

That's a great point, we totally could!

Thinking about it, we could even do it during type inference/checking - since during constraint generation we'll have known what defs there are, and once we solve a Let, we can check that its type only includes stuff in the current scope!