Aliases and opaques should not escape their declaration scopes

ayazhafiz commented 2 years ago

At the time of writing (ea4e2a706d9330404926562d4ae32471cb79af34), the following reproduces (via repl):

» f =
…     Age : U8
…
…     dev : Age
…     dev = 21
…     dev
…
… f

21 : Age

In this example, Age has escaped the scope it was declared in! We should not allow this to happen, because a user of f is never able to know what Age is without looking inside f itself - worse yet, a user of f cannot use the Age alias, though they can use f.

The same goes for opaque types - once they land (#2214), the following should also not be admitted:

» f =
…     Age := U8
…
…     dev : Age
…     dev = @Age 21
…     dev
…
… f

21 : Age

Some implementation notes:

This will require a pass after type inference, similar to how exhaustiveness checking is done. This is because we only know the resolved types of expressions after this point. However, we may be able to do it inline with exhaustiveness checking - if not, it should still be a pretty fast pass!
At least in the current state of the language, only two things must be checked:
1. The type of a value references only aliases/opaques in the same or higher scope as the value itself
2. Exported values reference only aliases/opaques also exported
  - This is a special case of the first check
  - Here is one (efficient) algorithm for this check: do DFS of each scope, starting from the top-level. At each scope collect the names of the declared aliases/opaques (you only need the names, not the types, since that's already checked to be correct!). Check that the type of each value in the current scope only references the collected names (types will need to be monomorphized here, hence the pass during monomorphization). Then descend into each sub-scope. Keep the names defined in the outer scope as a previous link in a linked list; since most modules shouldn't have very deep scopes, walking the linked list will probably be less expensive than copying the previously-found names each time we enter a scope! This is O(n * max(s, d)) where n is the number of values in a module, s is the max depth in a module, and d is the max depth of a type - in general d will be larger than s, but also s and d will usually both be pretty small!
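A minimal sketch of the scope walk described above, written in Rust. Everything here is hypothetical and simplified for illustration: `Type`, `Scope`, `Visible`, and `check_scope` are not the compiler's actual data structures, and types are assumed to be already resolved. Each scope's declared names link back to the enclosing scope's names rather than being copied.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified resolved type: only the names it mentions matter here.
#[derive(Debug, Clone)]
pub enum Type {
    /// A builtin such as U8 or Str.
    Builtin,
    /// A reference to a named alias or opaque type, e.g. Age.
    Named(&'static str),
    /// A record, list, function, etc.: anything that contains other types.
    Compound(Vec<Type>),
}

/// Hypothetical lexical scope: declared aliases/opaques, defined values, nested scopes.
pub struct Scope {
    pub declared: Vec<&'static str>,
    pub values: Vec<(&'static str, Type)>,
    pub children: Vec<Scope>,
}

/// Names declared in one scope, linked to the enclosing scope's names.
pub struct Visible<'a> {
    names: HashSet<&'static str>,
    parent: Option<&'a Visible<'a>>,
}

impl Visible<'_> {
    /// Walk the linked list outward until the name is found or scopes run out.
    fn contains(&self, name: &str) -> bool {
        let mut current = Some(self);
        while let Some(visible) = current {
            if visible.names.contains(name) {
                return true;
            }
            current = visible.parent;
        }
        false
    }
}

/// Collect every alias/opaque name in `ty` that is not visible from this scope.
fn collect_escaping(ty: &Type, visible: &Visible, out: &mut Vec<&'static str>) {
    match ty {
        Type::Builtin => {}
        Type::Named(name) => {
            if !visible.contains(name) {
                out.push(*name);
            }
        }
        Type::Compound(parts) => {
            for part in parts {
                collect_escaping(part, visible, out);
            }
        }
    }
}

/// Depth-first walk: check this scope's values, then descend into sub-scopes.
/// Each problem is recorded as (value name, escaping alias/opaque name).
pub fn check_scope(
    scope: &Scope,
    parent: Option<&Visible>,
    problems: &mut Vec<(&'static str, &'static str)>,
) {
    let visible = Visible {
        names: scope.declared.iter().copied().collect(),
        parent,
    };
    for (value, ty) in &scope.values {
        let mut escaping = Vec::new();
        collect_escaping(ty, &visible, &mut escaping);
        problems.extend(escaping.into_iter().map(|alias| (*value, alias)));
    }
    for child in &scope.children {
        check_scope(child, Some(&visible), problems);
    }
}
```

Modelling the repl example as a top-level scope that defines `f : Age`, with a child scope that declares `Age` and defines `dev : Age`, this walk would report `f` but not `dev`.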

rtfeldman commented 2 years ago

Interesting! I'm actually not sure if we should change the current behavior. 🤔

Consider each of these cases in the context of a module export.

Should I be able to export f : Age outside the module where an Age opaque type is defined? Of course - that's the whole point of opaque types! And what type should we display when it gets imported in other modules without Age having been imported? I think OtherModule.Age is probably still the most helpful, even if it's not in scope.

Similarly, should I be able to export f : Age outside the module where an Age type alias is defined? Of course! And what type should we display when it gets imported in other modules without Age having been imported? Again I think OtherModule.Age is probably still the most helpful, even if it's not in scope. We could instead deconstruct the alias and display that, but that could lead to some enormous and confusing types being printed when aliasing a huge record or something. (In fact, there are plenty of times when the whole purpose of an alias is to make a giant type easier to read!)

So if that's how they should both work in the context of modules, should they work any differently in the context of expressions?

The only argument I can see for having them work differently for expressions is: what if I define an Age alias in one expression, and then a different Age alias in a different expression within the same module, and then both aliases "escape" their scopes like this, and then it somehow becomes unclear which is being referred to?

I think it's difficult to imagine that scenario would actually come up often enough (ever?) in practice that it's worth considering a significant drawback. 😄

ayazhafiz commented 2 years ago

The use case I would be most worried about is someone wanting to use the alias in a type annotation, either for readability or another use case. For example, in another module Staff, suppose I have the following code:

interface Staff exports [ allStaff ] imports [ Age.{ ageFromNumber } ]

eliza = { age : ageFromNumber 31, email: ..., tenure: ... }
fernando = { age : ageFromNumber ..., email: ..., tenure: ... }
grace = { age : ageFromNumber ..., email: ..., tenure: ... }

allStaff = [ eliza, fernando, grace ]

Now let's say I want to define a Person alias in this module:

Person : { age : ???, email: Str, tenure: U32 }

...

allStaff : List Person
allStaff = ...

I run into some trouble - what do I put for the type of age? When Age is an alias I can use the real structural type (to an extent, up to the limitations/downsides you've described), but if Age is an opaque type, I can't find a type to give age, if it's not exported.

What do you think?

rtfeldman commented 2 years ago

Ahh interesting! So basically a warning instead of an error, because doing this works but can be inconvenient for others later.

With that in mind, I think there are two cases:

1. I've exposed a value from this module. That value's type includes an opaque type or type alias I've defined in this module, but I haven't exposed the type in question, meaning nobody outside this module can annotate the value I've exposed. This should be a warning, and the suggestion should be exposing the alias (or opaque type).
2. An expression evaluates to a value whose type includes a type alias or opaque type that was defined in this expression's scope, meaning other expressions cannot add a type annotation to the value using that alias or opaque type because it's out of scope. This should also be a warning, but this time the suggestion should be moving the alias or opaque type declaration to a higher scope (still within the same module).

Assuming that sounds like the way to go, one thought on how to avoid traversals would be to write down the definition sites for later checking.

For example, if I'm defining a type alias or opaque type in an Expr::Def, and that Def returns a value which uses one of those type aliases or opaque types, then we want to give a warning. So what if we write down somewhere "check this Def's return value for the presence of these type aliases and/or opaque types" for later? Similarly, whenever we define an alias or opaque type in a top-level module declaration, we can check to see if it's exposed; if it's not, we can add it to a "check all the exports for the presence of this type" list for later.

Then after type-checking is complete, we can go back and run through the list of checks to see if there were any problems.

Unless I'm missing something, that would save us from having to do another traversal of the AST!
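As a rough illustration of the "write it down now, check it later" idea, here is a sketch in Rust. All names (`DeferredCheck`, `Warning`, `SolvedTypes`, `run_deferred_checks`) are hypothetical, and the solved types are stood in for by a map from each value to the alias/opaque names its type mentions. This is an assumption made for brevity, not how the compiler represents types.

```rust
use std::collections::HashMap;

/// Hypothetical check recorded while visiting declarations, run after type-checking.
pub enum DeferredCheck {
    /// A Def declared these aliases/opaques locally; its value should not mention them.
    DefReturn { def: &'static str, local_types: Vec<&'static str> },
    /// A top-level alias/opaque that is not exposed; no exposed value should mention it.
    UnexposedType { name: &'static str },
}

#[derive(Debug, PartialEq)]
pub enum Warning {
    /// Suggested fix: move the declaration to a higher scope.
    EscapesDef { def: &'static str, type_name: &'static str },
    /// Suggested fix: expose the alias or opaque type.
    NotExposed { value: &'static str, type_name: &'static str },
}

/// Stand-in for solved types: value name -> alias/opaque names its type mentions.
pub type SolvedTypes = HashMap<&'static str, Vec<&'static str>>;

/// Names mentioned by a value's type, or an empty slice if the value is unknown.
fn mentioned<'a>(solved: &'a SolvedTypes, value: &str) -> &'a [&'static str] {
    solved.get(value).map(Vec::as_slice).unwrap_or(&[])
}

/// Run through the recorded checks once; no AST traversal is needed here.
pub fn run_deferred_checks(
    checks: &[DeferredCheck],
    solved: &SolvedTypes,
    exposed_values: &[&'static str],
) -> Vec<Warning> {
    let mut warnings = Vec::new();
    for check in checks {
        match check {
            DeferredCheck::DefReturn { def, local_types } => {
                for &type_name in local_types {
                    if mentioned(solved, def).contains(&type_name) {
                        warnings.push(Warning::EscapesDef { def: *def, type_name });
                    }
                }
            }
            DeferredCheck::UnexposedType { name } => {
                for &value in exposed_values {
                    if mentioned(solved, value).contains(name) {
                        warnings.push(Warning::NotExposed { value, type_name: *name });
                    }
                }
            }
        }
    }
    warnings
}
```

The two variants correspond to the two warning cases above. The cost is proportional to the number of recorded checks rather than to the size of the AST.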

ayazhafiz commented 2 years ago

Yep! Exactly. This would never have to be a hard error, because of course we can continue to type check etc. correctly, since at the end of the day all the compiler cares about is the underlying structural type.

With regards to the implementation - to avoid an extra pass, I think we can do this entirely inline during monomorphization. During canonicalization we would have already collected all the type aliases/opaque names in a scope, so we just need to pass that down to the mono IR pass. Then when we convert Expr::Defs (or their equivalent in the can AST, Expr::Let(Non)Rec) to the IR we can do the "exposed" check you mentioned. This way we don't even have to walk the list of Defs twice, since we already plan to walk them during mono.

rtfeldman commented 2 years ago

> to avoid an extra pass, I think we can do this entirely inline during monomorphization

This is interesting - currently we actually do exhaustiveness checking during monomorphization too.

One downside of this is that it means in order to show these errors in the editor, or via roc check, we have to do all the work of monomorphization even though we're not going to generate any code. 😅

However, I didn't consider that there's a performance benefit to doing it there when we are generating code!

🤔 I wonder if someday we should actually implement it in two ways: once in monomorphization, for the fastest possible "build + run" cycle, and then have a separate implementation that's decoupled from monomorphization, for roc check and especially the editor (where we'd want these errors to appear in realtime!) that could run much faster if we aren't needing to generate code. Maybe the two implementations could even share code somehow!

ayazhafiz commented 2 years ago

That's a great point, we totally could!

Thinking about it, we could even do it during type inference/checking - by the time constraint generation is done we already know what defs there are, so once we solve a Let, we can check that its type only includes aliases/opaques that are in the current scope!

roc-lang / roc

Aliases and opaques should not escape their declaration scopes #2555