metaborg / nabl

Spoofax' Name Binding Language
Apache License 2.0
7 stars 12 forks source link

Require that a Statix variable is bound #89

Closed MeAmAnUsername closed 2 years ago

MeAmAnUsername commented 2 years ago

Short description A way to require that a variable is bound.

Problem description. I sometimes run into problems where an unbound variable is put in the scope graph. This then takes me a long time to figure out, because the error only shows up at a totally unrelated place where the value is looked up and used. It would help if I could require that the value is bound before it is saved to the scope graph. Same happens with variables for return values: in my code, these should always be bound at the end of a function.

Describe the solution you'd like At the very least, a general way to require that a variable is bound, e.g. a constraint requireBound(theVar) that fails if the variable is not bound. Then it will at least show an error at the location where I expect it to be bound, instead of only at the use sites where it turns out that invariant was broken.

Describe alternatives you've considered Solution I'd like: a workaround is to define a function which uses the variable as input and has two rules with different input shapes. That function will fail with a delayed on foo error if foo is not bound yet. Unfortunately, such a function cannot be defined generically even if/when generics in Statix exist, because it requires two specific constructors for the sort. requireBound could just be syntactic sugar to generate this workaround function automatically.

Even better would be a good root cause analysis that tells me "this is the position where the variable should have been bound, but it wasn't", but that is hard if you assume that constraints are solved in a random order. In my specific case return values of functions should always be bound "after" the function runs and arguments should be bound "before" the function runs, so given

  foo(some, arguments) == res,
  bar(res, more, arguments) == other_res

if res is unbound, then that is because foo failed to bind it, not because bar failed to bind it. Using this in combination with cascading error suppression means that any subsequent errors in bar can be suppressed. That may make it really easy to find the root cause, as the error will always be reported at the exact location where it first failed to be bound. I don't know if this can be generalized to an analysis that is useful for others as well, because it depends on the assumption that it is always the return value that is the root cause of the error.

AZWN commented 2 years ago

It is possible to define requireBoundS : S as follows:

signature
  sorts S constructors
    C1 : S
    C2 : S

rules
  requireBoundS: S

  requireBoundS(C1()).
  requireBoundS(C2()).
  requireBoundS(_).

  usage: /* ... */ -> S

  usage(...) = S :-
    // ...
    requireBoundS(S).

In this snippet, a rule for requireBoundS can only be selected when it is instantiated. If not, it will generate an error for being delayed indefinitely. I think this will also have the desired effect w.r.t. cascading error suppression.

Previously, it was possible to generate requireBoundS using the menu option Format signature-rule AST. For some reason, it broke at a given moment, but it might be possible to revive without too much effort.

Does that provide you with a direction to solve your problem?

MeAmAnUsername commented 2 years ago

I've already mentioned this workaround in the alternative solutions. It does work (with some special cases for indestructible types, e.g. strings and scopes), but is quite a lot of boilerplate and needs to be repeated for every sort, so I don't consider that worth it at the moment.

Ideally I'd like a solution where I don't need to add these checks explicitly. It adds multiple lines of boilerplate to every function, I can forget to add them, and it adds debug code to every function. If I only add them when debugging, I still need to remember to do this when debugging, in which case I might as well just remember to check the scope graph to see if there are unbound variables.

AZWN commented 2 years ago

Yes, but you didn't mention it ought to be possible to generate these rules.

My concern is that there is a lot of personal preference and ad-hocness in this proposal. Although it is a good convention to have return values/scope graph declaration be ground, I can think of many good examples where that rule should not be followed. The whole point of using type variables is abstracting over computation order and input/output thinking.

A more clean solution would be to add in/out modes to arguments, but that has far-reaching consequences for the evaluation semantics as well, so I don't regard that as a viable option now.

What I perhaps can do is making the rules that the aforementioned menu option generates implicitly available.

MeAmAnUsername commented 2 years ago

My concern is that there is a lot of personal preference and ad-hocness in this proposal.

Agreed

The whole point of using type variables is abstracting over computation order and input/output thinking.

I don't know how well that works, given that variables can be delayed and edges can be critical. I may not have to think about actual computation order, but in order to write and debug Statix specifications I still have to consider the dependencies between various types of information, which is not too different from computation order.

A more clean solution would be to add in/out modes to arguments, but that has far-reaching consequences for the evaluation semantics as well, so I don't regard that as a viable option now.

Sounds interesting and like it would solve this issue, but I can see how that is too much work.

What I perhaps can do is making the rules that the aforementioned menu option generates implicitly available.

Maybe, but that might be a bit confusing too. And it comes with a performance hit even if you don't use it (at the very least, build times will increase). Perhaps just fixing the menu option would be better.

Either way, I resolved the bug that prompted this issue (turns out that in func test[F](fruit: F) -> F = test[F](fruit), the type argument F really does not have a type besides "the generic parameter"), so this issue is no longer immediately relevant for me, and I'm hoping that I finish my thesis before I run into a similar issue.