purpleidea / mgmt

Next generation distributed, event-driven, parallel config management!
https://purpleidea.com/tags/mgmtconfig/
GNU General Public License v3.0
3.47k stars 308 forks source link

lang: ast: ExprBind is now monomorphic #723

Closed gelisam closed 7 months ago

gelisam commented 7 months ago

An attempt at fixing #722

gelisam commented 7 months ago

@purpleidea all right, let's see how many tests this break!

gelisam commented 7 months ago

Test TestAstFunc2 0 fails with could not unify types: call doesn't contain an expr pointer yet. I think that means I'm not filling in ExprCall.expr because I'm not traversing the original definitions, only the copies. But now there aren't any copies anymore, so I should traverse and typecheck the original definitions.

gelisam commented 7 months ago

Fixed, only 3 failures left! Including, hilariously, the clear-env-on-var.txtar test I just added in the previous MR. That one makes sense: the $wat = $x line is now scope-checked before $f's body, so the test fails with a different error message.

The other two tests, map-iterator1.txtar and polymorphic-lambda.txtar, are specifically making sure that $add = func($x) {$x + $x} can be instantiated at two different types, so hopefully you agree that with the new assumption that StmtBinds are no longer polymorphic, it is now the tests which are wrong.

purpleidea commented 7 months ago

Fixed, only 3 failures left!

Amazing!

Including, hilariously, the clear-env-on-var.txtar test I just added in the previous MR.

haha! The code $DEITY's love us!

I'll look over those tests now, and I'll also check we have one for StmtFunc polymorphism =D

Are there any negative side-effects or important features we're leaving on the floor with this?

purpleidea commented 7 months ago

I added all of this ontop and it's in bug/722 branch testing! (I fixed those tests, let's see if it works!)

purpleidea commented 7 months ago

Here: https://github.com/purpleidea/mgmt/actions/runs/7096929277

purpleidea commented 7 months ago

The AST tests pass, but unfortunately, we still have a double. My new test fails:

interpret_test.go:715: test #142: funcs: Erroring func `call`: panic in Stream of func `call`: should not get called twice

Basically we still have a copy.

gelisam commented 7 months ago

Hmm, this is completely wrong.

StmtBind.SetScope(), StmtBind.Unify(), and StmtBind.Graph() were all no-ops, because the work was done in the copies. Now that I have removed the ExprPoly wrapper, there aren't any copies anymore, so the work isn't done anywhere! I have changed StmtBind.SetScope() to no longer be a no-op, but I forgot to change StmtBind.Unify(), and StmtBind.Graph() as well. It is thus quite surprising that so few tests are failing!

~In fact, when I change those two, a lot more tests fail!~ (we're down to two failures again, not sure what changed)

purpleidea commented 7 months ago

It is thus quite surprising that so few tests are failing!

If any of these situations makes it obvious what new tests to add, of course, please go crazy! I must admit that the test suite was not written methodically, so there is likely some permutation of features which is not tested.

gelisam commented 7 months ago

Ah, of course! StmtBind.Unify() might not call Unify() on the expression it binds, but ExprVar.Unify() does. So the test could only potentially fail if it contains $x = ... but no uses of $x. And that case doesn't fail either because the $x = ... doesn't contribute any constraints, so it can't cause type checking to fail.

Similarly, StmtBind.Graph() doesn't call Graph() on the expression it binds, but ExprVar.Graph() does. And since each call to Graph() generates a new Func, this explains why there are two Funcs and thus why test 142 fails.

gelisam commented 7 months ago

All right, here's the plan. We want to scope-check, type-check, and graph each the expression bound by StmtBind exactly once, regardless of how many use sites the variable has. So StmtBind.SetScope(), StmtBind.Unify(), and StmtBind.Graph() must all make a recursive call. I've done that already. What is missing is that ExprVar.SetScope() etc. should only make a recursive call if their target was a PolyExpr.

Currently there is no way to tell, because if ExprVar.SetScope() points to a PolyExpr, it replaces its target with a copy, and then ExprVar.Unify() and ExprVar.Graph() always make a recursive call. So we need ExprVar to track whether it has made a copy, perhaps by making it point to a copy of the PolyExpr instead of a copy of the Expr wrapped by the PolyExpr.

Or maybe... maybe ExprVar is never making a copy now? Hmm... what about classes? We said that we do want classes to be polymorphic, but that could mean two different things. Do we want to support this:

class c($id) {
  $int = $id(42)
  $str = $id("foo")
}

or that:

class c($x) {
  test printf("%v", $x + $x)
}
include c(42)
include c("foo")

or both? or neither?

gelisam commented 7 months ago

Another question: in the following code,

$y = $x + $x
$x = 42

are the StmtBind.Unify() and StmtBind.Graph() calls done in the order $y, then $x, or in the order $x, then $y?

purpleidea commented 7 months ago

This is a good question:

class c($id) {
  $int = $id(42)
  $str = $id("foo")
}

Since it violates our "lambdas in variables can't be polymorphic" rule from before, my answer would be: "it doesn't bother me if it's not allowed, and if it's easily possible, great, but if we need for it to be a compile error, that's okay.

(Side note, of course the class in this situation doesn't do anything since $int and $str aren't used, so this alone might not be a good test case without a res in there somewhere, as the compiler might nuke them since they're not used.)

This second example is definitely allowed, and most likely important to keep in.

class c($x) {
  test printf("%v", $x + $x)
}
include c(42)
include c("foo")
purpleidea commented 7 months ago
$y = $x + $x
$x = 42

The above is valid code, and you might recall the Ordering method of our AST. This basically figures out the DAG of what comes first, so that it always works even if you get the "logical ordering" wrong.

Additionally, there is a constant: RequireTopologicalOrdering which is currently false. If true, in the future, it would check if it was a valid toposort ordering or not, but I've left it unimplemented for now since I think it's more interesting and useful to not require code be in a logical order.

are the StmtBind.Unify() and StmtBind.Graph() calls done in the order $y, then $x, or in the order $x, then $y?

I might not entirely be understanding this question. If my above explanation doesn't suffice, are you just asking for the AST recurse/descent order?

gelisam commented 7 months ago

lambdas in variables can't be polymorphic

That's not what we said at all!

There are four places where variables are bound, and two places where non-variable names are bound:

  1. $x = ...
  2. func f($x) {...}
  3. func($x) {...}
  4. class c($x) {...}
  5. func x(...) {...}
  6. class x(...) {...}

On master, 1, 4, 5 and 6 are polymorphic (and thus 2 and 3 are monomorphic). We've agreed that 1 should be monomorphic, and that 5 and 6 should be polymorphic, but we forgot to discuss 2, 3, and 4.

You are now talking about lambdas in variables, but remember, we can't have a different rule depending on what type of value ends up in the variable. Maybe you are saying that all variables should be monomorphic? That is, 1, 2, 3, and 4 should be monomorphic, while 4 and 5 should be polymorphic?

gelisam commented 7 months ago

Let's write 5 tests covering those 5 cases

gelisam commented 7 months ago

If my above explanation doesn't suffice, are you just asking for the AST recurse/descent order?

I do remember that out-of-order definitions are allowed and that there is a function which calculates the topological order, but I don't remember what happens to that topological order. Does StmtProg use it to perform there recursive calls in that order rather than the order in which the definitions appear in the source code? Does it do so for StmtProg.SetScope(), StmtProg.Unify() and StmtProg.Graph(), or just for some of them?

purpleidea commented 7 months ago

Does StmtProg use it to perform there recursive calls in that order rather than the order in which the definitions appear in the source code?

Yes. Ordering() is only used once outside of the recursive ordering calls themselves-- it is used in StmtProg:SetScope. The other methods AFAICT don't need ordering. But if they do, we can change this.

purpleidea commented 7 months ago

You are now talking about lambdas in variables, but remember, we can't have a different rule depending on what type of value ends up in the variable. Maybe you are saying that all variables should be monomorphic? That is, 1, 2, 3, and 4 should be monomorphic, while 4 and 5 should be polymorphic?

To make sure I understand 100%, maybe you should write the tests or examples and I'll turn them into tests?

gelisam commented 7 months ago

I have just pushed a bunch of tests, looking good?

purpleidea commented 7 months ago

I have just pushed a bunch of tests, looking good?

All these tests look amazing!

I'd recommend for any of the ones that check a field, to use TestAstFunc3 instead of TestAstFunc2. All this means is you just move it one folder over. In addition, the test output will also check the resource fields (anotherstr => "what") stuff, and it will run the resource engine, so we get a slight bit more testing. IOW, any new tests that are still static, should probably go in TestAstFunc3.

gelisam commented 7 months ago

I don't understand what you mean by tests which are "still static". Do you mean tests which only exercise the type checker?

The rest of your comment makes me think that the tests which need to go to TestAstFunc3 are the ones which use anotherstr, but:

  1. I don't know why those tests use anotherstr => $x instead of test $x {}, I was just copying what your test-one-instance.txtar did.
  2. Your test-one-instance.txtar is in TestAstFunc2, not TestAstFunc3

So I am really confused as to what you want.

purpleidea commented 7 months ago

I don't understand what you mean by tests which are "still static". Do you mean tests which only exercise the type checker?

Sorry, what I meant was the TestAstFunc* tests can only test a single resource graph. So basically anything which would produce more than one resource graph would be dynamic. So if datetime.now() was a value in a resource, then we'd get a continuous stream.

I don't know why those tests use anotherstr => $x instead of test $x {}, I was just copying what your test-one-instance.txtar did. Your test-one-instance.txtar is in TestAstFunc2, not TestAstFunc3

Yeah it's fine for now, we'll fix it up later if we need to.

gelisam commented 7 months ago

I have now implemented every step of my plan, stripping away the assumption that ExprVars always point to a PolyExpr and replacing it with a new assumption that ExprVars never point to a PolyExpr. Alas, it is not enough.

Before, ExprVars could look up their target in obj.scope.Variables[obj.Name]. That target was a PolyExpr copy, and it was then the responsibility of ExprVar to call Graph() on this copy in order to obtain a Func which is unique to this use site.

Now, it is the responsibility of StmtBind to call Graph() on the one and only copy. The problem is: how does the ExprVar find the existing Func? It can only look up the Expr, not the Func!

Before the mcl-functions branch was merged, the answer to that question was easy, because there was only ever one Func for each Expr. Now that we have functions, this is no longer true: the Exprs in the body of a function correspond to multiple Funcs. But it is still true for StmtBind.

All right, I think I have a plan. StmtBind will wrap it Expr in a new SingletonExpr. The first time SingletonExpr.Graph() is called, it will store the resulting Func. Subsequent calls will return this Func.

purpleidea commented 7 months ago

All right, I think I have a plan.

\o/

PS: feel free to add private struct fields if needed like .expr as I previously did if that would help =D

gelisam commented 7 months ago

a bunch of examples now fail with could not unify types: 1 unconsumed generators, what does this mean?

purpleidea commented 7 months ago

a bunch of examples now fail with could not unify types: 1 unconsumed generators, what does this mean?

Type unification is failing... Why? Well a generator is one of the special invariants which can generate new invariants based on runtime information. For example, once an earlier stage of unification determines something is a string, that generate can learn that, and then generate a simple set of invariants...

So a generator is not being used up. Are we adding something to the set of things to unify which shouldn't get unified? Which examples should I look at? I can see if anything obvious sticks out.

gelisam commented 7 months ago

The simplest failing test is probably example.txtar

purpleidea commented 7 months ago

The simplest failing test is probably example.txtar

That code looks like:

test fmt.printf("answer: %d", 42) {}

So yeah, indeed that's simple =D

Something is amiss. Reminder, your debug function is still in the code base.

func DebugSolverState(solved map[interfaces.Expr]*types.Type, equalities []interfaces.Invariant) string

It should run, and help you know what's wrong with the AST perhaps? We can look at this together too if you want.

gelisam commented 7 months ago

DebugSolverState is still in the code base

I know, I already used it to get a starting state of

        0xc00064e798 any :: ?1
        0xc00064e7b0 any :: ?2
        0xc00051f4a0 func() { <built-in:printf> } :: ?3
        0xc00064e750 topLevel(func() { <built-in:printf> }) :: ?4
        0xc0006974c0 str("answer: %d") :: ?5
        0xc0006919d0 int(42) :: ?6
        0xc0007af420 call:fmt.printf(str("answer: %d"), int(42)) :: ?7
        ?5 = str
        ?6 = int
        ?1 = str
        ?2 = str
        ?1 = ?2
        ?4 = ?3
        ?4 = func(arg0 ?5, arg1 ?6) ?7
        gen(0xbfff60)

and a final state of

        int(42) :: int
        any :: str
        any :: str
        str("answer: %d") :: str
        0xc00051f4a0 func() { <built-in:printf> } :: ?1
        0xc00064e750 topLevel(func() { <built-in:printf> }) :: ?2
        0xc0007af420 call:fmt.printf(str("answer: %d"), int(42)) :: ?3
        ?2 = ?1
        ?2 = func(arg0 str, arg1 int) ?3
        gen(0xbfff60)
        ?3 = ?3

in both cases, we have call:fmt.printf(str("answer: %d"), int(42)) :: ?. why isn't printf telling the type checker that it returns a str?

purpleidea commented 7 months ago
    ?3 = ?3

This looks suspicious...

    0xc00064e750 topLevel(func() { <built-in:printf> }) :: ?2

This too...

purpleidea commented 7 months ago

in both cases, we have call:fmt.printf(str("answer: %d"), int(42)) :: ?. why isn't printf telling the type checker that it returns a str?

There could be an addition to printf Unify() that we're missing... I will look into it tonight, but I suspect something else.

purpleidea commented 7 months ago

It definitely knows that it returns a string:

https://github.com/purpleidea/mgmt/blob/8fcaa4abf22d90e913a69ea4a1979524b4de8444/lang/funcs/core/fmt/printf_func.go#L114

If you're convinced it's that, maybe the call code is missing some solver bit?

purpleidea commented 7 months ago

But as an aside, add an explicit type definition to your test temporarily to unblock yourself, and see if that makes everything work?

gelisam commented 7 months ago

0xc00064e750 topLevel(func() { <built-in:printf> }) :: ?2 is what I expect to see. I have added a new ExprTopLevel wrapper which captures the scope at the top-level. it is now wrapping all the built-ins, functions, and StmtBinds right-hand sides (RHS).

One benefit is that we no longer need to remember to call target.Graph(emptyEnv) instead of target.Graph(env), this is easier because all the recursive calls now look the same. Another reason is that RHSs used to be wrapped in ExprPoly, which was capturing the top-level scope and indicating to ExprVar that it should copy its target. We no longer want to copy the target, but we still need to capture the top-level scope, so I had to move the responsibility of capturing that scope out of ExprPoly and into this new ExprTopLevel.

gelisam commented 7 months ago

I have added an equality constraint between the ExprTopLevel and the printf it contains, that should be sufficient to propagate printf's type to the use site. Unless the type checker is matching on the Expr and doing something special if it's a built-in, and it can't do that anymore because the Expr is now an ExprTopLevel wrapping a built-in?

purpleidea commented 7 months ago

Unless the type checker is matching on the Expr and doing something special if it's a built-in, and it can't do that anymore because the Expr is now an ExprTopLevel wrapping a built-in?

There's no special casing for any different type of Expr... If the wrapping is an indirection, you can add an equality saying that Expr1 == Expr2, and that would fix it.

I'll dig down into the branch more, but in the meantime, I'd recommend just trying to get it work without unification. Once we know it works, then we are more confident we got the right AST shapes and so on, and then the type unification can fall into place =D

gelisam commented 7 months ago

in the meantime, I'd recommend just trying to get it work without unification

there are no variables in example.txtar. where can I put the type annotation? something like this?

import "fmt"
test (fmt.printf("answer: %d", 42) :: str) {}

Even if that was syntactically valid I feel like that's probably not going to solve the problem, because test already knows that it needs an str and printf already knows that it returns an str, so the type annotation is not adding any new information.

gelisam commented 7 months ago

If the wrapping is an indirection, you can add an equality saying that Expr1 == Expr2, and that would fix it.

I did that already: https://github.com/purpleidea/mgmt/pull/723/commits/cd334bbfe5922f4f527ecc3259a04798b673c057

gelisam commented 7 months ago

I think I know what is going on. printf expects to see a CallFuncArgsValue invariant where the Expr in the function position is exactly the printf node, and the TopLevel wrapper breaks this because the Expr in the function is now a TopLevel wrapping the printf node.

purpleidea commented 7 months ago

I think I know what is going on. printf expects to see a CallFuncArgsValue invariant where the Expr in the function position is exactly the printf node, and the TopLevel wrapper breaks this because the Expr in the function is now a TopLevel wrapping the printf node.

Oh!

I highly recommend we leave the printf unification code alone, HOWEVER, if it's appropriate, we can change the expr that CallFuncArgsValue lists. Maybe that's what you want instead? Since this top-level expr, is actually "the thing" now? -- That in and of itself raises an eyebrow... Does it have to be that way?

gelisam commented 7 months ago

I highly recommend we leave the printf unification code alone

Good news, I can't!

I tried modifying the printf unification code to pattern-match on the Expr in the function position to see if it is an ExprTopLevel, and if so to check if the wrapped Expr is the printf expr. But that would create a circular dependency between the module which defines printf and the module which defines ExprTopLevel.

gelisam commented 7 months ago

Since this top-level expr, is actually "the thing" now? -- That in and of itself raises an eyebrow... Does it have to be that way?

I don't understand what you are asking. Which thing?

gelisam commented 7 months ago

it's appropriate, we can change the expr that CallFuncArgsValue lists

it worked, thanks for the suggestion!!

purpleidea commented 7 months ago

it worked, thanks for the suggestion!!

\o/

You did the hard part figuring it out! I just know how some of the bits are connected.

It reminds me about something we've discussed previously, which seems true over and over again:

Our programming brains work in such different ways, it's a real pleasure working with you because either I can't solve something, or you can't, but together we can do anything! o/ \o

purpleidea commented 7 months ago

FYI: I added a few new commits to git master (not added in this branch) which among other things, add the start of an mcl and txtar formatting check. TL;DR: indents with spaces instead of tabs will fail the test.

The good news is, that your EDITOR might also see the improved .editorconfig file and automatically use tabs for you now.

gelisam commented 7 months ago

all right, rebasing on master then

gelisam commented 7 months ago

only 9 failing tests left!

purpleidea commented 7 months ago

\o/

gelisam commented 7 months ago

3 of them fail because they expect the type error to be 2 unconsumed generators but the actual error is 1 unconsumed generators.

  1. lookup3.txtar
  2. printfempty.txtar
  3. printfunificationerr0.txtar

Looking at the code, I don't understand why these tests expect two errors. Does it seem acceptable to change the expected error to be 1 unconsumed generators?