purpleidea / mgmt

Next generation distributed, event-driven, parallel config management!
https://purpleidea.com/tags/mgmtconfig/
GNU General Public License v3.0
3.67k stars 315 forks source link

add functions to the language #704

Closed gelisam closed 1 year ago

gelisam commented 1 year ago

This PR is not yet complete. It exists to make it easy to look at and comment on the code for the mcl-functions feature branch, which completely reworks the way in which functions work in MCL.

The problem

Today in master, builtins and lambdas use very different compilation strategies. A builtin is compiled to a single Func which receives values from upstream nodes and emits values downstream. A lambda, by contrast, is compiled away, as the lambda's body is inlined at every call site. Once this inlining is performed, only builtins remain, so the resulting function graph is the graph of their Funcs.

This does not work at all for MapFunc, a builtin which expects to receive a FuncValue as one of its inputs, because the above compilation strategy does not produce FuncValues, neither for builtins nor for lambdas. It would in fact be incorrect to construct a FuncValue from a builtin, because builtins can have an internal state and emit multiple values over time (e.g. System("seq 3") emits "1", then "2", then "3"), but a FuncValue can only transform input values into a single output value, so it cannot represent that complexity. And of course, lambdas can contain builtins (e.g. func(x) { System(x) }), so they cannot be represented by a FuncValue either.

The solution

The function graph must be able to change over time, because if func(x) { Map(System, xs) } first receives a list of length 1 and then a list of length 2, then at first there must be one SystemFunc in the function graph, then later there must be 2 SystemFuncs.

In order to do this, a FuncValue must be able to do more than transform input values into a single output value. Instead a FuncValue must describe in which way the function graph needs to change in order to accommodate a new instance of the function which the FuncValue represents. This branch's FuncValue does this by receiving input Funcs, creating and adding new nodes to the function graph, and then returning the one node among those new creations which shall be treated as the output Func, that is, the one whose output values are the result of applying the function represented by the FuncValue to the inputs received from the input nodes.

Another big change is that the function engine now works on a graph of Funcs instead of a graph of Exprs. This allows FuncValues to add Funcs to the function graph, without having to wrap them in Expr wrappers which don't correspond to any MCL code the user wrote.

Overview of the changes so far

  1. FuncValue now behaves as described above.
  2. A new type called SimpleFn behaves in the same way the old FuncValue did, because it is still useful in many places to represent a transformation from input values to a single output value.
  3. Txn, an abstract API for modifying a function graph transactionally.
  4. graphTxn, a thread-safe implementation of that API for modifying the function graph while the function engine is still running that graph.
  5. ReversibleTxn, a wrapper around Txn which can later remove the nodes and edges which were added. This is useful because if xs becomes a list of length 1 again, we must remove the now-disconnected SystemFunc node we created for the length 2 case.
  6. GraphTxn, a single-threaded implementation of that API which constructs a graph. Used to translate between the parts of the code which add nodes using this API and the parts which return a new Graph value.
  7. Reversible.AddGraph(), to translate in the opposite direction.
  8. Reimplemented CallFunc.Stream() under the new paradigm
  9. Combine Expr.Graph() and Expr.Func() into a single method, Expr.MergedGraph(). Since the function graph no longer receives a graph of Exprs, it cannot call Func() on those Exprs in order to obtain the Funcs it needs, instead the graph returned by Expr.Graph() must already be a graph of Funcs.
  10. MergedGraph() returns the Func in the graph whose output values correspond to the values emitted by the Expr. The caller can no longer assume that this node is the Expr itself, as the graph no longer contains Exprs.
  11. MergedGraph() receives an environment mapping variable names to Funcs. ExprVars can no longer lookup the Expr which defines the variable in the Scope, because they now need to connect to a Func, not an Expr.
  12. Implemented ExprCall.MergedGraph()
  13. Implemented ExprFunc.MergedGraph()
  14. Implemented ExprVar.MergedGraph()
  15. Removed FunctionFunc, as it is no longer used anywhere.
  16. Utility functions for converting between SimpleFn, FuncValue, and Funcs.
  17. Reimplement FuncValue.Call() under the new paradigm
  18. Reimplement MapFunc.Stream() under the new paradigm
  19. fill in MergedGraph()'s initial environment so it contains all the in-scope variables.
  20. Detect when a CallFunc or MapFunc's subgraph no longer emits any new values and the subgraph can no longer change (because the upstream node is no longer sending new FuncValues), so that the CallFunc can also indicate that it won't be sending new values downstream.
  21. Change the function engine to work with Funcs, not Exprs.
  22. Change all the MergedGraph() implementation so they add Funcs to the graph, not Exprs.
  23. Finish GraphTxn
  24. Plumb the table through Output()
  25. Add fancy testing infrastucture to validate that the new Lock/Unlock works in the function engine @purpleidea
  26. Copy top-level definitions when then are used by an ExprVar or an ExprCall.
  27. When Func. Stream() closes it's output channel, close the input channel of the downstream nodes, so that the tests finish instead of hanging. @purpleidea
  28. Use reference counting, so that transaction 0 can add vertex A, then transaction 1 also adds vertex A, then whether it is transaction 0 or 1 which reverses next, vertex A is not yet deleted, then when the other transaction reverses, vertex A is deleted. @purpleidea (Needs tests though!)
  29. @purpleidea to get rid of the interpret_test.go:1202: test #62: funcs: process end err: context canceled... messages. Those are red herrings since we're interrupting process and it's normal.
  30. Use a transaction to add the initial graph, and then reverse it at the end of the test, and assert that the graph is now empty. @purpleidea interpret_test.go:1546
  31. Allow a Func to have two edges pointing to the same Func? Or something equivalent? @purpleidea Note: the two edges are always going to be added (and later reversed) in the same transaction.
  32. improve the type unification solver so that the identity function applied to a string is accepted @purpleidea
  33. @gelisam make classes work again
  34. @purpleidea A panic inside of dage/process/etc should not HANG the engine. We should still unblock and exit cleanly. (I think this is fixed, but we don't have tests for it.)

Remaining work

  1. @gelisam To add some more tests for map and other things he thinks we should be catching.
  2. @purpleidea to make a txtar style test harness to model map function changing its inputs over time, or other things as well.
  3. Add input caching to prevent unnecessary churn to map. Decide on if we should do this or not. Probably yes.
  4. See if we can wrap the code in map to generalize it for writing new functions like reduce, filter, and any other lambda-taking funcs.
  5. @purpleidea should rewrite a lot of lib/main.go and and update the GAPI api as well.
  6. @gelisam How hard is it to pass a function to a resource (we just pass int/string/list/etc now) and what are the implications of this if any? Eg: Do they need to be entirely pure? Can they have side effects? Etc...
  7. @gelisam See how a compiler optimization would work so that non-lambda functions (normal ones) appear as simple static additions to the graph so that they don't require to Txn Commit/Reverse a bunch of stuff and unnecessarily hurt graph performance.

Given the number of changes, I would be surprised if all of that worked the first time, so in addition to the above, I expect a long debugging session to make all the tests pass.

purpleidea commented 1 year ago

Sweeet =D

purpleidea commented 1 year ago

FYI: I updated merged graph branch. You may wish to cherry-pick stuff into here. I can fix the merge conflict if necessary.

https://github.com/purpleidea/mgmt/tree/feat/merged-graph

Of note: I added caching of the function pointer returned from Func(). This way, if it's called more than once, the same func will be returned. Is this a problem for us? Of course if we Copy() the Func Expr, then the caching doesn't get copied...

If this approach is satisfactory, then this allows us to look at the functions from the function graph, and determine which ones maps to which by calling Func again everywhere.

This isn't as effective performance wise as adding a new map to our returned function signature, but at least for now, I think this is easier to plumb in, and once we're sure we don't need to change the signature any more, we can revisit in the future.

purpleidea commented 1 year ago

FYI: debugging test 42:

I set:

runGraphviz || true

to enable the function graph output.

Then I ran:

time go test github.com/purpleidea/mgmt/lang/ -run TestAstFunc2/test_#42

Which gets us the error:

    interpret_test.go:1605: test #42: stream errored: func `printf@0xc0006383f0` stopped before it was loaded

Which also produced /tmp/graphviz.dot.png

graphviz dot

Not the two constants are not attached to the function. So it never gets those inputs. Which is why we see the error.

The correct output looks roughly like this (from git master) graphviz dot

gelisam commented 1 year ago

The graph in which the two const nodes are not attached to anything looks correct to me. What is supposed to happen is that the printf node sends a FuncValue to the call node, which then creates a sub-graph which looks like the V-shaped graph from master, plus a ChannelBasedSink which sends the output of printf back to the call node via a channel.

Is it possible that the test harness stop with the error "stream errored: func @.***` stopped before it was loaded" before the call node is given a chance to generate it's sub-graph?

Another thing I notice about this example is that there are two very different printf nodes in this example, the one with in-degree zero which produces a FuncValue, and the one with in-degree 2 in the V-shaped sub-graph. So another possible explanation for the bug might be that we're using the wrong one in one of the two graphs.

On Fri, May 12, 2023, 18:25 James @.***> wrote:

FYI: debugging test 42:

I set:

runGraphviz || true

to enable the function graph output.

Then I ran:

time go test github.com/purpleidea/mgmt/lang/ -run TestAstFunc2/test_#42

Which gets us the error:

interpret_test.go:1605: test #42: stream errored: func ***@***.***` stopped before it was loaded

Which also produced /tmp/graphviz.dot.png

[image: graphviz dot] https://user-images.githubusercontent.com/135091/238090580-ece5d436-53b8-4b70-b55d-41c92ec2b0de.png

Not the two constants are not attached to the function. So it never gets those inputs. Which is why we see the error.

The correct output looks roughly like this (from git master) [image: graphviz dot] https://user-images.githubusercontent.com/135091/238090954-d45700fb-3838-4adf-a623-6864544392e5.png

— Reply to this email directly, view it on GitHub https://github.com/purpleidea/mgmt/pull/704#issuecomment-1546375099, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAL62D6LFKMFDZA2VTT2CLXF22GBANCNFSM6AAAAAAVUSX4KA . You are receiving this because you authored the thread.Message ID: @.***>

purpleidea commented 1 year ago

Sorry, I forgot to include the mcl code:

test fmt.printf("sq2: %d", 42)

Another thing I notice about this example is that there are two very different printf nodes in this example

Good point, I should have covered that sorry. Unfortunately, I don't think there are. The reason the nodes look slightly different is that in the earlier version, we're running a .String() to get the label on the Expr (which is what is in the function graph) where as in the newer version, we've had to simplify the label tag because we don't have as much context as the actual node is based on the function implementation (no more expr!). But the actual Stream() implementations in printf doing the work are identical.

Is it possible that the test harness stop with the error "stream errored: func @.***` stopped before it was loaded" before the call node is given a chance to generate it's sub-graph?

I don't believe so.

The reason we get this error is that the singular printf node which is running in the function engine receives no input (IOW, no incoming edges) so the engine closes the entire input stream. The printf function, seeing it is not getting any more inputs, then shuts down the output stream. Finally, the function engine notices (thanks engine!) that an output stream has shutdown without ever once producing a value, and assumes (rightly so) that there is a bug. All nodes are expected to either produce at least one value or error.

The current printf implementation indeed is supposed to receive one or more values (format string and optional args) in the function engine. If this is now supposed to go through a channelsink or channelsource, then the fundamental API of every static arg list function (basically everything except map/reduce/etc) would need to change. I don't think that's what we were trying to accomplish. I do believe we probably just forgot two edges in this case.

gelisam commented 1 year ago

the actual Stream() implementations in printf doing the work are identical.

Then that's the bug. The printf node with in-degree zero (let's call it printf-zero) should not be doing the same work as the printf node which in-degree two (let's call it printf-two). printf-zero should produce a FuncValue which creates a sub-graph containing printf-two, while printf-two should produce a string based on its arguments.

the function engine notices (thanks engine!) that an output stream has shutdown without ever once producing a value, and assumes (rightly so) that there is a bug.

That makes sense. The bug is that we give zero incoming edges to printf-two, which behaves like you explained when it has no incoming edges are missing. We should instead give zero incoming edges to printf-zero, which does produce a value when it has no incoming edges.

If this is now supposed to go through a channelsink or channelsource, then the fundamental API of every static arg list function (basically everything except map/reduce/etc) would need to change. I don't think that's what we were trying to accomplish.

We do want to change the semantics of every single MCL function, that's the premise of this PR: previously, every single MCL function was converting values to values, that doesn't work, so we now want every single MCL function to be generating sub-graphs instead.

Luckily, that doesn't mean that we have to change hundreds of function implementations throughout the codebase. We still need those Func implementations, we just need to make sure that the node which feeds the call node is not one of those Func implementations, and is instead a wrapper which returns a FuncValue which creates a sub-graph which contains one of those Func implementations.

gelisam commented 1 year ago

I have pushed an attempt at fixing the problem, but it doesn't work. The behaviour for our simplified test 42 is now non-deterministic :(

Sometimes it fails with stream errored: func 'subgraphOutput' stopped before it was loaded, as before but with subgraphOutput instead of printf.

Sometimes it fails with that message and then hangs.

Sometimes if fails with field arg1 XXX does not exist. That one makes sense, it's the same bug we encountered and fixed together during our last pairing session: I was not careful to use the correct edge names, because I didn't realize that the function graph used named parameters rather than positional parameters. In both places where I use the name arg%d XXX, there is a Type value nearby, that type should be a function type, and we should be able to extract the correct argument names from it.

purpleidea commented 1 year ago

The behaviour for our simplified test 42 is now non-deterministic :(

It's possible there is a race condition in either my engine or the subgraph channel stuff.

Function graph generation should be deterministic though.

Adding this line in TestAstFunc2:

runGraphviz || true

Then running the test

Will produce /tmp/graphviz.dot.png

if graphviz is installed, and can at least confirm the correct function graph shapes.

If we're satisfied with that shape, we can look into seeing where in the engine and/or call funcs the races are.

purpleidea commented 1 year ago

Btw:

I didn't realize that the function graph used named parameters rather than positional parameters

Indeed, since params come in as edges, there's no obvious ordering-- the only solution is to have them be named.

In both places where I use the name arg%d XXX, there is a Type value nearby

What do you mean by nearby?

gelisam commented 1 year ago

What do you mean by nearby?

The T field of the enclosing FuncValue definition.

gelisam commented 1 year ago

Adding this line in TestAstFunc2

Where? TestAstFunc2 is a folder, not a file.

gelisam commented 1 year ago

ah, it's also a function in ./lang/interpret_test.go

gelisam commented 1 year ago

If we're satisfied with that shape, we can look into seeing where in the engine and/or call funcs the races are.

field-arg0-does-not-exist

(I see the same graph regardless of which of the non-deterministic failures occurs)

That looks correct. The const going into call is the ConstFunc which returns a FuncValue and closes the stream, and the other two const nodes are the ConstFunc are the ConstFuncs for "sq2: %d" and 42. Is there a way to see the function graph after the sub-graph is created?

gelisam commented 1 year ago

Oh! I bet the arg%d XXX is the root cause. When printf-two receives two mis-labelled inputs, it errors out, closing its output stream before emitting any value. The non-determinism is merely about which of the two panics is displayed first. Not sure about the hang though...

purpleidea commented 1 year ago

Oh! I bet the arg%d XXX is the root cause. When printf-two receives two mis-labelled inputs...

If you instead run:

test fmt.printf("onlyonearg")

Does it still display that non-determinism?

purpleidea commented 1 year ago

I decided I wanted to double check I hadn't made a mistake somewhere in the deps for all of this (engine, txn, etc) so I wrote a test. Indeed I found a small bug (woops) and while I don't think it necessarily caused this issue, I figured I'd mention it in case.

The bug and associated test is in the txn code. I rebased it and feat/merge-stuff contains the fix. I also rebased a copy of this branch (which is rebased on top of that merge-stuff branch) to save you doing the work if you'd like. That branch is feat/sam-squashed-rebased.

purpleidea commented 1 year ago

I tried this branch of yours (with the txn fix, and just one arg, the format string, for the printf) and I get:

    interpret_test.go:1234: test #42: funcs: Running func `subgraphOutput`
    interpret_test.go:1234: test #42: funcs: not yet loaded: call
    interpret_test.go:1234: test #42: funcs: Running func `printf@0xc000279980`
    interpret_test.go:1604: test #42: FAIL
    interpret_test.go:1605: test #42: stream errored: func `subgraphOutput` stopped before it was loaded
    interpret_test.go:1234: test #42: funcs: Exiting func `printf@0xc000279980`
    interpret_test.go:1234: test #42: funcs: Exiting func `subgraphOutput`

Meaning I think there might be a channel bug in the func code.

I also notice that we hang at:

obj.wg.Wait() // don't cleanup these before Run() finished

In the engine. This leads me to believe there is a channel bug in your function code. I can help debug that with you.

BUT to be 100% sure it's not my fault (it might be!) I'm working on a real test for the engine stuff so I can be sure it's not the engine's fault. (I don't think so, but it's the most productive thing I can think to do on my own.)

purpleidea commented 1 year ago

Is there a way to see the function graph after the sub-graph is created?

I can add some debugging as part of the Txn Commit() to print the graph and run it through graphviz if you like.

purpleidea commented 1 year ago

I wrote the debugging patch for you. It is here: ea808dbcaf2a9b9c702d0052fe20909f5f804d2f You can cherry-pick it. It produces this output (with one arg): txn-graphviz-1684190515 dot

With two args: txn-graphviz-1684190679 dot

gelisam commented 1 year ago

If you instead run:

test fmt.printf("onlyonearg")

Does it still display that non-determinism?

Yes.

I also rebased a copy of this branch (which is rebased on top of that merge-stuff branch) to save you doing the work if you'd like. That branch is feat/sam-squashed-rebased.

Ok, I have now force-pushed feat/sam-squashed-rebased over mcl-functions.

gelisam commented 1 year ago

I wrote the debugging patch for you. It is here: https://github.com/purpleidea/mgmt/commit/ea808dbcaf2a9b9c702d0052fe20909f5f804d2f You can cherry-pick it. It produces this output [...]

Looks good, except for the arg%d XXX. Excellent, at least everything runs fine until the sub-graph is created!

gelisam commented 1 year ago

This leads me to believe there is a channel bug in your function code.

I confirm that ChannelBasedSinkFunc::Stream() does close its output channel without emitting any value downstream, but that's intentional:

func (obj *ChannelBasedSinkFunc) Stream() error {
    defer close(obj.Chan)  // the sender closes
    close(obj.init.Output) // we will never send any value downstream

    [...]
}

The subgraphOutput is a node with out-degree zero which only exists for its side-effects, namely forwarding the input values to the call node so that the call node can emit those values downstream. If you want, ChannelBasedSinkFunc could emit a dummy value before closing the channel, or it could even emit all its input values downstream as it receives them; it doesn't matter, as nobody is ever going to read those values.

purpleidea commented 1 year ago

The subgraphOutput is a node with out-degree zero which only exists for its side-effects, namely forwarding the input values to the call node so that the call node can emit those values downstream. If you want, ChannelBasedSinkFunc could emit a dummy value before closing the channel, or it could even emit all its input values downstream as it receives them; it doesn't matter, as nobody is ever going to read those values.

If it doesn't have any outdegree nodes that are part of the function graph (not the back channels), there's no where to send it's values, so it doesn't matter.

We should talk about this more to make sure we both have the same assumptions about how the function engine should be running though.

I don't know if it's intentional and we're missing an optimization, or if it's a bug, but for the simple case like: test fmt.printf("hey") {} then I would expect to not need the fancy callfunc misdirection stuff... But that we can discuss IRL.

I can keep working on engine test stuff unless there's something you can recommend I do instead to unblock you somehow.

gelisam commented 1 year ago

I don't think I'll have time to write any code until we meet IRL; if you want, you could tweak the ChannelBasedSink to send a dummy value downstream, or change the arg%d XXX names to match the argument names in the nearby Type value.

On Tue, May 16, 2023, 15:08 James @.***> wrote:

The subgraphOutput is a node with out-degree zero which only exists for its side-effects, namely forwarding the input values to the call node so that the call node can emit those values downstream. If you want, ChannelBasedSinkFunc could emit a dummy value before closing the channel, or it could even emit all its input values downstream as it receives them; it doesn't matter, as nobody is ever going to read those values.

If it doesn't have any outdegree nodes that are part of the function graph (not the back channels), there's no where to send it's values, so it doesn't matter.

We should talk about this more to make sure we both have the same assumptions about how the function engine should be running though.

I don't know if it's intentional and we're missing an optimization, or if it's a bug, but for the simple case like: test fmt.printf("hey") {} then I would expect to not need the fancy callfunc misdirection stuff... But that we can discuss IRL.

I can keep working on engine test stuff unless there's something you can recommend I do instead to unblock you somehow.

— Reply to this email directly, view it on GitHub https://github.com/purpleidea/mgmt/pull/704#issuecomment-1550212268, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAL62D2JMRFZICLNIQM62DXGPGELANCNFSM6AAAAAAVUSX4KA . You are receiving this because you modified the open/close state.Message ID: @.***>

gelisam commented 1 year ago

Turns out I did have time to tweak the ChannelBasedSink after all!

– Samuel

On Tue, May 16, 2023 at 6:44 PM Samuel Gélineau @.***> wrote:

I don't think I'll have time to write any code until we meet IRL; if you want, you could tweak the ChannelBasedSink to send a dummy value downstream, or change the arg%d XXX names to match the argument names in the nearby Type value.

On Tue, May 16, 2023, 15:08 James @.***> wrote:

The subgraphOutput is a node with out-degree zero which only exists for its side-effects, namely forwarding the input values to the call node so that the call node can emit those values downstream. If you want, ChannelBasedSinkFunc could emit a dummy value before closing the channel, or it could even emit all its input values downstream as it receives them; it doesn't matter, as nobody is ever going to read those values.

If it doesn't have any outdegree nodes that are part of the function graph (not the back channels), there's no where to send it's values, so it doesn't matter.

We should talk about this more to make sure we both have the same assumptions about how the function engine should be running though.

I don't know if it's intentional and we're missing an optimization, or if it's a bug, but for the simple case like: test fmt.printf("hey") {} then I would expect to not need the fancy callfunc misdirection stuff... But that we can discuss IRL.

I can keep working on engine test stuff unless there's something you can recommend I do instead to unblock you somehow.

— Reply to this email directly, view it on GitHub https://github.com/purpleidea/mgmt/pull/704#issuecomment-1550212268, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAL62D2JMRFZICLNIQM62DXGPGELANCNFSM6AAAAAAVUSX4KA . You are receiving this because you modified the open/close state.Message ID: @.***>

purpleidea commented 1 year ago

FYI: With your latest patch I now get:

    interpret_test.go:1234: test #42: stream...
    interpret_test.go:1234: test #42: funcs: Running func `call`
    interpret_test.go:1234: test #42: funcs: Running func `const`
    interpret_test.go:1234: test #42: funcs: Exiting func `const`
    interpret_test.go:1234: test #42: funcs: Running func `const`
    interpret_test.go:1234: test #42: funcs: Exiting func `const`
    interpret_test.go:1234: test #42: funcs: func `const` started
    interpret_test.go:1234: test #42: funcs: func `const` started
    interpret_test.go:1234: test #42: funcs: Running func `const`
    interpret_test.go:1234: test #42: funcs: Exiting func `const`
    interpret_test.go:1234: test #42: funcs: func `const` started
    interpret_test.go:1234: test #42: funcs: not yet loaded: call
    interpret_test.go:1234: test #42: funcs: not yet loaded: call
    interpret_test.go:1234: test #42: funcs: pausing...
    interpret_test.go:1234: test #42: funcs: paused!
    interpret_test.go:1234: test #42: funcs: resuming...
    interpret_test.go:1234: test #42: funcs: resumed!
    interpret_test.go:1234: test #42: funcs: Running func `subgraphOutput`
    interpret_test.go:1234: test #42: funcs: Running func `printf@0xc0007507e0`

    interpret_test.go:1629: test #42: FAIL
    interpret_test.go:1630: test #42: interpret failed with: missing value in table
    interpret_test.go:1234: test #42: funcs: Exiting func `printf@0xc0007507e0`

I haven't debugged it yet.

purpleidea commented 1 year ago

FYI: I've written some more tests for the txn stuff, and I am now fairly confident that there are no bugs in the txn code.

It's also easy to add new sequences of addvertex/addedge/commit/reverse operations and test it does what you expected.

Those patches are now rebased into the feature branch:

feat/sam-squashed-rebased

and there is the same one with the graphviz patch on top at:

feat/sam-squashed-rebased-graphviz

If you're curious, you can see one of these tests here: https://github.com/purpleidea/mgmt/blob/58e08b6c4bd90d93eb23466affcfdedc529bfee7/lang/funcs/dage/txn_test.go#L340

You basically just give it a list of txn operations and it runs them in order alongside any checks that you want as well.

Now onto more complex engine tests for me!

gelisam commented 1 year ago

All right, so yesterday we've managed to successfully call a builtin function, and a lambda with no arguments. Here's my attempt at explaining the current difficulty with calling lambdas which do have arguments.

$id = func($x) {$x}
$foo = $id("foo")
$bar = $id(42)

When type-checking, the $id = ... definition is ignored. Instead, a copy is made at the $id(...) call site, and it is only that version which is type-checked. This makes it possible for $id to have type func(x string) string at one call site and type func(x int) int at the other call site, without having to give the definition site a polymorphic type.

Then, when constructing the function graph, the $id = ... definition is compiled to a const node which emits a FuncValue, and that node is added to the runtime environment. Because this definition has not been type-checked, the FuncValue's type is nil.

Next, at each call site,the ExprCall.Graph(env) attaches the const node from the runtime environment to a new CallFunc node. This is a problem, because AddEdge calls AddVertex, which errors out when it sees the nil type` (while trying to validate that this type is a function type).

gelisam commented 1 year ago

Easy solution which doesn't work number 1: also type-check the definition site.

This doesn't work because MCL doesn't support polymorphic types like forall a. a -> a, so the type of func($x) {$x} is ambiguous. Thus, if we try to fix the problem by type-checking the definition site, the program will fail to typecheck instead of crashing. Better, but not what we want.

gelisam commented 1 year ago

Easy solution which doesn't work number 2: call the type-checked copy, the ExprCall.expr field.

This would work in this particular example, and in fact that's what we do on master. So let's review why this branch uses an environment.

On master, we know exactly how many times each function is called and with which arguments. We know that information statically, before type-checking. So we can copy ExprCall that many times and have the ExprCall.expr field be correct for each copy, and then type-check each.

On this branch, we want to support functions like map which call other functions a different number of times depending on some runtime values. So the information about which function call receives which argument is no longer known statically, it is now known dynamically, after type-checking. Scope and ExprCall.exprare both filled before type-checking, so they can only hold the statically-known information. That's why we need a new place in which we can store the dynamically-known information: the environment.

So even though in this example, it would be possible to use ExprCall.expr, if we want to progress towards supporting map (as opposed to progressing from the current state of the branch back towards master), we must find a way to call the function from the environment, not the function from ExprCall.expr.

purpleidea commented 1 year ago

we must find a way to call the function from the environment, not the function from ExprCall.expr.

In case this is a helpful insight: is it helpful to use ExprCall.expr merely to determine the type (during type checking) and then to copy that determined type into the N different copies of the actual function that we'll need? I suspect the N copies will have the same type as the singular ExprCall.expr that we have copied around. And in this case then we won't need a second stage of Unify().

HTH

gelisam commented 1 year ago

Yes, that seems sensible: ExprCall.expr does have the correct type for each call, and the function from the environment does have the correct behavior for each call. Copying the type thus seems like it should give us the best of both worlds!

But how do we accomplish this in practice? It would be easy if we had one {V: wrongValue, T: correctType} struct and one {V: correctValue, T: nil} struct. In reality, we have a correctType and a Func which emits a FuncValue. Probably a ConstFunc. Okay, so we can cast the Func to a ConstFunc, then cast the Value it holds to a FuncValue, then set its T.

That seems quite fragile. What if the Func comes from if ($now % 2 == 0) then $f else $g? Do we need to add Func.SetType(Type) to the Func interface, so that IfFunc can forward the SetFunc request to its two branches?

And what if the Func has already been added to the graph earlier? For example, the $id("foo") call would set $id's type to func(string) string and add it to the graph, and then $id(42) would overwrite $id's type to be func(int) int?

I think it might be wrong for runtime functions to even have a type. Would it perhaps be sufficient to have the Sig.Ord, so we can generate edges with the right names?

gelisam commented 1 year ago

Or maybe FuncValue.call() should take an extra Type argument?

purpleidea commented 1 year ago

That seems quite fragile. What if the Func comes from if ($now % 2 == 0) then $f else $g? Do we need to add Func.SetType(Type) to the Func interface, so that IfFunc can forward the SetFunc request to its two branches?

There are two types of functions (type wise) in mgmt:

1) Any normal function which has a static type. For example: strings.ToLower is a func(str) str.

2) A function which could take on between 1 and infinite numbers of types. Those are represented by one of the polymorphisms variants. All of those have a .Build(typ) method.

So all we need to do is run: Build(*types.Type) error // then, you can get argNames from Info()

(See lang/interfaces/func.go @ PolyFunc interface.

Tada!

If it's a Func instead of a PolyFunc you can do:

if polyFunc, ok := someFunc.(interfaces.PolyFunc); ok {
    err := polyFunc.Build(typ) // XXX: remember to check for error! Some types might not fit.
    if err != nil { ... }
} else {
    // the type is already known statically!
}

Does this help?

purpleidea commented 1 year ago

IfFunc can forward the SetFunc request to its two branches?

I didn't completely understand this question. We should after unification know the type for everything... No need to recurse it through... That happens in the AST already automatically, doesn't it? Or did you mean something different?

purpleidea commented 1 year ago

And what if the Func has already been added to the graph earlier? For example, the $id("foo") call would set $id's type to func(string) string and add it to the graph, and then $id(42) would overwrite $id's type to be func(int) int?

They each work on their own copy both for type unification, and then runtime... So it's safe. There should be no conflict, right? Remember we never Unify OR use the original ExprFunc definition site.

gelisam commented 1 year ago

the type is already known statically! [...] We should after unification know the type for everything...

Remember that the problem we are trying to solve is that we have a Func in the environment for which type-checking did not run, and intentionally so, because func($x) {$x} has an ambiguous type in MCL. So it is not the case that we already statically know the type for all Funcs.

Remember we never Unify OR use the original ExprFunc definition site.

In this branch, we do use the original ExprFunc definition site, it's the Func we put in the environment!

gelisam commented 1 year ago

Easy solution which doesn't work number 3: only keep Sig.Ord at runtime.

The idea there is that maybe the Func from the environment is fine as-is, it's AddVertex which panics unnecessarily. After all, Haskell's types are erased at runtime, so why can't MCL's?

In order for this approach to work, we need to make sure types are not needed at runtime. I know that the Sig.Ord part of the type is used in order to label the incoming edges (instead of the labels arg%d XXX), because we fixed that recently. Anything else?

Alas! A polyfunc like + does need to know its type in order resolve to the correct implementation.

gelisam commented 1 year ago

Oh! I think I know what the problem is: MCL functions are not functions!! They are C++-style templates for functions. They need to be instantiated at a certain number of concrete types, those which are used in the program, and then those instantiations can be used as functions.

func($x) {$x + $x} doesn't have a type. It does not have a runtime representation. I can't store it in the environment. func(str $x) {$x + $x} does have a type, it does have a runtime representation, and I can store it in the environment.

gelisam commented 1 year ago

So, in $id("foo"), the template named $id is instantiated (copied), which is how it can have a distinct type from $id(42). The next step is to figure out exactly where templates are instantiated. We previously agreed that in map(..., [$id, ...]), $id must have a monomorphic type like func(string) string, so it must be instantiated at that point. Is that also the case for [$id, ...]? What about map($id, ...)? What about if ... then $id else...?

purpleidea commented 1 year ago

The next step is to figure out exactly where templates are instantiated.

Currently this happens in SetScope. I hope that doesn't need to be changed, but if it's the only way, then we change. It does this so that the next step (Unify) has all the information to build all the missing types.

Unify() only works on objects it has been told about. So for example if you give it obj1, obj2, obj3 and some constraints, eg type(obj1) == type(obj2) and obj3 == str, then it will work it all out for obj1,2,3. If you never tell it about obj4, then it will never need to figure out that type.

So let's say obj1 is ExprCall.expr, and so we solve for it's type. There might be N different functions inside map with the same type. Call them objM1,2,3...n. We don't need to tell Unify about those functions. All that matters is that during function generation we use the information gleamed in Unify to set those types before the function engine runs.

HTH

gelisam commented 1 year ago

I don't mean where in the golang code, I mean where in the MCL code. Currently we only copy the callee to ExprCall.expr at call sites, but that's clearly not enough since map(..., [$id, ...]) does instantiate $id to a concrete type, and yet we do not copy $id. If we don't copy $id there, then the following will be a type error:

$id = func($x) {$x}
$strs = map(func($f) {$f("foo")}, [$id])
$ints = map(func($f) {$f(42)}, [$id])
purpleidea commented 1 year ago

I've rebased into some new changes. These are available as feature branch: feat/sam-squashed-rebased This should match this current PR but with my new changes incorporated. The changes are as follows:

I have some test cases that show it works. I have not merged those yet because I am still building out the test cases so that I have a chance of finding more bugs (if any).

At least we're hopefully unblocked on the engine front now. Thanks for working on the algorithmic function figuring!

purpleidea commented 1 year ago

My new tests have been added. This is rebased again and still available at: feat/sam-squashed-rebased We can add more tests if we find a scenario which breaks the engine. I'm feeling pretty optimistic about it though.

purpleidea commented 1 year ago

I've updated feat/sam-squashed-rebased again. This time it includes a long-delayed port from the old function interface that included:

Stream() error
Close() error

And instead replaces those two with:

Stream(context.Context) error

Which allows you to call the close directly through the context. I also added a patch to update the channel source and sink functions.

gelisam commented 1 year ago

Here is a hopefully-simpler way to understand the problem.

Part1: why we copy functions

MCL supports functions like + and func($x) {$x} which can have more than one type. Those functions need to be specialized to a single type before their Graph() method is called, otherwise the function graph engine will complain that a Func node does not have a type.

On master, this is done by copying the function in ExprCall. That copy is only used at one type, which the type-checker finds; done.

In this branch, we want to support map. In map($f, [...]), the calls to $f are made at runtime, so it is too late to copy $f at every call site like ExprCall does. However, those calls are all made with the same type, so it suffices to copy $f in the ExprCall node for map($f, [...]).

This means that ExprCall must also copy its arguments, not just the function!

Part 2: why the environment is broken

Looking up values in the environment by name implies that there is only one value with that name. But if printf("%s %d", $f("foo"), $f(42)) makes two copies of $f, there are now at least two values named $f at runtime. 3, if we count $f's definitions site. Our current bug is due to the fact that when we lookup $f in the environment, we find the definition site's version of $f, when what we want is for the first call to find the first copy, and for the second call to find the second copy.

This means that when we copy, we must rename all the copied variables in order to make them unique!

purpleidea commented 1 year ago

This means that when we copy, we must rename all the copied variables in order to make them unique!

\o/ sounds like we have a solution! This feels like it's firmly in your patch territory, but I can pair with you on it if you want, or alternatively, if you have any menial tasks for me, let me know what to do =D

gelisam commented 1 year ago

I think I have found an easier way to implement this than to rename the variables when copying nodes.

In this branch, there are currently two types of variables in the environment: the variables which come from the parameters of a function, and the variables which refer to top-level definitions. I now think it was a mistake to put both in the same environment.

On master, no environment is needed because there is a one-to-one correspondence between Exprs and Funcs. On this branch, an environment is needed because the Exprs inside lambda bodies correspond to multiple Funcs. But the Exprs at the top-level do correspond to a single Func!

Another difference between top-level variables and function parameters is that it is only the top-level variables which can have more than one type and thus need to be copied. After all, the arguments are copied so that a single type can be given to each argument before the call is executed. Thus, while the call is running, each parameter is already bound to a Func with a single type.

This suggests a simple solution:

  1. ExprCall should copy the functions and arguments which come from the top-level, but it should not copy the functions and arguments which come from lambda parameters
  2. ExprVar.Graph() should call the Graph() method of the top-level copy to which it refers, if that's what it refers to, and it should lookup the variable name in the environment if the variable refers to a lambda parameter.

In order to make this plan more concrete, can you tell me

  1. Where is the ExprCall.expr copy made?
  2. How does ExprVar knows what the variable refers to?
purpleidea commented 1 year ago

In this branch, there are currently two types of variables in the environment: the variables which come from the parameters of a function, and the variables which refer to top-level definitions. I now think it was a mistake to put both in the same environment.

You might remember I have my scope type here: https://github.com/purpleidea/mgmt/blob/446dbde836c10159db0370c9eacf97aa3c4858a2/lang/interfaces/ast.go#L214

Notably it has separate buckets for variables, functions, and classes, as well as "indexes" which had to do with function arguments and nested functions and so on. It never worked 100% but it got close. You might remember we used it a bit as well. The "chain" btw was for detecting recursion.

If this is helpful, please let me know... It's feeling more and more like your "environment". Reminder, at the moment it is the input argument for SetScope: SetScope(scope *interfaces.Scope) error.

Where is the ExprCall.expr copy made?

In SetScope. Here for example: https://github.com/purpleidea/mgmt/blob/446dbde836c10159db0370c9eacf97aa3c4858a2/lang/ast/structs.go#L7603

Note it is not always created via a copy if it's meant to have the same reference, eg: https://github.com/purpleidea/mgmt/blob/446dbde836c10159db0370c9eacf97aa3c4858a2/lang/ast/structs.go#L7588

How does VarExpr knows what the variable refers to?

ExprVar is pulled out of the "scope" when it's first needed. That currently happens in Unify and Graph for example. See here: https://github.com/purpleidea/mgmt/blob/446dbde836c10159db0370c9eacf97aa3c4858a2/lang/ast/structs.go#L8469 and https://github.com/purpleidea/mgmt/blob/446dbde836c10159db0370c9eacf97aa3c4858a2/lang/ast/structs.go#L8524

HTH

gelisam commented 1 year ago

And what does the scope point to when the ExprVar points to a lambda parameter?