Closed gelisam closed 1 year ago
Sweeet =D
FYI: I updated merged graph branch. You may wish to cherry-pick stuff into here. I can fix the merge conflict if necessary.
https://github.com/purpleidea/mgmt/tree/feat/merged-graph
Of note: I added caching of the function pointer returned from Func(). This way, if it's called more than once, the same func will be returned. Is this a problem for us? Of course if we Copy() the Func Expr, then the caching doesn't get copied...
If this approach is satisfactory, then this allows us to look at the functions from the function graph, and determine which ones maps to which by calling Func again everywhere.
This isn't as effective performance wise as adding a new map to our returned function signature, but at least for now, I think this is easier to plumb in, and once we're sure we don't need to change the signature any more, we can revisit in the future.
FYI: debugging test 42:
I set:
runGraphviz || true
to enable the function graph output.
Then I ran:
time go test github.com/purpleidea/mgmt/lang/ -run TestAstFunc2/test_#42
Which gets us the error:
interpret_test.go:1605: test #42: stream errored: func `printf@0xc0006383f0` stopped before it was loaded
Which also produced /tmp/graphviz.dot.png
Not the two constants are not attached to the function. So it never gets those inputs. Which is why we see the error.
The correct output looks roughly like this (from git master)
The graph in which the two const nodes are not attached to anything looks correct to me. What is supposed to happen is that the printf node sends a FuncValue to the call node, which then creates a sub-graph which looks like the V-shaped graph from master, plus a ChannelBasedSink which sends the output of printf back to the call node via a channel.
Is it possible that the test harness stop with the error "stream errored: func @.***` stopped before it was loaded" before the call node is given a chance to generate it's sub-graph?
Another thing I notice about this example is that there are two very different printf nodes in this example, the one with in-degree zero which produces a FuncValue, and the one with in-degree 2 in the V-shaped sub-graph. So another possible explanation for the bug might be that we're using the wrong one in one of the two graphs.
On Fri, May 12, 2023, 18:25 James @.***> wrote:
FYI: debugging test 42:
I set:
runGraphviz || true
to enable the function graph output.
Then I ran:
time go test github.com/purpleidea/mgmt/lang/ -run TestAstFunc2/test_#42
Which gets us the error:
interpret_test.go:1605: test #42: stream errored: func ***@***.***` stopped before it was loaded
Which also produced /tmp/graphviz.dot.png
[image: graphviz dot] https://user-images.githubusercontent.com/135091/238090580-ece5d436-53b8-4b70-b55d-41c92ec2b0de.png
Not the two constants are not attached to the function. So it never gets those inputs. Which is why we see the error.
The correct output looks roughly like this (from git master) [image: graphviz dot] https://user-images.githubusercontent.com/135091/238090954-d45700fb-3838-4adf-a623-6864544392e5.png
— Reply to this email directly, view it on GitHub https://github.com/purpleidea/mgmt/pull/704#issuecomment-1546375099, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAL62D6LFKMFDZA2VTT2CLXF22GBANCNFSM6AAAAAAVUSX4KA . You are receiving this because you authored the thread.Message ID: @.***>
Sorry, I forgot to include the mcl code:
test fmt.printf("sq2: %d", 42)
Another thing I notice about this example is that there are two very different printf nodes in this example
Good point, I should have covered that sorry. Unfortunately, I don't think there are. The reason the nodes look slightly different is that in the earlier version, we're running a .String() to get the label on the Expr (which is what is in the function graph) where as in the newer version, we've had to simplify the label tag because we don't have as much context as the actual node is based on the function implementation (no more expr!). But the actual Stream() implementations in printf doing the work are identical.
Is it possible that the test harness stop with the error "stream errored: func @.***` stopped before it was loaded" before the call node is given a chance to generate it's sub-graph?
I don't believe so.
The reason we get this error is that the singular printf node which is running in the function engine receives no input (IOW, no incoming edges) so the engine closes the entire input stream. The printf function, seeing it is not getting any more inputs, then shuts down the output stream. Finally, the function engine notices (thanks engine!) that an output stream has shutdown without ever once producing a value, and assumes (rightly so) that there is a bug. All nodes are expected to either produce at least one value or error.
The current printf implementation indeed is supposed to receive one or more values (format string and optional args) in the function engine. If this is now supposed to go through a channelsink or channelsource, then the fundamental API of every static arg list function (basically everything except map/reduce/etc) would need to change. I don't think that's what we were trying to accomplish. I do believe we probably just forgot two edges in this case.
the actual Stream() implementations in printf doing the work are identical.
Then that's the bug. The printf node with in-degree zero (let's call it printf-zero) should not be doing the same work as the printf node which in-degree two (let's call it printf-two). printf-zero should produce a FuncValue which creates a sub-graph containing printf-two, while printf-two should produce a string based on its arguments.
the function engine notices (thanks engine!) that an output stream has shutdown without ever once producing a value, and assumes (rightly so) that there is a bug.
That makes sense. The bug is that we give zero incoming edges to printf-two, which behaves like you explained when it has no incoming edges are missing. We should instead give zero incoming edges to printf-zero, which does produce a value when it has no incoming edges.
If this is now supposed to go through a channelsink or channelsource, then the fundamental API of every static arg list function (basically everything except map/reduce/etc) would need to change. I don't think that's what we were trying to accomplish.
We do want to change the semantics of every single MCL function, that's the premise of this PR: previously, every single MCL function was converting values to values, that doesn't work, so we now want every single MCL function to be generating sub-graphs instead.
Luckily, that doesn't mean that we have to change hundreds of function implementations throughout the codebase. We still need those Func implementations, we just need to make sure that the node which feeds the call node is not one of those Func implementations, and is instead a wrapper which returns a FuncValue which creates a sub-graph which contains one of those Func implementations.
I have pushed an attempt at fixing the problem, but it doesn't work. The behaviour for our simplified test 42 is now non-deterministic :(
Sometimes it fails with stream errored: func 'subgraphOutput' stopped before it was loaded
, as before but with subgraphOutput
instead of printf
.
Sometimes it fails with that message and then hangs.
Sometimes if fails with field arg1 XXX does not exist
. That one makes sense, it's the same bug we encountered and fixed together during our last pairing session: I was not careful to use the correct edge names, because I didn't realize that the function graph used named parameters rather than positional parameters. In both places where I use the name arg%d XXX
, there is a Type
value nearby, that type should be a function type, and we should be able to extract the correct argument names from it.
The behaviour for our simplified test 42 is now non-deterministic :(
It's possible there is a race condition in either my engine or the subgraph channel stuff.
Function graph generation should be deterministic though.
Adding this line in TestAstFunc2:
runGraphviz || true
Then running the test
Will produce /tmp/graphviz.dot.png
if graphviz is installed, and can at least confirm the correct function graph shapes.
If we're satisfied with that shape, we can look into seeing where in the engine and/or call funcs the races are.
Btw:
I didn't realize that the function graph used named parameters rather than positional parameters
Indeed, since params come in as edges, there's no obvious ordering-- the only solution is to have them be named.
In both places where I use the name arg%d XXX, there is a Type value nearby
What do you mean by nearby?
What do you mean by nearby?
The T field of the enclosing FuncValue definition.
Adding this line in TestAstFunc2
Where? TestAstFunc2 is a folder, not a file.
ah, it's also a function in ./lang/interpret_test.go
If we're satisfied with that shape, we can look into seeing where in the engine and/or call funcs the races are.
(I see the same graph regardless of which of the non-deterministic failures occurs)
That looks correct. The const
going into call
is the ConstFunc
which returns a FuncValue
and closes the stream, and the other two const
nodes are the ConstFunc
are the ConstFunc
s for "sq2: %d"
and 42
. Is there a way to see the function graph after the sub-graph is created?
Oh! I bet the arg%d XXX
is the root cause. When printf-two receives two mis-labelled inputs, it errors out, closing its output stream before emitting any value. The non-determinism is merely about which of the two panics is displayed first. Not sure about the hang though...
Oh! I bet the arg%d XXX is the root cause. When printf-two receives two mis-labelled inputs...
If you instead run:
test fmt.printf("onlyonearg")
Does it still display that non-determinism?
I decided I wanted to double check I hadn't made a mistake somewhere in the deps for all of this (engine, txn, etc) so I wrote a test. Indeed I found a small bug (woops) and while I don't think it necessarily caused this issue, I figured I'd mention it in case.
The bug and associated test is in the txn code. I rebased it and feat/merge-stuff contains the fix. I also rebased a copy of this branch (which is rebased on top of that merge-stuff branch) to save you doing the work if you'd like. That branch is feat/sam-squashed-rebased.
I tried this branch of yours (with the txn fix, and just one arg, the format string, for the printf) and I get:
interpret_test.go:1234: test #42: funcs: Running func `subgraphOutput`
interpret_test.go:1234: test #42: funcs: not yet loaded: call
interpret_test.go:1234: test #42: funcs: Running func `printf@0xc000279980`
interpret_test.go:1604: test #42: FAIL
interpret_test.go:1605: test #42: stream errored: func `subgraphOutput` stopped before it was loaded
interpret_test.go:1234: test #42: funcs: Exiting func `printf@0xc000279980`
interpret_test.go:1234: test #42: funcs: Exiting func `subgraphOutput`
Meaning I think there might be a channel bug in the func code.
I also notice that we hang at:
obj.wg.Wait() // don't cleanup these before Run() finished
In the engine. This leads me to believe there is a channel bug in your function code. I can help debug that with you.
BUT to be 100% sure it's not my fault (it might be!) I'm working on a real test for the engine stuff so I can be sure it's not the engine's fault. (I don't think so, but it's the most productive thing I can think to do on my own.)
Is there a way to see the function graph after the sub-graph is created?
I can add some debugging as part of the Txn Commit() to print the graph and run it through graphviz if you like.
I wrote the debugging patch for you. It is here: ea808dbcaf2a9b9c702d0052fe20909f5f804d2f You can cherry-pick it. It produces this output (with one arg):
With two args:
If you instead run:
test fmt.printf("onlyonearg")
Does it still display that non-determinism?
Yes.
I also rebased a copy of this branch (which is rebased on top of that merge-stuff branch) to save you doing the work if you'd like. That branch is feat/sam-squashed-rebased.
Ok, I have now force-pushed feat/sam-squashed-rebased
over mcl-functions
.
I wrote the debugging patch for you. It is here: https://github.com/purpleidea/mgmt/commit/ea808dbcaf2a9b9c702d0052fe20909f5f804d2f You can cherry-pick it. It produces this output [...]
Looks good, except for the arg%d XXX
. Excellent, at least everything runs fine until the sub-graph is created!
This leads me to believe there is a channel bug in your function code.
I confirm that ChannelBasedSinkFunc::Stream()
does close its output channel without emitting any value downstream, but that's intentional:
func (obj *ChannelBasedSinkFunc) Stream() error {
defer close(obj.Chan) // the sender closes
close(obj.init.Output) // we will never send any value downstream
[...]
}
The subgraphOutput is a node with out-degree zero which only exists for its side-effects, namely forwarding the input values to the call node so that the call node can emit those values downstream. If you want, ChannelBasedSinkFunc could emit a dummy value before closing the channel, or it could even emit all its input values downstream as it receives them; it doesn't matter, as nobody is ever going to read those values.
The subgraphOutput is a node with out-degree zero which only exists for its side-effects, namely forwarding the input values to the call node so that the call node can emit those values downstream. If you want, ChannelBasedSinkFunc could emit a dummy value before closing the channel, or it could even emit all its input values downstream as it receives them; it doesn't matter, as nobody is ever going to read those values.
If it doesn't have any outdegree nodes that are part of the function graph (not the back channels), there's no where to send it's values, so it doesn't matter.
We should talk about this more to make sure we both have the same assumptions about how the function engine should be running though.
I don't know if it's intentional and we're missing an optimization, or if it's a bug, but for the simple case like: test fmt.printf("hey") {}
then I would expect to not need the fancy callfunc misdirection stuff... But that we can discuss IRL.
I can keep working on engine test stuff unless there's something you can recommend I do instead to unblock you somehow.
I don't think I'll have time to write any code until we meet IRL; if you want, you could tweak the ChannelBasedSink to send a dummy value downstream, or change the arg%d XXX names to match the argument names in the nearby Type value.
On Tue, May 16, 2023, 15:08 James @.***> wrote:
The subgraphOutput is a node with out-degree zero which only exists for its side-effects, namely forwarding the input values to the call node so that the call node can emit those values downstream. If you want, ChannelBasedSinkFunc could emit a dummy value before closing the channel, or it could even emit all its input values downstream as it receives them; it doesn't matter, as nobody is ever going to read those values.
If it doesn't have any outdegree nodes that are part of the function graph (not the back channels), there's no where to send it's values, so it doesn't matter.
We should talk about this more to make sure we both have the same assumptions about how the function engine should be running though.
I don't know if it's intentional and we're missing an optimization, or if it's a bug, but for the simple case like: test fmt.printf("hey") {} then I would expect to not need the fancy callfunc misdirection stuff... But that we can discuss IRL.
I can keep working on engine test stuff unless there's something you can recommend I do instead to unblock you somehow.
— Reply to this email directly, view it on GitHub https://github.com/purpleidea/mgmt/pull/704#issuecomment-1550212268, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAL62D2JMRFZICLNIQM62DXGPGELANCNFSM6AAAAAAVUSX4KA . You are receiving this because you modified the open/close state.Message ID: @.***>
Turns out I did have time to tweak the ChannelBasedSink after all!
– Samuel
On Tue, May 16, 2023 at 6:44 PM Samuel Gélineau @.***> wrote:
I don't think I'll have time to write any code until we meet IRL; if you want, you could tweak the ChannelBasedSink to send a dummy value downstream, or change the arg%d XXX names to match the argument names in the nearby Type value.
On Tue, May 16, 2023, 15:08 James @.***> wrote:
The subgraphOutput is a node with out-degree zero which only exists for its side-effects, namely forwarding the input values to the call node so that the call node can emit those values downstream. If you want, ChannelBasedSinkFunc could emit a dummy value before closing the channel, or it could even emit all its input values downstream as it receives them; it doesn't matter, as nobody is ever going to read those values.
If it doesn't have any outdegree nodes that are part of the function graph (not the back channels), there's no where to send it's values, so it doesn't matter.
We should talk about this more to make sure we both have the same assumptions about how the function engine should be running though.
I don't know if it's intentional and we're missing an optimization, or if it's a bug, but for the simple case like: test fmt.printf("hey") {} then I would expect to not need the fancy callfunc misdirection stuff... But that we can discuss IRL.
I can keep working on engine test stuff unless there's something you can recommend I do instead to unblock you somehow.
— Reply to this email directly, view it on GitHub https://github.com/purpleidea/mgmt/pull/704#issuecomment-1550212268, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAL62D2JMRFZICLNIQM62DXGPGELANCNFSM6AAAAAAVUSX4KA . You are receiving this because you modified the open/close state.Message ID: @.***>
FYI: With your latest patch I now get:
interpret_test.go:1234: test #42: stream...
interpret_test.go:1234: test #42: funcs: Running func `call`
interpret_test.go:1234: test #42: funcs: Running func `const`
interpret_test.go:1234: test #42: funcs: Exiting func `const`
interpret_test.go:1234: test #42: funcs: Running func `const`
interpret_test.go:1234: test #42: funcs: Exiting func `const`
interpret_test.go:1234: test #42: funcs: func `const` started
interpret_test.go:1234: test #42: funcs: func `const` started
interpret_test.go:1234: test #42: funcs: Running func `const`
interpret_test.go:1234: test #42: funcs: Exiting func `const`
interpret_test.go:1234: test #42: funcs: func `const` started
interpret_test.go:1234: test #42: funcs: not yet loaded: call
interpret_test.go:1234: test #42: funcs: not yet loaded: call
interpret_test.go:1234: test #42: funcs: pausing...
interpret_test.go:1234: test #42: funcs: paused!
interpret_test.go:1234: test #42: funcs: resuming...
interpret_test.go:1234: test #42: funcs: resumed!
interpret_test.go:1234: test #42: funcs: Running func `subgraphOutput`
interpret_test.go:1234: test #42: funcs: Running func `printf@0xc0007507e0`
interpret_test.go:1629: test #42: FAIL
interpret_test.go:1630: test #42: interpret failed with: missing value in table
interpret_test.go:1234: test #42: funcs: Exiting func `printf@0xc0007507e0`
I haven't debugged it yet.
FYI: I've written some more tests for the txn stuff, and I am now fairly confident that there are no bugs in the txn code.
It's also easy to add new sequences of addvertex/addedge/commit/reverse operations and test it does what you expected.
Those patches are now rebased into the feature branch:
feat/sam-squashed-rebased
and there is the same one with the graphviz patch on top at:
feat/sam-squashed-rebased-graphviz
If you're curious, you can see one of these tests here: https://github.com/purpleidea/mgmt/blob/58e08b6c4bd90d93eb23466affcfdedc529bfee7/lang/funcs/dage/txn_test.go#L340
You basically just give it a list of txn operations and it runs them in order alongside any checks that you want as well.
Now onto more complex engine tests for me!
All right, so yesterday we've managed to successfully call a builtin function, and a lambda with no arguments. Here's my attempt at explaining the current difficulty with calling lambdas which do have arguments.
$id = func($x) {$x}
$foo = $id("foo")
$bar = $id(42)
When type-checking, the $id = ...
definition is ignored. Instead, a copy is made at the $id(...)
call site, and it is only that version which is type-checked. This makes it possible for $id
to have type func(x string) string
at one call site and type func(x int) int
at the other call site, without having to give the definition site a polymorphic type.
Then, when constructing the function graph, the $id = ...
definition is compiled to a const node which emits a FuncValue
, and that node is added to the runtime environment. Because this definition has not been type-checked, the FuncValue
's type is nil
.
Next, at each call site,the ExprCall.Graph(env)
attaches the const node from the runtime environment to a new CallFunc
node. This is a problem, because AddEdge
calls AddVertex
, which errors out when it sees the nil
type` (while trying to validate that this type is a function type).
Easy solution which doesn't work number 1: also type-check the definition site.
This doesn't work because MCL doesn't support polymorphic types like forall a. a -> a
, so the type of func($x) {$x}
is ambiguous. Thus, if we try to fix the problem by type-checking the definition site, the program will fail to typecheck instead of crashing. Better, but not what we want.
Easy solution which doesn't work number 2: call the type-checked copy, the ExprCall.expr
field.
This would work in this particular example, and in fact that's what we do on master. So let's review why this branch uses an environment.
On master, we know exactly how many times each function is called and with which arguments. We know that information statically, before type-checking. So we can copy ExprCall that many times and have the ExprCall.expr
field be correct for each copy, and then type-check each.
On this branch, we want to support functions like map which call other functions a different number of times depending on some runtime values. So the information about which function call receives which argument is no longer known statically, it is now known dynamically, after type-checking. Scope and ExprCall.expr
are both filled before type-checking, so they can only hold the statically-known information. That's why we need a new place in which we can store the dynamically-known information: the environment.
So even though in this example, it would be possible to use ExprCall.expr
, if we want to progress towards supporting map (as opposed to progressing from the current state of the branch back towards master), we must find a way to call the function from the environment, not the function from ExprCall.expr
.
we must find a way to call the function from the environment, not the function from ExprCall.expr.
In case this is a helpful insight: is it helpful to use ExprCall.expr merely to determine the type (during type checking) and then to copy that determined type into the N different copies of the actual function that we'll need? I suspect the N copies will have the same type as the singular ExprCall.expr that we have copied around. And in this case then we won't need a second stage of Unify().
HTH
Yes, that seems sensible: ExprCall.expr
does have the correct type for each call, and the function from the environment does have the correct behavior for each call. Copying the type thus seems like it should give us the best of both worlds!
But how do we accomplish this in practice? It would be easy if we had one {V: wrongValue, T: correctType}
struct and one {V: correctValue, T: nil}
struct. In reality, we have a correctType
and a Func
which emits a FuncValue
. Probably a ConstFunc
. Okay, so we can cast the Func
to a ConstFunc
, then cast the Value
it holds to a FuncValue
, then set its T
.
That seems quite fragile. What if the Func
comes from if ($now % 2 == 0) then $f else $g
? Do we need to add Func.SetType(Type)
to the Func
interface, so that IfFunc
can forward the SetFunc
request to its two branches?
And what if the Func
has already been added to the graph earlier? For example, the $id("foo")
call would set $id
's type to func(string) string
and add it to the graph, and then $id(42)
would overwrite $id
's type to be func(int) int
?
I think it might be wrong for runtime functions to even have a type. Would it perhaps be sufficient to have the Sig.Ord
, so we can generate edges with the right names?
Or maybe FuncValue.call()
should take an extra Type
argument?
That seems quite fragile. What if the Func comes from if ($now % 2 == 0) then $f else $g? Do we need to add Func.SetType(Type) to the Func interface, so that IfFunc can forward the SetFunc request to its two branches?
There are two types of functions (type wise) in mgmt:
1) Any normal function which has a static type. For example: strings.ToLower is a func(str) str
.
2) A function which could take on between 1 and infinite numbers of types. Those are represented by one of the polymorphisms variants. All of those have a .Build(typ) method.
So all we need to do is run: Build(*types.Type) error // then, you can get argNames from Info()
(See lang/interfaces/func.go @ PolyFunc interface.
Tada!
If it's a Func instead of a PolyFunc you can do:
if polyFunc, ok := someFunc.(interfaces.PolyFunc); ok {
err := polyFunc.Build(typ) // XXX: remember to check for error! Some types might not fit.
if err != nil { ... }
} else {
// the type is already known statically!
}
Does this help?
IfFunc can forward the SetFunc request to its two branches?
I didn't completely understand this question. We should after unification know the type for everything... No need to recurse it through... That happens in the AST already automatically, doesn't it? Or did you mean something different?
And what if the Func has already been added to the graph earlier? For example, the $id("foo") call would set $id's type to func(string) string and add it to the graph, and then $id(42) would overwrite $id's type to be func(int) int?
They each work on their own copy both for type unification, and then runtime... So it's safe. There should be no conflict, right? Remember we never Unify OR use the original ExprFunc definition site.
the type is already known statically! [...] We should after unification know the type for everything...
Remember that the problem we are trying to solve is that we have a Func
in the environment for which type-checking did not run, and intentionally so, because func($x) {$x}
has an ambiguous type in MCL. So it is not the case that we already statically know the type for all Func
s.
Remember we never Unify OR use the original ExprFunc definition site.
In this branch, we do use the original ExprFunc
definition site, it's the Func
we put in the environment!
Easy solution which doesn't work number 3: only keep Sig.Ord
at runtime.
The idea there is that maybe the Func
from the environment is fine as-is, it's AddVertex
which panics unnecessarily. After all, Haskell's types are erased at runtime, so why can't MCL's?
In order for this approach to work, we need to make sure types are not needed at runtime. I know that the Sig.Ord
part of the type is used in order to label the incoming edges (instead of the labels arg%d XXX
), because we fixed that recently. Anything else?
Alas! A polyfunc like +
does need to know its type in order resolve to the correct implementation.
Oh! I think I know what the problem is: MCL functions are not functions!! They are C++-style templates for functions. They need to be instantiated at a certain number of concrete types, those which are used in the program, and then those instantiations can be used as functions.
func($x) {$x + $x}
doesn't have a type. It does not have a runtime representation. I can't store it in the environment. func(str $x) {$x + $x}
does have a type, it does have a runtime representation, and I can store it in the environment.
So, in $id("foo")
, the template named $id
is instantiated (copied), which is how it can have a distinct type from $id(42)
. The next step is to figure out exactly where templates are instantiated. We previously agreed that in map(..., [$id, ...])
, $id
must have a monomorphic type like func(string) string
, so it must be instantiated at that point. Is that also the case for [$id, ...]
? What about map($id, ...)
? What about if ... then $id else...
?
The next step is to figure out exactly where templates are instantiated.
Currently this happens in SetScope. I hope that doesn't need to be changed, but if it's the only way, then we change. It does this so that the next step (Unify) has all the information to build all the missing types.
Unify() only works on objects it has been told about. So for example if you give it obj1, obj2, obj3 and some constraints, eg type(obj1) == type(obj2) and obj3 == str, then it will work it all out for obj1,2,3. If you never tell it about obj4, then it will never need to figure out that type.
So let's say obj1 is ExprCall.expr, and so we solve for it's type. There might be N different functions inside map with the same type. Call them objM1,2,3...n. We don't need to tell Unify about those functions. All that matters is that during function generation we use the information gleamed in Unify to set those types before the function engine runs.
HTH
I don't mean where in the golang code, I mean where in the MCL code. Currently we only copy the callee to ExprCall.expr
at call sites, but that's clearly not enough since map(..., [$id, ...])
does instantiate $id
to a concrete type, and yet we do not copy $id
. If we don't copy $id
there, then the following will be a type error:
$id = func($x) {$x}
$strs = map(func($f) {$f("foo")}, [$id])
$ints = map(func($f) {$f(42)}, [$id])
I've rebased into some new changes. These are available as feature branch: feat/sam-squashed-rebased This should match this current PR but with my new changes incorporated. The changes are as follows:
I have some test cases that show it works. I have not merged those yet because I am still building out the test cases so that I have a chance of finding more bugs (if any).
At least we're hopefully unblocked on the engine front now. Thanks for working on the algorithmic function figuring!
My new tests have been added. This is rebased again and still available at: feat/sam-squashed-rebased We can add more tests if we find a scenario which breaks the engine. I'm feeling pretty optimistic about it though.
I've updated feat/sam-squashed-rebased again. This time it includes a long-delayed port from the old function interface that included:
Stream() error
Close() error
And instead replaces those two with:
Stream(context.Context) error
Which allows you to call the close directly through the context. I also added a patch to update the channel source and sink functions.
Here is a hopefully-simpler way to understand the problem.
Part1: why we copy functions
MCL supports functions like +
and func($x) {$x}
which can have more than one type. Those functions need to be specialized to a single type before their Graph()
method is called, otherwise the function graph engine will complain that a Func
node does not have a type.
On master, this is done by copying the function in ExprCall
. That copy is only used at one type, which the type-checker finds; done.
In this branch, we want to support map
. In map($f, [...])
, the calls to $f
are made at runtime, so it is too late to copy $f
at every call site like ExprCall
does. However, those calls are all made with the same type, so it suffices to copy $f
in the ExprCall
node for map($f, [...])
.
This means that ExprCall
must also copy its arguments, not just the function!
Part 2: why the environment is broken
Looking up values in the environment by name implies that there is only one value with that name. But if printf("%s %d", $f("foo"), $f(42))
makes two copies of $f
, there are now at least two values named $f
at runtime. 3, if we count $f
's definitions site. Our current bug is due to the fact that when we lookup $f
in the environment, we find the definition site's version of $f
, when what we want is for the first call to find the first copy, and for the second call to find the second copy.
This means that when we copy, we must rename all the copied variables in order to make them unique!
This means that when we copy, we must rename all the copied variables in order to make them unique!
\o/ sounds like we have a solution! This feels like it's firmly in your patch territory, but I can pair with you on it if you want, or alternatively, if you have any menial tasks for me, let me know what to do =D
I think I have found an easier way to implement this than to rename the variables when copying nodes.
In this branch, there are currently two types of variables in the environment: the variables which come from the parameters of a function, and the variables which refer to top-level definitions. I now think it was a mistake to put both in the same environment.
On master, no environment is needed because there is a one-to-one correspondence between Expr
s and Func
s. On this branch, an environment is needed because the Expr
s inside lambda bodies correspond to multiple Func
s. But the Expr
s at the top-level do correspond to a single Func
!
Another difference between top-level variables and function parameters is that it is only the top-level variables which can have more than one type and thus need to be copied. After all, the arguments are copied so that a single type can be given to each argument before the call is executed. Thus, while the call is running, each parameter is already bound to a Func
with a single type.
This suggests a simple solution:
ExprCall
should copy the functions and arguments which come from the top-level, but it should not copy the functions and arguments which come from lambda parametersExprVar.Graph()
should call the Graph()
method of the top-level copy to which it refers, if that's what it refers to, and it should lookup the variable name in the environment if the variable refers to a lambda parameter.In order to make this plan more concrete, can you tell me
ExprCall.expr
copy made?ExprVar
knows what the variable refers to?In this branch, there are currently two types of variables in the environment: the variables which come from the parameters of a function, and the variables which refer to top-level definitions. I now think it was a mistake to put both in the same environment.
You might remember I have my scope
type here: https://github.com/purpleidea/mgmt/blob/446dbde836c10159db0370c9eacf97aa3c4858a2/lang/interfaces/ast.go#L214
Notably it has separate buckets for variables, functions, and classes, as well as "indexes" which had to do with function arguments and nested functions and so on. It never worked 100% but it got close. You might remember we used it a bit as well. The "chain" btw was for detecting recursion.
If this is helpful, please let me know... It's feeling more and more like your "environment". Reminder, at the moment it is the input argument for SetScope: SetScope(scope *interfaces.Scope) error
.
Where is the ExprCall.expr copy made?
In SetScope. Here for example: https://github.com/purpleidea/mgmt/blob/446dbde836c10159db0370c9eacf97aa3c4858a2/lang/ast/structs.go#L7603
Note it is not always created via a copy if it's meant to have the same reference, eg: https://github.com/purpleidea/mgmt/blob/446dbde836c10159db0370c9eacf97aa3c4858a2/lang/ast/structs.go#L7588
How does VarExpr knows what the variable refers to?
ExprVar is pulled out of the "scope" when it's first needed. That currently happens in Unify and Graph for example. See here: https://github.com/purpleidea/mgmt/blob/446dbde836c10159db0370c9eacf97aa3c4858a2/lang/ast/structs.go#L8469 and https://github.com/purpleidea/mgmt/blob/446dbde836c10159db0370c9eacf97aa3c4858a2/lang/ast/structs.go#L8524
HTH
And what does the scope point to when the ExprVar
points to a lambda parameter?
This PR is not yet complete. It exists to make it easy to look at and comment on the code for the
mcl-functions
feature branch, which completely reworks the way in which functions work in MCL.The problem
Today in
master
, builtins and lambdas use very different compilation strategies. A builtin is compiled to a single Func which receives values from upstream nodes and emits values downstream. A lambda, by contrast, is compiled away, as the lambda's body is inlined at every call site. Once this inlining is performed, only builtins remain, so the resulting function graph is the graph of their Funcs.This does not work at all for MapFunc, a builtin which expects to receive a FuncValue as one of its inputs, because the above compilation strategy does not produce FuncValues, neither for builtins nor for lambdas. It would in fact be incorrect to construct a FuncValue from a builtin, because builtins can have an internal state and emit multiple values over time (e.g.
System("seq 3")
emits"1"
, then"2"
, then"3"
), but a FuncValue can only transform input values into a single output value, so it cannot represent that complexity. And of course, lambdas can contain builtins (e.g.func(x) { System(x) }
), so they cannot be represented by a FuncValue either.The solution
The function graph must be able to change over time, because if
func(x) { Map(System, xs) }
first receives a list of length 1 and then a list of length 2, then at first there must be one SystemFunc in the function graph, then later there must be 2 SystemFuncs.In order to do this, a FuncValue must be able to do more than transform input values into a single output value. Instead a FuncValue must describe in which way the function graph needs to change in order to accommodate a new instance of the function which the FuncValue represents. This branch's FuncValue does this by receiving input Funcs, creating and adding new nodes to the function graph, and then returning the one node among those new creations which shall be treated as the output Func, that is, the one whose output values are the result of applying the function represented by the FuncValue to the inputs received from the input nodes.
Another big change is that the function engine now works on a graph of Funcs instead of a graph of Exprs. This allows FuncValues to add Funcs to the function graph, without having to wrap them in Expr wrappers which don't correspond to any MCL code the user wrote.
Overview of the changes so far
xs
becomes a list of length 1 again, we must remove the now-disconnected SystemFunc node we created for the length 2 case.ExprVar
or anExprCall
.interpret_test.go:1202: test #62: funcs: process end err: context canceled...
messages. Those are red herrings since we're interrupting process and it's normal.interpret_test.go:1546
Remaining work
map
function changing its inputs over time, or other things as well.Given the number of changes, I would be surprised if all of that worked the first time, so in addition to the above, I expect a long debugging session to make all the tests pass.