purpleidea / mgmt

Next generation distributed, event-driven, parallel config management!
https://purpleidea.com/tags/mgmtconfig/
GNU General Public License v3.0
3.6k stars 312 forks source link

Duplicate functions being created #722

Closed purpleidea closed 9 months ago

purpleidea commented 10 months ago

I noticed the following with our latest code. For example:

import "example"
$x = example.answer()   # gets called twice
test "test1" {
    int64 => $x,
}
test "test2" {
    int64 => $x,
}

Save in whatever.mcl and run with:

./mgmt run --tmp-prefix lang whatever.mcl

There are actually TWO functions started by the function engine. One for each $x. I know some of the time we need to copy the function and this will cause a second copy, but in this simple case, I think it's important that we don't copy it. Is there a reason we have to?

I think this is an important optimization, since I was seeing way too much memory usage for a situation when we had a small number of functions with their values everywhere, and well it explodes out of control.

Thanks!

/cc @gelisam

gelisam commented 10 months ago

The reason $x is copied is in order to support polymorphism. Here both call sites have type int, but it is possible to use $x at two different types, in which case we need two copies.

In theory, it should be possible to use a single copy in a case like this where the two use site have the same type. However, in practice, this is quite difficult to achieve because this would introduce a circular dependency between the type-checker and the part of the code which performs this copy.

Consider a similar example in which the two use sites have a different type. If we were to use a single instance for both use sites, the type checker would propagate the type information from one use site to the other, and then type unification would fail because it would try to unify two different types, the types of the two use sites. To avoid this, the current code makes a copy first, and then performs type-checking on each copy separately.

In order to avoid the duplication, the code which makes the copy would need to check if the two use sites need to instantiate $x at the same type; if so, use one copy for both use sites, otherwise make two copies. But now we have a circular dependency: we need to know the type before deciding whether to make a copy, and we need to make a copy before performing type-checking!

gelisam commented 10 months ago

Would it make sense to make two copies, perform type-checking on the two call sites separately, delete all the copies, and then create copies again using the new type information? That would make even more copies overall, but only at compile-time.

purpleidea commented 10 months ago

I wrote a test for this scenario (which obviously fails at the moment). It's currently here: https://github.com/purpleidea/mgmt/commit/bbb2b1569284a28780a3fb967f54f785d1956e1a and you can cherry-pick it if it helps you to validate things faster.

purpleidea commented 10 months ago

The reason $x is copied is in order to support polymorphism. Here both call sites have type int, but it is possible to use $x at two different types, in which case we need two copies.

If $x is the result of a function call like shown, I don't see how it can or should ever have two different variants called... So there should only be ever one example.answer() call.

Would it make sense to make two copies, perform type-checking on the two call sites separately, delete all the copies, and then create copies again using the new type information? That would make even more copies overall, but only at compile-time.

My spider sense tells me there's a simpler way. If you don't know of it, let's think on it for a day or so.

gelisam commented 10 months ago

A lot of the code is based on the assumption that top-level expressions are polymorphic, while function parameters are monomorphic. That's how I managed to reconcile the fact that MCL has polymorphic expressions but no polymorphic types. I think I have been pretty clear about that assumption:

it is only the top-level variables which can have more than one type and thus need to be copied. [...] ExprVar.Graph() should call the Graph() method of the top-level copy to which it refers, if that's what it refers to https://github.com/purpleidea/mgmt/pull/704#issuecomment-1567335374

In SetScope, once we figure out that a ExprVar points to a top-level definition, make a copy of that definition and make that ExprVar point to that copy instead. This allows each variable occurrence to instantiate that definition (e.g. func($x) {$x}) at a different type. https://github.com/purpleidea/mgmt/pull/704#issuecomment-1586174865

Remember that we now make copies of top-level definitions, so that each copy may have a different type. https://github.com/purpleidea/mgmt/pull/704#issuecomment-1605712610

now that we're copying the top-level definitions https://github.com/purpleidea/mgmt/pull/704#issuecomment-1605721705

top-level function definitions are not scope-checked nor type-checked, instead every top-level definition (function or not) gets copied at its use site, and it is only the copies which get scope-checked and type-checked. https://github.com/purpleidea/mgmt/pull/704#issuecomment-1689693906

Everything we import is a top-level definition and is thus polymorphic, and we should not call Graph() on a polymorphic expression, we should instead let ExprVar copy it and call Graph() on the copy. https://github.com/purpleidea/mgmt/pull/704#issuecomment-1696758739

I suspect that changing such a fundamental assumption is going to cause a lot of other changes and a lot of bugs. Nevertheless, we can try.

What is the new assumption? That function calls always return monomorphic values? Even if the function returns an empty list, which could be instantiated at different types? or if it returns another function, like func($x) {$x + $x}, which could be instantiated at different types?

purpleidea commented 10 months ago

In case it's helpful, this did the right thing for all values, before we started this functions work. I was able to make it work with my "light copy" approach, in case that rings a bell.

Maybe we only want to do these extra instantiations when we're returning a function?

gelisam commented 10 months ago

Maybe we only want to do these extra instantiations when we're returning a function?

That would be harder than "every function call", because we can see syntactically (right after parsing) where the calls are, but we won't know until after type checking which expressions return functions. So there would again be a circular dependency.

gelisam commented 10 months ago

In case it's helpful, this did the right thing for all values, before we started this functions work.

What did that pre-function code do with empty lists?

$empty = []
$name1 = printf("%v", [1,2,3] + $empty)
$name2 = printf("%v", ["hello", "world"] + $empty)
test $name1 {}
test $name2 {}
gelisam commented 10 months ago

I was able to make it work with my "light copy" approach, in case that rings a bell.

I remember that your Copy() operation weas recursively copying compound Exprs, but was leaving constants uncopied. I don't remember under which circumstance Copy() was called. Was it for polymorphism?

gelisam commented 10 months ago

As discussed on the phone:

  1. $x = ... should be monomorphic
  2. func f(...) {...} should be polymorphic
  3. class c(...) {...} should be polymorphic
purpleidea commented 10 months ago

I think that sounds right. If there are any implications of $x being monomorphic, please let me know. In particular around using it inside of iter.map / lambda related things?

Here's what I understand at the moment:

# must compile and work
func id($x) { $x }
$z1 = id(4)
$z2 = id("hey")
print fmt.printf("%v", $z1) {}
print fmt.printf("%v", $z2) {}

# nice to have but not essential (unless there is some corner case I'm not thinking of that we need?)
$id = func($x) { $x }
$z1 = $id(4)
$z2 = $id("hey")
print fmt.printf("%v", $z1) {}
print fmt.printf("%v", $z2) {}

# must show "true"
$r = math.random(8) # generates an eight char string
$b = $r == $r # must be true
print fmt.printf("%v", $b) {}

# must compile and work
class foo($x, $y) {
    ...
}

include foo(4, "hey")
include foo("wow", 3)
include foo("hello", "world")
purpleidea commented 10 months ago

To go further, if $x is polymorphic, we should only have one copy for all identical type signatures of it.

$x = builins.rand_whatever(8) # generates something random of length 8

print $x {} # string
print $x {} # string (same value)

This is important =D

gelisam commented 9 months ago

@purpleidea this can now be closed, right?