Closed wolffcm closed 2 years ago
It seems like not all code paths generate a yield, and that's why we needed this PR: https://github.com/influxdata/flux/pull/4488 We need a better understanding of this case before we can make this change.
In an effort to better understand how we arrive at each of the two branches, I cooked up a small repro:
```flux
import "array"
import "experimental"
import "sql"

emptyFn = () => array.from(rows: [{x: 1}])
write = () => emptyFn() |> sql.to(dataSourceName: ":memory:", driverName: "sqlite3", table: "none")

experimental.chain(first: write(), second: emptyFn())
```
The executor branch is hit while `write()` executes, and the planner branch is hit by the `chain()`.
I'm not yet sure how the refactor should actually go, but by tracing through with the debugger I'm hoping to form some opinions.
With breakpoints set in each of the two places we generate implicit yields today, I ran the reproducer (above) through the debugger.
First, we land in the executor branch.
Frames of (my) interest for this might be:

- `chain.go:57`, where the body for `experimental.chain()` is defined and the "sub program" for the `first` parameter is run.
- `executor.go:109`, where the `executionState` is established and used to produce a visitor.

In this case, the final frame where we hit the breakpoint is inside `createExecutionNodeVisitor.Visit`.
Second, we land in the planner branch. The `logicalNode` here is named `array.from0`, produced by `emptyFn`. My belief is this is the one returned by `experimental.chain()` (which will spit out the `second` parameter directly after trying to block on the stream argument passed as `first`).
In future runs, I may experiment with wrapping the `array.from` calls in differently named functions to help keep better track of which expressions are being visited, and when. I'm sure the detail I need is already available, but I'm still working to understand the execution path, so more landmarks might be helpful.
Frames of (my) interest here are:

- `repl.go:196`, where the program unwinds its side-effects.
- `logical.go:186`, where we instantiate a `fluxSpecVisitor`.

The visitor is used to walk nodes in the plan, and the branch where we insert the yield happens inside the call to `fluxSpecVisitor.visitOperation`.
My thinking is I'll set some breakpoints around `logical.go:186` to see if the visitor can see the other node that would get an implicit yield as it goes.
The common root for both branches is the call to `REPL.executeLine`, defined at `repl.go:175`.
Questions I have currently:

- Aside from `experimental.chain()`, what circumstances land us in the executor branch?
- Separately, I think we had some discussion at one point of improving the names we generate for implicit yields. An idea was to give them distinct (numbered) names automatically, like we do for nodes.
Part of what I'm wondering is if the only way to land in the executor branch is via the "sub program" used for `experimental.chain()`, which has been shown to be problematic for `flux test` in #4302.
If `experimental.chain()` is the only way to trigger this branch, I'd perhaps look to refactor it such that it doesn't.
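On the naming idea mentioned above (distinct, numbered names for implicit yields, like we do for nodes), a per-base counter would be enough. Here's a hypothetical sketch in Go — the type and function names are mine, not flux's:

```go
package main

import "fmt"

// yieldNamer hands out distinct, numbered names for generated yields,
// mirroring how plan nodes get numbered IDs (e.g. array.from0).
// This is a hypothetical sketch, not flux's actual naming code.
type yieldNamer struct {
	counts map[string]int
}

func newYieldNamer() *yieldNamer {
	return &yieldNamer{counts: map[string]int{}}
}

// next returns base suffixed with a per-base counter: the first call for
// "generated_yield" returns "generated_yield0", the next "generated_yield1".
func (n *yieldNamer) next(base string) string {
	name := fmt.Sprintf("%s%d", base, n.counts[base])
	n.counts[base]++
	return name
}

func main() {
	n := newYieldNamer()
	fmt.Println(n.next("generated_yield")) // generated_yield0
	fmt.Println(n.next("generated_yield")) // generated_yield1
}
```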
I've been working towards proving or debunking the notion of the branch we land in being determined by the use of a "sub program" or not.
So far, signs point to this not being the reason. During standup today, `tableFind` was offered as a second place in the stdlib where we compile a "sub program" and execute it. With `tableFind`, we don't land in either branch -- no implicit yield is generated. This could be just because `tableFind` doesn't advertise side-effects; I'm not sure what else it might be.
Refining my repro program:
```flux
import "array"
import "experimental"
import "sql"

read = () => array.from(rows: [{x: 1}]) |> group(columns: ["x"])
write = () =>
    array.from(rows: [{x: 1}])
        |> group(columns: ["x"])
        |> sql.to(dataSourceName: ":memory:", driverName: "sqlite3", table: "none")

// read and write each hit their respective branches...
// r => planner
read()
// w => executor
write()

// hits the planner branch twice
//experimental.chain(first: read(), second: read())

// hits the executor branch twice
// experimental.chain(first: write(), second: write())

// r,w or w,r in chain lands us in each branch once
```
As such, it seems the branch selection is entirely based on whether or not the ops in question advertise side-effects.
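To summarize the observed behavior, the selection between the two branches for a successor-less node can be modeled roughly like this (the labels and function name are mine, not flux's):

```go
package main

import "fmt"

// branchFor models which code path appears to handle a plan node that has
// no successors, per the observations above. A sketch, not flux code.
func branchFor(isYield, hasSideEffect bool) string {
	switch {
	case isYield:
		// explicit yields become roots directly; nothing is generated
		return "root"
	case hasSideEffect:
		// e.g. sql.to: handled by the branch in executor.go
		return "executor"
	default:
		// e.g. a bare group(): handled by the branch in logical.go
		return "planner"
	}
}

func main() {
	fmt.Println("read(): ", branchFor(false, false)) // planner
	fmt.Println("write():", branchFor(false, true))  // executor
}
```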
Making a change like the following in the planner branch seems to allow it to cover both cases, rather than letting the side-effecting `sql.to` slip through to the executor branch.
```diff
diff --git a/plan/logical.go b/plan/logical.go
index 91227343..b832901a 100644
--- a/plan/logical.go
+++ b/plan/logical.go
@@ -263,7 +263,7 @@ func (v *fluxSpecVisitor) visitOperation(o *flux.Operation) error {
 	// no children => no successors => root node
 	if len(v.spec.Children(o.ID)) == 0 {
-		if isYield || HasSideEffect(procedureSpec) {
+		if isYield && HasSideEffect(procedureSpec) {
 			v.plan.Roots[logicalNode] = struct{}{}
 		} else {
 			// Generate a yield node
```
Seems like a change similar to this (not this -- this is incorrect), combined with removal of the other block, might satisfy the DOD. I'm not sure how best to prove this is adequate outside of just showing the existing test suite passes (assuming it does).
The existing suite does not pass.
```
--- FAIL: TestPlan_LogicalPlanFromSpec (0.20s)
    --- FAIL: TestPlan_LogicalPlanFromSpec/Non-yield_side_effect (0.02s)
        logical_test.go:325: plans have 3 and 4 nodes respectively
    --- FAIL: TestPlan_LogicalPlanFromSpec/Multiple_non-yield_side_effect (0.03s)
        logical_test.go:316: found more than one call to yield() with the name "_result"
    --- FAIL: TestPlan_LogicalPlanFromSpec/side_effect_and_a_generated_yield (0.02s)
        logical_test.go:316: found more than one call to yield() with the name "_result"
```
The executor branch looks like it has different code for generating the result's name, based on the node that leads to the yield being inserted.
I'll keep poking at this. The solution might be to fold the code from both branches together but we may need to revise some tests to get this to work. If assertions were made along the way with the assumption that some yields would not be generated, but they were being generated (somewhere else, out of view of the test) then the tests could be wrong...
Weakly feeling this out. Here's a patch that accounts for the dynamic naming of yields for the case where we see side-effecting nodes without successors, but continues to use the default name of `_result` for the `else` case:
```diff
diff --git a/plan/logical.go b/plan/logical.go
index 91227343..505860a4 100644
--- a/plan/logical.go
+++ b/plan/logical.go
@@ -211,8 +211,9 @@ func (v *fluxSpecVisitor) addYieldName(pn Node) error {
 	return nil
 }
 
-func generateYieldNode(pred Node) Node {
-	yieldSpec := &GeneratedYieldProcedureSpec{Name: DefaultYieldName}
+func generateYieldNode(pred Node, name string) Node {
+
+	yieldSpec := &GeneratedYieldProcedureSpec{Name: name}
 	yieldNode := CreateLogicalNode(NodeID("generated_yield"), yieldSpec)
 	pred.AddSuccessors(yieldNode)
 	yieldNode.AddPredecessors(pred)
@@ -263,17 +264,25 @@ func (v *fluxSpecVisitor) visitOperation(o *flux.Operation) error {
 	// no children => no successors => root node
 	if len(v.spec.Children(o.ID)) == 0 {
-		if isYield || HasSideEffect(procedureSpec) {
+		if isYield {
 			v.plan.Roots[logicalNode] = struct{}{}
+		} else if HasSideEffect(procedureSpec) {
+			// Generate a yield node
+			name := string(logicalNode.ID())
+			generatedYield := generateYieldNode(logicalNode, name)
+			err = v.addYieldName(generatedYield)
+			if err != nil {
+				return err
+			}
+			v.plan.Roots[generatedYield] = struct{}{}
 		} else {
 			// Generate a yield node
-			generateYieldNode := generateYieldNode(logicalNode)
-			err = v.addYieldName(generateYieldNode)
+			generatedYield := generateYieldNode(logicalNode, DefaultYieldName)
+			err = v.addYieldName(generatedYield)
 			if err != nil {
 				return err
 			}
-			v.plan.Roots[generateYieldNode] = struct{}{}
-
+			v.plan.Roots[generatedYield] = struct{}{}
 		}
 	}
```
With this patch applied, and running the following flux:
```flux
import "array"
import "sql"

read = () => array.from(rows: [{x: 1}]) |> group(columns: ["x"])
write = () =>
    array.from(rows: [{x: 1}])
        |> group(columns: ["x"])
        |> sql.to(dataSourceName: ":memory:", driverName: "sqlite3", table: "none")

read()
write()
```
The output shows two results:
```
Result: _result
Table: keys: [x]
   x:int
--------------------------
       1
Result: toSQL2
Table: keys: [x]
   x:int
--------------------------
       1
```
The `read()` call lands in the `else` branch since it isn't a yield and doesn't have side-effects, earning it the name `_result`. The `write()` call gets a result named for the side-effecting node, `toSQL2`. I'm not clear on why it's `toSQL2` and not `toSQL1` as I might have expected.
The original handling for the "has side-effects and no successors" case in `executor.go` adds a new result without adding a yield, so this naturally will cause some ripples in the test suite as node counts change and so on.
The block in `executor.go` I'm trying to reproduce in the new branch here in `logical.go` also does something about `skipYields`, and I'm not yet sure how I'd do this over on the other side. Specifically:
```go
if plan.HasSideEffect(spec) && len(node.Successors()) == 0 {
	name := string(node.ID())
	r := newResult(name)
	v.es.results[name] = r
	v.nodes[skipYields(node)][i].AddTransformation(r) // XXX: how to achieve this?
}
```
In this case, `nodes` is a field on the execution state visitor, so I guess I'll need to figure out where the plan and the outcome of this visitor are combined.
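For my own reference, my recollection of what `skipYields` does (a simplified sketch with made-up types, not the actual flux code): it walks back through yield nodes to the node that actually produces the data, so the result transformation gets attached to the producing node rather than the pass-through yield.

```go
package main

import "fmt"

// node is a minimal stand-in for plan.Node; isYield marks yield procedures.
// In this sketch a yield has exactly one predecessor.
type node struct {
	id          string
	isYield     bool
	predecessor *node
}

// skipYields follows the chain back through yield nodes to the node that
// actually produces data. A simplified sketch of my recollection of the
// flux helper, not the real implementation.
func skipYields(n *node) *node {
	for n.isYield && n.predecessor != nil {
		n = n.predecessor
	}
	return n
}

func main() {
	src := &node{id: "toSQL2"}
	y := &node{id: "generated_yield", isYield: true, predecessor: src}
	fmt.Println(skipYields(y).id)   // the producing node, not the yield
	fmt.Println(skipYields(src).id) // non-yields are returned as-is
}
```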
Noting that Adrian makes mods to the executor branch in https://github.com/influxdata/flux/pull/4536. So far I've been looking at moving this logic up to the planner, but the other option (I plan to look at today) is to move it all down to the executor. When #4536 merges, it might make a stronger case for all this being in the executor.
Refactoring the other way (i.e., to move all the logic down into the executor) seems doable; however, there's some bookkeeping code somewhere that is no longer being satisfied.
A patch like the following:

```diff
diff --git a/execute/executor.go b/execute/executor.go
index eba8b1f5..f8b6bb4a 100644
--- a/execute/executor.go
+++ b/execute/executor.go
@@ -287,11 +287,19 @@ func (v *createExecutionNodeVisitor) Visit(node plan.Node) error {
 		}
 	}
 
-	if (plan.HasSideEffect(spec) || isParallelMerge) && len(node.Successors()) == 0 {
-		name := string(node.ID())
-		r := newResult(name)
-		v.es.results[name] = r
-		v.nodes[skipYields(node)][i].AddTransformation(r)
+	if len(node.Successors()) == 0 {
+		_, isYield := spec.(plan.YieldProcedureSpec)
+		if plan.HasSideEffect(spec) || isParallelMerge {
+			name := string(node.ID())
+			r := newResult(name)
+			v.es.results[name] = r
+			v.nodes[skipYields(node)][i].AddTransformation(r)
+		} else if !isYield {
+			name := plan.DefaultYieldName
+			r := newResult(name)
+			v.es.results[name] = r
+			v.nodes[skipYields(node)][i].AddTransformation(r)
+		}
 	}
 }
diff --git a/plan/logical.go b/plan/logical.go
index 91227343..41237586 100644
--- a/plan/logical.go
+++ b/plan/logical.go
@@ -265,15 +265,6 @@ func (v *fluxSpecVisitor) visitOperation(o *flux.Operation) error {
 	if len(v.spec.Children(o.ID)) == 0 {
 		if isYield || HasSideEffect(procedureSpec) {
 			v.plan.Roots[logicalNode] = struct{}{}
-		} else {
-			// Generate a yield node
-			generateYieldNode := generateYieldNode(logicalNode)
-			err = v.addYieldName(generateYieldNode)
-			if err != nil {
-				return err
-			}
-			v.plan.Roots[generateYieldNode] = struct{}{}
-
 		}
 	}
diff --git a/plan/physical.go b/plan/physical.go
index 926d8f0a..9cfdb2b0 100644
--- a/plan/physical.go
+++ b/plan/physical.go
@@ -97,6 +97,10 @@ func (pp *physicalPlanner) Plan(ctx context.Context, spec *Spec) (*Spec, error)
 	// Update concurrency quota
 	if transformedSpec.Resources.ConcurrencyQuota == 0 {
 		transformedSpec.Resources.ConcurrencyQuota = len(transformedSpec.Roots)
+		// FIXME(onelson): hack to ensure we have at least `1` when Roots is empty.
+		if transformedSpec.Resources.ConcurrencyQuota == 0 {
+			transformedSpec.Resources.ConcurrencyQuota = 1
+		}
 
 		// If the query concurrency limit is greater than zero,
 		// we will use the new behavior that sets the concurrency
```
seems as though it should behave roughly equivalently to before; however, when I run my repro I see the following (since fixed with the hack noted above):

```
failed to initialize execute state: invalid plan: plan must have a non-zero concurrency quota
```
Whatever leads us to this error message causes a slew of failures in the test suite so this is my next area of investigation.
It seems as though the concurrency quota is tied to the number of "roots" on the plan, which would normally be the responsibility of the planner branch I removed in this patch. I've sidestepped the issue for now by clamping the quota to a minimum of 1.
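Extracted into a standalone function for clarity (flux does this inline on the spec), the clamp amounts to:

```go
package main

import "fmt"

// concurrencyQuota defaults an unset quota to the number of plan roots,
// clamping to a minimum of 1 so a plan with zero roots still validates.
// This mirrors the FIXME hack in the patch above; the function itself is
// mine, for illustration only.
func concurrencyQuota(quota, roots int) int {
	if quota == 0 {
		quota = roots
		if quota == 0 {
			quota = 1 // the hack: never leave the quota at zero
		}
	}
	return quota
}

func main() {
	fmt.Println(concurrencyQuota(0, 0)) // 1: zero roots, clamped
	fmt.Println(concurrencyQuota(0, 3)) // 3: defaults to root count
	fmt.Println(concurrencyQuota(8, 3)) // 8: explicit quota kept
}
```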
With this in place, the test suite is in sort of a sorry state :cry:
Whereas refactoring towards the planner gave us failures related to the overall shape of the plan (thanks to results being upgraded to yields), this refactor gives panics (nil pointers derefs, etc) in addition to assertion failures around the anticipated names of nodes.
I'll dig into the panics to see if there's something I overlooked. If the panics can be resolved, the two refactors will be closer to an even footing in terms of the required test refactors.
I'm not making much progress looking into the panics. Here's a sample:
```
--- FAIL: TestSlackPost (0.03s)
    --- FAIL: TestSlackPost/.... (0.03s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x144e532]

goroutine 11 [running]:
testing.tRunner.func1.2({0x18ba580, 0x38c9190})
	/usr/local/go/src/testing/testing.go:1209 +0x24e
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1212 +0x218
panic({0x18ba580, 0x38c9190})
	/usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/influxdata/flux/stdlib/slack_test.TestSlackPost.func1(0xc0018929c0)
	/home/onelson/Projects/flux/stdlib/slack/slack_test.go:200 +0x4f2
testing.tRunner(0xc0018929c0, 0xc0003fa8c0)
	/usr/local/go/src/testing/testing.go:1259 +0x102
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1306 +0x35a
FAIL	github.com/influxdata/flux/stdlib/slack	0.289s
```
`slack_test.go:200` points to a call to `res.Tables().Do()`, but it's unclear exactly where the panic is coming from.
Aside from the panics, there are other test failures related to `tableFind` "not finding a table." Refactoring down to the executor definitely feels like a heavier lift, just to sort through this.
So, it looks like there was an early return for yields way up at the top of this method. I need to take another look top-down to make sure this executor-focused refactor really does approximate the work formerly done in the planner. That'll be my first thing tomorrow.
We have some technical debt with how we generate implicit yield nodes.
Just before we start planning, we add them in here, if a plan node has no successors and does not produce side effects: https://github.com/influxdata/flux/blob/ee80d230d11c5da9d50b8fa58951d4c61cfc3a8c/plan/logical.go#L265-L278
Later on, when we build the execution graph, we generate an implicit `yield` if there are no successors and the node *does* produce a side effect: https://github.com/influxdata/flux/blob/ee80d230d11c5da9d50b8fa58951d4c61cfc3a8c/execute/executor.go#L237-L242 (in this case we call it a `result` instead of a `yield`).
It seems clear we need to generate a `yield` (or whatever) for any node that does not have successors, and we should do this in the same place every time. My recollection is that at one point we wanted to implicitly yield only if there were no side effects, but then we decided that we want to yield in the side effect case too. It's confusing to have contradictory logic like this.
DOD:
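The rule proposed here can be stated as a single predicate, checked in one place (a sketch of the intent, not flux code):

```go
package main

import "fmt"

// needsGeneratedYield expresses the unified rule: any successor-less node
// that isn't already an explicit yield should get a generated yield,
// regardless of whether it has side effects. A sketch of the proposal
// above, not flux's implementation.
func needsGeneratedYield(successors int, isYield bool) bool {
	return successors == 0 && !isYield
}

func main() {
	fmt.Println(needsGeneratedYield(0, false)) // plain root (e.g. group or sql.to)
	fmt.Println(needsGeneratedYield(0, true))  // explicit yield already present
	fmt.Println(needsGeneratedYield(2, false)) // interior node with successors
}
```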