Open kgalens opened 1 year ago
You are right, the problem is with comparing group keys to other objects. It works when the groupkey is on the left-hand side but not when it's on the right:
v = 'BATCH1'
gk = groupKey(v, 3)
println v == gk // false
println gk == v // true
I don't know if there is a way to override the equals from the right-hand side...
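To make the asymmetry concrete, here is a minimal plain-Groovy sketch. KeyWrapper is a hypothetical stand-in written for this illustration, not Nextflow's actual group key class; it only assumes that the wrapper's equals() compares the wrapped value, which is consistent with the output above.

```groovy
// Hypothetical stand-in for a group key; NOT Nextflow's actual class.
class KeyWrapper {
    final Object target   // the wrapped key value
    final int size        // the expected group size

    KeyWrapper(Object target, int size) { this.target = target; this.size = size }

    // Compares the wrapped value, so wrapper.equals('BATCH1') can be true.
    @Override boolean equals(Object other) {
        def unwrapped = other instanceof KeyWrapper ? ((KeyWrapper) other).target : other
        return target == unwrapped
    }
    @Override int hashCode() { target.hashCode() }
    @Override String toString() { target.toString() }
}

def v = 'BATCH1'
def gk = new KeyWrapper(v, 3)

assert gk == v               // wrapper on the left: its equals() runs
assert !(v == gk)            // String on the left: the wrapper is never consulted
assert v == gk.toString()    // plain strings compare symmetrically
assert gk.toString() == v
```

Because the comparison is decided by the left-hand operand, nothing the wrapper does can make a String on the left report equality, which is why converting the key to a string sidesteps the problem.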
@pditommaso one solution could be to "unwrap" the group key when it is emitted by groupTuple
+1 to Ben's unwrap approach. This matches the semantics of the tidyverse groupBy and summarize functions. I find that the groupKey just gets in the way after the groupTuple operation has been applied.
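As a rough sketch of what "unwrapping" could mean, the plain-Groovy snippet below converts wrapped keys back to plain strings right after a grouping step. KeyWrapper and the sample data are hypothetical and only illustrate the idea; this is not how groupTuple is implemented.

```groovy
// Hypothetical stand-in for a group key; NOT Nextflow's actual class.
class KeyWrapper {
    final Object target   // the wrapped key value
    final int size        // the expected group size

    KeyWrapper(Object target, int size) { this.target = target; this.size = size }

    @Override boolean equals(Object other) {
        def unwrapped = other instanceof KeyWrapper ? ((KeyWrapper) other).target : other
        return target == unwrapped
    }
    @Override int hashCode() { target.hashCode() }
    @Override String toString() { target.toString() }
}

// Made-up grouped tuples, as they might look right after a grouping step.
def grouped = [
    [new KeyWrapper('BATCH1', 3), ['s1', 's2', 's3']],
    [new KeyWrapper('BATCH2', 2), ['s4', 's5']]
]

// Unwrap each key once, so downstream code only ever sees plain strings.
def unwrapped = grouped.collect { key, samples -> [key.toString(), samples] }

assert unwrapped.every { it[0] instanceof String }
assert 'BATCH1' == unwrapped[0][0]   // now true from either side
assert unwrapped[0][0] == 'BATCH1'
```

In a pipeline the same conversion would presumably be applied to each tuple emitted after grouping, so that later comparisons never involve the wrapper.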
Bug report
A channel may emit all, some, or none of the expected values, depending on the order in which upstream tasks complete. This can happen when the groupTuple operator is used in combination with groupKey.
It seems that the underlying issue may be that the output of groupKey is not treated as a string in some contexts (seemingly inconsistently: it works in some places but not in others). This can be remedied by calling .toString() on the value returned by groupKey, but it would be great if groupKey emitted a string so that channels behaved predictably and consistently across contexts.
Expected behavior and actual behavior
Expected behavior for a channel would be that the values emitted would be the same (perhaps in a different order) so long as the input channels emit the same values (again, order should not matter).
There exists an example where this is not the case. Depending on the order of upstream task completions, the channel may emit fewer tuples (without error) than expected, or even none as demonstrated in the example below.
Steps to reproduce the problem
In the example below, the by_sample tasks are set up to finish first (sleep 1) and the by_batch tasks to finish after all samples have completed (sleep 30). This creates a situation where nothing is emitted from the batch_done channel (even though we'd expect 2 tuples to be emitted, one for each batch).
Program output
If samples finish before batches (as enforced in the example above), we do not see anything emitted from the batch_done channel:
If we switch it up and force the batches to finish before the samples (i.e. by_batch: sleep 1 and by_sample: sleep 30), the output looks similar to the following (notice that batch_done now emits a tuple for each batch):
Environment
OpenJDK Runtime Environment AdoptOpenJDK-11.0.11+9 (build 11.0.11+9)
Additional context
This can also be fixed in a couple of other ways, which may help demonstrate the underlying problem. If we change the creation of the batch_done channel to the following, we always see the expected output, no matter the order in which upstream tasks complete (notice the call to batch.toString() in line 7):