Closed robsyme closed 2 months ago
The error occurs because `buffer` is a HashMap, so we're storing the keys by their hash, and the hash differs between these identical-looking objects.
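To illustrate why a differing hash alone is enough to break the lookup, here is a minimal sketch in plain Groovy. `FixedHashKey` is a hypothetical class (not part of Nextflow) whose `equals` compares by name while `hashCode` returns whatever value the caller supplies, so two keys can be equal yet hash differently. It assumes only standard `java.util.HashMap` behavior.

```groovy
// Hypothetical key class for illustration only (not a Nextflow class):
// equality is by name, but the hash code is chosen by the caller.
class FixedHashKey {
    String name
    int hash

    FixedHashKey(String name, int hash) {
        this.name = name
        this.hash = hash
    }

    @Override
    boolean equals(Object other) {
        other instanceof FixedHashKey && name == ((FixedHashKey) other).name
    }

    @Override
    int hashCode() {
        hash
    }
}

def buffer = new HashMap()
buffer.put(new FixedHashKey('item3', 1), 'left value')

// Equal key with the same hash: the entry is found.
def sameHash = buffer.get(new FixedHashKey('item3', 1))

// Equal key with a different hash: the lookup misses.
def otherHash = buffer.get(new FixedHashKey('item3', 2))

println "same hash:  ${sameHash}"
println "other hash: ${otherHash}"
```

Both lookup keys are equal to the stored key according to `equals`, yet the second lookup returns `null`, because `HashMap` uses the hash to locate candidate entries before it calls `equals`.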
I think the problem here is that the key in this map entry, `list:[item3]` (i.e. `[item3]`), is an object of the class `nextflow.util.ArrayBag`, which has unexpected hashing properties.
I tried overriding the `hashCode` method in the `ArrayBag` class:

    @Override
    int hashCode() {
        target.hashCode()
    }

But this did not fix the `JoinOp` class's inability to `get(item0.keys)` correctly 100% of the time.
An even more minimal example might be:

    bag1 = new nextflow.util.ArrayBag('lunch')
    bag2 = new nextflow.util.ArrayBag('lunch')
    myMap = [:]
    myMap[bag1] = 'sandwich'
    found = myMap[bag2]
    if(!found) {
        log.info "This hash: ${myMap} does not have key ${bag2}!"
    } else {
        log.info "found: $found"
    }

... which of course prints

    This hash: [[lunch]:sandwich] does not have key [lunch]!
Adding the `hashCode` and `equals` methods to `ArrayBag` will allow the objects to be used interchangeably as keys in LinkedHashMaps, which would resolve the problem. The code is even already ready to go:
Any reason why we wouldn't simply uncomment those lines?
I've tested and re-adding these methods to the ArrayBag class resolves the issue.
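As a rough illustration of why adding both methods helps, the sketch below uses two hypothetical wrapper classes (stand-ins, not the real `ArrayBag`): `IdentityBag` inherits the identity-based `equals` and `hashCode` from `Object`, while `ValueBag` delegates both to its wrapped `target` list. It assumes nothing about Nextflow internals.

```groovy
// Hypothetical stand-in: no equals/hashCode, so Object's identity-based
// versions are used and two wrappers around equal lists are distinct keys.
class IdentityBag {
    List target

    IdentityBag(List items) {
        target = items
    }
}

// Hypothetical stand-in: equals and hashCode delegate to the wrapped list,
// so wrappers around equal lists are interchangeable as map keys.
class ValueBag {
    List target

    ValueBag(List items) {
        target = items
    }

    @Override
    boolean equals(Object other) {
        other instanceof ValueBag && target == ((ValueBag) other).target
    }

    @Override
    int hashCode() {
        target.hashCode()
    }
}

def identityMap = [:]
identityMap[new IdentityBag(['lunch'])] = 'sandwich'
def missing = identityMap[new IdentityBag(['lunch'])]

def valueMap = [:]
valueMap[new ValueBag(['lunch'])] = 'sandwich'
def found = valueMap[new ValueBag(['lunch'])]

println "identity-based key lookup: ${missing}"
println "value-based key lookup: ${found}"
```

The first lookup yields `null` and the second yields `sandwich`, which mirrors the difference between the failing minimal example and the behavior after the methods are restored.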
Actually, just explicitly delegating to the `target` works as well (implemented in #5189).
It's unclear to me why the `@Delegate` annotation is not delegating the `equals` and `hashCode` methods. I suspect these are methods provided by a higher interface and the `interfaces = false` argument to the annotation is preventing them from being called.
TL;DR: Groovy bypasses the `equals` and `hashCode` methods when checking object identity because it implements its own strategy for collection objects.
Ah! Thanks! Where would I see how equals and hashCode are (implicitly?) implemented for ArrayBag?
If I'm not wrong identity for collection objects is implemented via this method
Bug report
When the `join` operator is joining channels where the join key is a Map, the comparison can fail in a nondeterministic manner.

Inside the join operator, we `get` the key from the `buffer` object here:
https://github.com/nextflow-io/nextflow/blob/cc6ec3142de0131b7dbeb8fed38b0e6506e86bdc/modules/nextflow/src/main/groovy/nextflow/extension/JoinOp.groovy#L180

This `get` operation uses `.equals()` to compare the argument to the keys in the `buffer` Map. The Nextflow script below produces keys where `self.equals(other) = false` and `self == other = true`.

Because the Map `get` method uses `.equals()` to compare objects, the join operator fails to join the elements correctly in some cases.

Expected behavior and actual behavior
The example workflow below correctly outputs two elements in the `JOINED:` channel only about half the time. You may need to run the workflow multiple times to see the nondeterminism.

I would expect that every time I run the workflow, the final "joined" channel would always return two items:
However, sometimes the channel only returns one item (usually the first):
Note that whenever the final channel returns only one item, the value returned by the `hashCode()` method on the seemingly identical keys is different.

Program output
If I add the following snippet at L178 here:
https://github.com/nextflow-io/nextflow/blob/cc6ec3142de0131b7dbeb8fed38b0e6506e86bdc/modules/nextflow/src/main/groovy/nextflow/extension/JoinOp.groovy#L177-L184
The incorrect output looks like: