twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars 267 forks source link

Fix root cause of duplicate summer execution in Memory platform #667

Closed pankajroark closed 8 years ago

pankajroark commented 8 years ago

AlsoProducers. The root cause has actually to do with the summer. In toStream method we keep a map of Producer to stream to make sure we don't run parts of the graph again and again. The map uses value equality. The problem is that summer includes store which is mutable. If a summer is referenced by two also producers and if by the time summer is revisited the store has changed then the key would be different from the original summer and won't be found in the map, leading to the summer being planned again. d3b5808 seems to fix the issue and passes a unit test because that test has only one AlsoProducer, where planning left and right before forcing left means that store doesn't change in between. But it likely wouldn't fix the case where there are multiple AlsoProducers at different levels. Even if it did fix those cases this is an indirect solution and it's better to avoid taking any chances and fix the root of the issue.

This is a case where reference equality fits the bill. I don't know of an immutable implementation of a map that uses reference equality so I'm using java.util.IdentityHashMap, if there is one I would love to use that instead.

pankajroark commented 8 years ago

Some context for the review. This change is basically a revert of d3b5808 followed by use of IdentityHashMap instead of Map for JamfMap. btw, I don't know what Jamf stands for and curios find out.

johnynek commented 8 years ago

Jamf was an old inside joke between @singhala @sritchie and I based on Jules Winnfield's wallet, and changing the word "Bad" to "Jive Ass".

johnynek commented 8 years ago

https://www.youtube.com/watch?v=NUuwd8Z0l_4

pankajroark commented 8 years ago

He he, I love pulp fiction dialogs.

I've updated the review to use immutable map with reference equality.

johnynek commented 8 years ago

Okay, given that this was always an example and toy platform, if it is really important for twitter for it to have the (unrealistic) semantics that in an also, the left is fully evaluated before the right, I'm okay with that (since it didn't cause an issue for anyone that we know of yet).

If the tests can be fixed to not assume that, even better.

I'll leave it up to @pankajroark

👍

pankajroark commented 8 years ago

In this case we've decided to remove/fix the tests that are failing due to wrong assumptions. So we're going ahead and keeping the change to make execution of left producer lazy.

sritchie commented 8 years ago

Jamf is exactly what @johnynek says; it's also a reference to Dr. Laszlo Jamf, from Gravity's Rainbow: https://en.wikipedia.org/wiki/Gravity%27s_Rainbow

pankajroark commented 8 years ago

@sritchie Never heard of Gravity's rainbow, I've got to read it now :)