Closed johnynek closed 8 years ago
here is the storm code for getting options:
added #636
If it is too dangerous to change how options work for fear that we will break jobs, we should write very clear rules and a lot of tests for those rules, and link that documentation to def name
in Producer.scala
I'm getting a ton of diffs from scalariform, for some reason. I would have assumed we already hit all these files, but maybe not.
Scalding takes the first name which specifies the option right? not the first name? What occurs in a branch scenario is a bit ill defined, but other than that i think the naming at the moment is reasonably well defined isn't it? -- maybe not the best naming strategy but it is at least defined pretty clearly.
Changing the options so only one name could effect a given node would be a large change, also its not clear to me how that would always work. Since users can name a given node more than once, but moreso, if i go .map.name.map.name how does that get resolved? since we will collapse both map's into one physical block of code. But the user doesn't control how we collapse them
It seems the logic is the same in the code for both storm and scalding, isn't it?
Yes, I agree that changing the options is probably a non-starter. And I guess the naming is understandable, though fan-out is ill-defined. You can always side-step that by naming a node before a fanout.
The strange thing is that adding a option below will override the defaults for all nodes above it. That could be a little weird, but I guess people aren't too upset with it.
So, I would backstep: let's refactor to share more code (as this does) and add more tests (which this does) to be clearer about how functionality is working.
Yes i think the logic is the same, or in at least its goal is the same on both platforms. Agree totally about consolidating the options handling and adding the extra tests as you've done here.
Changing the options handling seems like a bit of a non-starter I agree, though fundamentally I wonder if this winds up being very limiting in how one could do graph re-writing. For future generations of this sort of tech probably need to consider how options should be done instead.
Regarding the scala formatting, are you getting it up from sbt or somewhere else? If i manually modify the scalding platform file and re-run it it puts it back to how it is on develop now. (looking at the whitespace changes on the getCommutativity method)
Yeah, I'll check. Maybe I have some global plugin or something.
@ianoc somehow it seems having one of:
addSbtPlugin("org.ensime" % "ensime-sbt" % "0.1.7")
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.7.5")
in my ~/.sbt/0.13/plugins/plugins.sbt
file was resetting the scalariform options.
Interesting, i have the dependency graph one in mine too, so i'd guess its the ensime one.
Looks like it depends on scalaireform @ https://github.com/ensime/ensime-sbt/blob/master/build.sbt#L25 (looks separate to that project just using it, they have it in their project/plugins too for that purpose). odd
had a minor comment about documenting a return. Once thats fixed looks good to me, merge when green
man our storm tests are flakey.
I wonder if we are scooping up OOM errors into trys or futures somehow, and this is making some tests look flakey that are actually OOM errors on travis (at least some of the failures here were OOM, but the tests reported failed not due to exceptions).
This is to address #633. It is not closing that issue, but it is a start.
Also, this exposes something that I think is a problem: the naming strategy seems pretty broken to me. I think names/options are pretty ill-specified. For instance, looking up options is done in the platforms. Scalding takes the first name that matches:
https://github.com/twitter/summingbird/blob/develop/summingbird-scalding/src/main/scala/com/twitter/summingbird/scalding/ScaldingPlatform.scala#L274
which, does not really make sense if two names match due to a fan-out situation.
The solution I would propose is this:
0) move the option resolving code to a common location and don't repeat that code on the platforms. 1) we only return the first name node we can reach on each branch. So, you can only have more than one name if there is a fan-out and you are named on each branch. 2) Each option has a rule for how to combine if a node has more than one name (fanout) and each of those names has an option set. 3) default only comes into play if there is no name set.