stripe-archive / brushfire

Distributed decision tree ensemble learning in Scala
Other
391 stars 50 forks source link

Factor out trees to support evaluation without training (issue #51) #91

Closed johnynek closed 8 years ago

johnynek commented 8 years ago

This factors out a brushfire-trees module. It had to include Error unlike #51 wanted, unless we want to refactor some, but it only depends on algebird and bonsai.

Also it breaks the style of having core types in Brushfire.scala and the a Foos.scala file. Instead, I renamed those to just Foo.scala and moved the type to the top.

johnynek commented 8 years ago

?r @avibryant

johnynek commented 8 years ago

PS: can you give me commit rights on the repo? Had to pull from my account.

avibryant commented 8 years ago
johnynek commented 8 years ago

Thanks for the review Avi. One question that came up while doing this was why (other than see the code a bit)?

Operationally, splitting packages brings a (small) cost to consuming, but the benefit is if some consumers can have thinner dependencies by avoiding the transitives of some parts. Here, I think that is really only the serialization stuff (which could be in a package like brushfire-serialization that sits between tree and training.

But note, we can't trim any deps by moving Error or Split so, I'm not 100% sure it is worth it to refactor. What do you think?

I can definitely do the refactor you suggest, but so far we are source and binary compatible, so it won't break. The refactor will break both, are we sure we want that? Don't want to do more work and then we get cold feet and don't merge.

What do you think of brushfire-serialization in between tree and training?

johnynek commented 8 years ago

I'll split serialization out in a follow up. This is big enough

johnynek commented 8 years ago

actually went ahead and split out serialization since it has a ton of dependencies and that gets you a nice win if you care about that.

Actually, brushfire-scalding only has an example dependency on brushfire-serialization so we could even break that by making the example something we put somewhere else.

avibryant commented 8 years ago

LGTM.