stripe-archive / brushfire

Distributed decision tree ensemble learning in Scala
Other
391 stars 50 forks source link

Refactor tree traversals #53

Closed tixxit closed 9 years ago

tixxit commented 9 years ago

This refactors tree traversals (also TraversalStrategy => TreeTraversal) to instead return a lazy Stream of matching nodes. Methods like targetFor on AnnotatedTree now require a Semigroup[T] to deal with multiple matching targets. The traversals can limit the number of nodes returned by calling limitTo(n).

This also refactors the voters a bit (and moves them to their own file in anticipation of the training/trees split).

This would supercede #52

tixxit commented 9 years ago

?r @avibryant

tixxit commented 9 years ago

Previous comment got obliterated, but I was suggesting we have Tree#targetFor and TreeTraversal#find take a Random, rather than have the random traversals capture the Random at creation. Namely, it would be a bit of a leaky abstraction, but our actual use cases wants to use a new Random per traversal. So the alternative means we'll be using Random => TreeTraversal[K, V, T, A] everywhere instead of just TreeTraversal[K, V, T, A] :\

tixxit commented 9 years ago

Friendly bump =) cc @avibryant

avibryant commented 9 years ago

@tixxit acknowledged, I'll try to look at this tomorrow.

avibryant commented 9 years ago

LGTM in general. Re Random, another option to consider would be to take an id: String which could be used to seed the Random - which I think is what we actually want, and might feel slightly more general/less leaky than directly taking Random?

tixxit commented 9 years ago

Yes - I like that more.