stripe-archive / brushfire

Distributed decision tree ensemble learning in Scala
Other
391 stars 50 forks source link

Add new `sumLeaves` method (or similar) #40

Closed tixxit closed 8 years ago

tixxit commented 9 years ago

We currently get only 1 leaf in leafFor, but it is possible we may want to allow paths to diverge during tree evaluation (eg feature is missing). To do this, we should add a new sumLeaves method which allows us traverse down multiple paths and aggregate the results using the prior probabilities of each diverging path.

During tree traversal, we'd always follow edges whose predicates return true. We would also allow nodes to have multiple true predicates. For each true edge, we'd determine their prior probabilities relative to all true edges and use it to scale the result returned for each edge, then sum the result. The method may look something like:

def sumLeaves[C[_], A: Numeric](row: Map[K, V])(f: Leaf => C[A])(implicit vs: VectorSpace[A, C]): C[A] = ???
tixxit commented 8 years ago

This is covered by the TreeTraversal work.