stripe-archive / brushfire

Distributed decision tree ensemble learning in Scala

Place a Semigroup constraint on `prune` instead of all `Node` instances #64

Closed NathanHowell closed 4 years ago

NathanHowell commented 9 years ago

This change removes the `Semigroup` constraint from the `Node` trait and places it only on the methods that actually require it. This allows a wider range of annotations to be attached to leaf nodes. Users who need to prune the tree would first transform the annotation via `mapAnnotation` to a type with a `Semigroup` instance before pruning.
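
Roughly, the shape of the change looks like the sketch below (simplified, hypothetical signatures; the real brushfire `Node`, `prune`, and `mapAnnotation` carry more type parameters):

```scala
import com.twitter.algebird.Semigroup

// Hypothetical, stripped-down tree type: no Semigroup[A] constraint on the nodes themselves.
sealed trait Node[A]
final case class LeafNode[A](annotation: A) extends Node[A]
final case class SplitNode[A](children: List[Node[A]]) extends Node[A]

object Tree {
  // Only the operations that actually combine annotations demand the Semigroup.
  def prune[A: Semigroup](node: Node[A]): Node[A] = ???

  // Annotations without a Semigroup instance are first mapped to a type that has one.
  def mapAnnotation[A, B](node: Node[A])(f: A => B): Node[B] = node match {
    case LeafNode(a)         => LeafNode(f(a))
    case SplitNode(children) => SplitNode(children.map(mapAnnotation(_)(f)))
  }
}
```

A caller whose annotation type lacks a `Semigroup` would then prune via something like `Tree.prune(Tree.mapAnnotation(tree)(toSemigroupValue))` (names hypothetical).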

This currently slows down `TreeTraversal.weightedDepthFirst` and `TreeTraversal.probabilisticWeightedDepthFirst` because the annotation is not cached during sorting. It's not clear whether these functions are commonly used; if they are, I can amend the pull request to preserve the sort performance.
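
To make the cost concrete (a schematic sketch, not the actual `TreeTraversal` code, and reusing the hypothetical `Node` type above): if children are ordered by an annotation that is recomputed on demand, each comparison re-reduces an entire subtree.

```scala
import com.twitter.algebird.Semigroup

// Recomputes the annotation of a subtree on every call: O(subtree size).
def annotationOf[A: Semigroup](node: Node[A]): A = node match {
  case LeafNode(a)         => a
  case SplitNode(children) =>
    // Split nodes have at least one child, so sumOption is defined.
    Semigroup.sumOption(children.map(annotationOf(_))).get
}

// sortBy applies the key function inside each comparison, so without a cached
// annotation this costs O(n log n) subtree reductions instead of cheap lookups.
def orderByWeight[A: Semigroup: Ordering](children: List[Node[A]]): List[Node[A]] =
  children.sortBy(annotationOf(_))(Ordering[A].reverse)
```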

avibryant commented 9 years ago

I defer to @tixxit on this but I'm pretty sure that caching those annotations was important (and we do indeed use them at Stripe). If you can see a way to easily preserve the sort performance that would be much preferred.

tixxit commented 8 years ago

Yeah, caching is fairly important, since traversing a path shouldn't require traversing the whole tree to compute the annotations. But assuming type class coherence (e.g. just make it clear in the docs that the annotation may be cached), I'd be OK with making the annotation an `AtomicReference` (or `@volatile` plus an extra check) and having the `annotation` method take a `Semigroup[A]`. Something like:

  import java.util.concurrent.atomic.AtomicReference

  // Lazily computed, cached annotation; null means "not yet computed".
  private val cachedAnnotation: AtomicReference[A] = new AtomicReference
  def annotation(implicit sg: Semigroup[A]): A = {
    var a = cachedAnnotation.get
    if (a == null) {
      // Split nodes always have at least one child, so sumOption is defined.
      val computed = Semigroup.sumOption(children.map(_._3.annotation)).get
      // A lost CAS is harmless: any racing thread computes the same value.
      cachedAnnotation.compareAndSet(null.asInstanceOf[A], computed)
      a = cachedAnnotation.get
    }
    a
  }
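
For comparison, a sketch of the `@volatile` plus extra-check variant mentioned above, under the same assumptions (the race is benign because every thread computes the same value):

```scala
// Cached annotation; null means "not yet computed".
@volatile private var cachedAnnotation: A = _

def annotation(implicit sg: Semigroup[A]): A = {
  var a = cachedAnnotation
  if (a == null) {
    // Split nodes always have at least one child, so sumOption is defined.
    a = Semigroup.sumOption(children.map(_._3.annotation)).get
    cachedAnnotation = a // benign race: concurrent writers store equal values
  }
  a
}
```
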
tixxit commented 8 years ago

Also, sorry for the delay in answering this! I think it landed mid-vacation, while I was away from my computer.

NathanHowell commented 8 years ago

No problem. I have a few other things queued up right now, but I should be back to model training in the next week or so and should be able to work on this patch then.