Pull deduplicateEvents out of NodeActor.processEvents.

Andy-42 commented 2 years ago

This change extracts a function for deduplicating events out of NodeActor.processEvents.

Since this code is likely to be called frequently, minimizing heap allocations could have performance benefits. This implementation attempts to minimize allocations by:

Eliminating the pair of reverse operations completely (as noted in a TODO in the original),
Don't build the Sets that track whether events have been seen before, and
Don't copy the mutable ArrayBuffer before returning it.

Since the Sets that track state are not built, deduplication needs to linear search the later events. This could be problematic for very large sequences of event. That would have to be a very large sequence of events to be significant.

It returns a mutable.ArrayBuffer as a Seq without coping. Since this is only used internally to NodeActor, there is no possibility for this mutable state to escape, so there is no need to do the defensive copy.

Andy-42 commented 2 years ago

I realize that I actually do the wrong thing here. I am comparing NodeChangeEvent for equality, but it needs to compare on edge, key, etc. I'll fix that.

Andy-42 commented 2 years ago

I fixed the implementation of NodeActor.deduplicateEvents so that it reproduces the original behaviour accurately. This is more verbose than the original, but I think it reads more like a specification.

This should definitely have some unit testing!

Andy-42 commented 2 years ago

There is a much smaller change that might improve performance without any logical change in the code at all. The Sets es, ps and min are all immutable sets. Each time we +=an element, this is going to create a new Set instance. Even if the immutable set implementation is clever about patching and reusing the existing set, this is not going to be as efficient as adding elements to a mutable set.

These sets are definitely not going to escape the scope, using mutable sets in this context is justified.

I would do that, along with pulling this out into a separate method so that it can be unit tested.

Andy-42 commented 2 years ago

In trying to understand the heap allocations in this code, I noticed something that I would like to drill into at some point. The parameter in question is defined as events: Seq[NodeChangedEvent]. I had inferred that the argument would actually be a List (by looking at all the calls to this method).

The pattern matching on events as a way of iterating over it would perform well if it is a List, and less well if it is a Vector. In general, understanding the heap allocations requires knowing what the implementation is.

I'm sure there are other issues that come into play (e.g., Vector co-locates in cache better, less pointer chasing).

Is that an argument for making a parameter like events specifically List or Vector - at least in a hot code path like this? You could probably optimize it either way, but is using Seq a barrier to reasoning about performance?

Andy-42 commented 2 years ago

Thanks for all the comments. I really only intended this PR for the purpose of discussion. Maybe there is something in it, but I don't think it is justified at this point.

I'm going to close this PR and do the much simpler change (using mutable sets and adding unit tests).

Andy-42 commented 2 years ago

I started over and extracted a dedupEvents function out of the body of NodeActor.processEvents, this time keeping the existing algorithm (i.e., using Sets instead of searching through later events). The only change that I made was to eliminate the two reverses.

  def dedupEvents(events: Seq[NodeChangeEvent],
                  eventTime: () => EventTime,
                  hasEffect: NodeChangeEvent => Boolean
                 ): Seq[WithTime] = {

    var es: Set[HalfEdge] = Set.empty
    var ps: Set[Symbol] = Set.empty
    var ft: Option[QuineId] = None
    var mih: Set[QuineId] = Set.empty

    def anOverridingEventOccursLater(event: NodeChangeEvent): Boolean = event match {
      case EdgeAdded(ha) => if (es.contains(ha)) true else { es += ha; false }
      case EdgeRemoved(ha) => if (es.contains(ha)) true else { es += ha; false }
      case PropertySet(k, _) => if (ps.contains(k)) true else { ps += k; false }
      case PropertyRemoved(k, _) => if (ps.contains(k)) true else { ps += k; false }
      case MergedIntoOther(id) => if (ft.isDefined) false else { ft = Some(id); false }
      case MergedHere(o) => if (mih.contains(o)) false else { mih += o; false }
    }

    val indexableEvents = events.toIndexedSeq

    @tailrec
    def accumulate(r: List[WithTime] = Nil, i: Int = indexableEvents.length - 1): Seq[WithTime] =
      if (i < 0)
        r
      else {
        val e = indexableEvents(i)
        if (!anOverridingEventOccursLater(e) && hasEffect(e))
          accumulate(r = WithTime(e, eventTime()) :: r, i = i - 1)
        else
          accumulate(r = r, i = i - 1)
      }

    accumulate()
  }

For a large number of events, converting to an IndexedSeq makes a difference, and for a small number of events I couldn't see any difference. That .toIndexedSeq is approximately the equivalent of the first reverse, so this way of doing it achieves the original goal in the TODO of eliminating one of the reverses.

I didn't compare this implementation to the original, but I don't see any indication that there is a compelling performance improvement by eliminating one reverse. This seems like premature optimization.

A better motivation for pulling this function out of NodeActor (and into a companion object) is that it enables unit testing.

thatdot / quine

Pull deduplicateEvents out of NodeActor.processEvents. #11