xtgo / set

General, type-safe, non-allocating set-operations for any sort.Interface

Execution efficiency? #1

Open Lei2050 opened 9 years ago

Lei2050 commented 9 years ago

Execution efficiency?

extemporalgenome commented 9 years ago

See also https://go-talks.appspot.com/github.com/xtblog/gotalks/sets.slide#13


Where n represents the sum of the lengths of the input set(s):

Currently:

Operation   Average         Worst
Uniq        n               n
Intersect   n               n
Diff        n               n
Union       n + n log n     n + n^2 *
SymDiff     3n + n log n    3n + n^2 *
Is*         n               n

* Uses sort.Sort (quicksort) internally (proof of concept).

Diff, Intersect, and the Is* functions have better-than-n average-case performance; the table is conservative. The Is* functions also have early-exit mechanisms, so the best possible case is already O(1).
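To make the table concrete, here is a minimal sketch of an in-place, non-allocating intersection over a sort.Interface. It is illustrative only (not the package's actual code); the inter helper and the pivot layout, where both sorted sets live in one collection and pivot marks where the second begins, are assumptions for the example. Each element is visited at most once, which is where the O(n) bound comes from:

```go
package main

import (
	"fmt"
	"sort"
)

// inter: illustrative sketch of a linear, in-place intersection.
// data holds two sorted sets back to back; pivot is the index where the
// second set starts. Matching elements are swapped to the front of data
// and the result size is returned. Only Len/Less/Swap are used, so the
// pass allocates nothing.
func inter(data sort.Interface, pivot int) (size int) {
	i, j := 0, pivot
	for i < pivot && j < data.Len() {
		switch {
		case data.Less(i, j): // element only in the first set
			i++
		case data.Less(j, i): // element only in the second set
			j++
		default: // present in both: compact it toward the front
			data.Swap(size, i)
			size++
			i++
			j++
		}
	}
	return size
}

func main() {
	// First set: 1 3 5 7, second set: 2 3 4 5 (pivot = 4).
	s := sort.IntSlice{1, 3, 5, 7, 2, 3, 4, 5}
	n := inter(s, 4)
	fmt.Println(s[:n]) // [3 5]
}
```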


Once Union and SymDiff are given proper implementations, and applicable algorithms are augmented to use binary search when runs are detected:

Operation   Average    Worst
Uniq        n          n
Intersect   log n      n
Diff        log n      n
Union       log n      n
SymDiff     log n      n
Is*         log n      n

The average case assumes the inputs contain runs.
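The run-skipping idea can be sketched with sort.Search. This is not the planned implementation, just an illustration of why long runs push the average case toward log n; the skipRun helper and the []int representation are assumptions for the example:

```go
package main

import (
	"fmt"
	"sort"
)

// skipRun jumps over a run in a sorted slice: it returns the index of the
// first element at or after position from that is >= target, using binary
// search instead of stepping one element at a time. A linear merge would
// take O(run length) steps; this takes O(log n).
func skipRun(data []int, from, target int) int {
	return from + sort.Search(len(data)-from, func(k int) bool {
		return data[from+k] >= target
	})
}

func main() {
	a := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 100}
	// The elements 1..9 form a run that belongs entirely to one side of a
	// merge; a single binary-search probe skips the whole run at once.
	fmt.Println(skipRun(a, 0, 100)) // 9
}
```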


If the sort.Interface implementations are slice-based (which they usually will be), this package has a considerably lower constant factor (i.e., it is much faster) than map-based set implementations. Without radical changes to Go's map internals, it is very unlikely (effectively impossible, given Go's map iteration semantics) that any map-based implementation of the core operations (those listed in the tables) could beat the planned in-place merge algorithms listed in the second table. All of this package's core algorithms are non-allocating.
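To make the constant-factor claim concrete, here is a hypothetical side-by-side sketch (not code from this package): a union computed over a single slice-backed sort.IntSlice versus a map-based union. The sliceUnion and mapUnion helpers are invented for the example; the point is that the slice version sorts and then works in place, while the map version pays for hashing and extra allocation:

```go
package main

import (
	"fmt"
	"sort"
)

// sliceUnion: both inputs share one backing slice; after sorting, a single
// in-place deduplication pass yields the union with no further allocation.
func sliceUnion(data sort.IntSlice) sort.IntSlice {
	sort.Sort(data)
	if len(data) == 0 {
		return data
	}
	n := 0
	for i := 1; i < len(data); i++ {
		if data[i] != data[n] {
			n++
			data[n] = data[i]
		}
	}
	return data[:n+1]
}

// mapUnion: the conventional map-based alternative; the map and the result
// slice are extra allocations, and every element is hashed.
func mapUnion(a, b []int) []int {
	seen := make(map[int]bool, len(a)+len(b))
	out := make([]int, 0, len(a)+len(b))
	for _, s := range [][]int{a, b} {
		for _, v := range s {
			if !seen[v] {
				seen[v] = true
				out = append(out, v)
			}
		}
	}
	return out
}

func main() {
	a := []int{1, 3, 5}
	b := []int{3, 4, 5, 6}
	data := append(append(sort.IntSlice(nil), a...), b...) // one backing slice for both sets
	fmt.Println(sliceUnion(data)) // [1 3 4 5 6]
	fmt.Println(mapUnion(a, b))   // [1 3 5 4 6]
}
```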

To my knowledge, no map-based set package currently listed on godoc.org exceeds the current implementations of set.Inter and set.Diff in speed, including the code-generated approaches.

A custom ordered map built on top of Go might be a reasonable alternative to this package in terms of performance, though I suspect it would be difficult or impossible to make type-safe without code generation (conversely, this package is already type-safe and thus has no need for, or benefit from, code generation).


Apply uses a cache-conservative concurrent approach whose overheads are bounded by Go's runtime scheduler. On my quad-core amd64 test system, with a compiler version approaching Go 1.5, Apply provides slightly better than a factor-of-two increase in throughput compared to naive, non-concurrent algorithms. There are possible improvements that could be explored, such as: