kanwei / algorithms

Ruby algorithms and data structures. C extensions
http://kanwei.github.io/algorithms/
MIT License
2.67k stars 351 forks source link

Heap implementation is very slow #36

Closed generall closed 10 months ago

generall commented 7 years ago

Please refer to this stackoverflow question: http://stackoverflow.com/questions/13978484/kanwei-minheap-slow-ruby

rickhull commented 6 years ago

Insertion is supposed to be constant time, according to this @igrigorik blog post. It looks like it is not quite even linear, though I haven't delved too deeply. See this gist

From the gist, notably:

rickhull commented 6 years ago

I think this section makes insertion depend on N, the size of the heap:

https://github.com/kanwei/algorithms/blame/master/lib/containers/heap.rb#L80

rickhull commented 6 years ago

Check out this implementation as a point of comparison: https://github.com/rickhull/compsci/blob/master/lib/compsci/heap.rb

OUTPUT

...
790000th insert: 0.00000097 s
800000th insert: 0.00000245 s
810000th insert: 0.00000148 s
820000th insert: 0.00000100 s
830000th insert: 0.00000103 s
inserted 837191 items in 3.0 s
Run options: --seed 55035

# Running:

bench_insert     0.000085        0.000032        0.000562        0.001365       0.013642
.

Finished in 0.040194s, 24.8793 runs/s, 24.8793 assertions/s.

1 runs, 1 assertions, 0 failures, 0 errors, 0 skips

Note that this is asserting constant insertion performance, not linear, with 0.9999 error threshold. I suspect it will ultimately prove to be O(log n) at higher n. We're getting over 800,000 insertions in 3 seconds as compared to 7600 or so.

rickhull commented 6 years ago

now getting over 500k push / sec:

1510000th push: 0.00000184 s
1520000th push: 0.00000614 s
1530000th push: 0.00000141 s
1540000th push: 0.00000294 s
1550000th push: 0.00000199 s
1560000th push: 0.00000876 s
1570000th push: 0.00000134 s
pushed 1577943 items in 3.0 s

still a heap with 1577958 items? YES - 1.633 sec
alexkalderimis commented 3 years ago

While the heap implementation doesn't have the best throughput on insertions (regrettable, for sure), it does have a bunch of features that rb_heap and others don't, such as re-scoring elements and removing elements, and merging heaps.

BMorearty commented 2 years ago

I've just pushed a PR that fixes push performance. It used to be O(n) but once this is merged it will be O(1).