Add better graph stats outputs & new input option

benwtrent commented 10 months ago

This adds more graph stats outputs for HNSW.

Adds a connectedness percentage for each layer, measured starting from the graph entry point
Adds histogram output for the number of connections on each layer (not just the bottom layer)
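
To make the per-layer connectedness idea concrete, here is a minimal sketch. It assumes connectedness means the fraction of a layer's nodes reachable from the entry point by following that layer's links, which is my reading of the description rather than something the PR text spells out. The `neighbors` dict and the function name are hypothetical stand-ins, not luceneutil or Lucene APIs.

```python
from collections import deque


def layer_connectedness(neighbors, entry_point):
    """Fraction of a layer's nodes reachable from entry_point.

    neighbors: dict mapping node id -> iterable of neighbor ids on this layer
    (a hypothetical in-memory stand-in for one HNSW layer).
    """
    if not neighbors:
        return 0.0
    seen = {entry_point}
    queue = deque([entry_point])
    # Plain breadth-first search over this layer's links.
    while queue:
        node = queue.popleft()
        for nbr in neighbors.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    # Count only nodes that actually belong to this layer.
    reached = sum(1 for n in seen if n in neighbors)
    return reached / len(neighbors)


# Made-up layer: node 3 has no links, so it cannot be reached from node 0.
layer = {0: [1], 1: [0, 2], 2: [1], 3: []}
print(f"connectedness={layer_connectedness(layer, 0):.2f}")
```

Under this reading, a value of 1.00 means every node on the layer can be reached from the entry point.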

Additionally, this adjusts the following:

Adds new randomCommits input for aggressively testing merging by randomly calling iw.commit()
Allows for forceMerge to run even without reindex. This is helpful for benchmarking & testing the differences between multiple segments and then force-merging them to a single graph.
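
As a rough illustration of why random commits followed by a force merge are useful for testing, here is a toy model. `ToyWriter` and `index_with_random_commits` are hypothetical names and not Lucene's API. The model assumes each commit flushes buffered documents into a new segment and ignores background merging, which a real IndexWriter may perform depending on its merge policy.

```python
import random


class ToyWriter:
    """Hypothetical in-memory stand-in for an index writer; not Lucene's API."""

    def __init__(self):
        self.buffer = []
        self.segments = []

    def add(self, doc):
        self.buffer.append(doc)

    def commit(self):
        # Flush buffered docs into a new segment, if there are any.
        if self.buffer:
            self.segments.append(self.buffer)
            self.buffer = []

    def force_merge(self):
        # Collapse all segments (plus anything still buffered) into one.
        self.commit()
        merged = [doc for seg in self.segments for doc in seg]
        self.segments = [merged] if merged else []


def index_with_random_commits(docs, commit_probability, seed):
    """Index docs, committing after each one with the given probability."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    writer = ToyWriter()
    for doc in docs:
        writer.add(doc)
        if rng.random() < commit_probability:
            writer.commit()
    writer.commit()  # final commit so nothing stays buffered
    return writer


writer = index_with_random_commits(range(1000), commit_probability=0.01, seed=42)
print("segments before merge:", len(writer.segments))
writer.force_merge()
print("segments after merge:", len(writer.segments))
```

In this toy, stats could be compared on the many small segments first and then on the single merged one, which mirrors the multi-segment versus force-merged comparison described above.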

benwtrent commented 10 months ago

But this only does the per-layer connectedness tests, which seems like a good enough test to me.

benwtrent commented 10 months ago

Here is an example printout for a single graph:

Graph level=4 size=2, Fanout min=1, mean=1.00, max=1
%   0  10  20  30  40  50  60  70  80  90 100
    0   1   1   1   1   1   1   1   1   1   1   1   1   1   1
Graph level=3 size=19, Fanout min=4, mean=6.00, max=9
%   0  10  20  30  40  50  60  70  80  90 100
    0   4   4   5   5   5   6   6   8   8   9
Graph level=2 size=334, Fanout min=12, mean=15.88, max=16
%   0  10  20  30  40  50  60  70  80  90 100
    0  16  16  16  16  16  16  16  16  16  16
Graph level=1 size=5548, Fanout min=16, mean=16.00, max=16
%   0  10  20  30  40  50  60  70  80  90 100
    0  16  16  16  16  16  16  16  16  16  16
Graph level=0 size=87000, Fanout min=6, mean=26.72, max=32
%   0  10  20  30  40  50  60  70  80  90 100
    0  16  20  23  27  32  32  32  32  32  32
Graph level=4 size=2, connectedness=1.00
Graph level=3 size=19, connectedness=1.00
Graph level=2 size=334, connectedness=1.00
Graph level=1 size=5548, connectedness=1.00
Graph level=0 size=87000, connectedness=1.00
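
For readers who want to reproduce the kind of per-layer fanout summary shown above, here is a small sketch. It computes min, mean, max, and nearest-rank percentiles from a list of per-node neighbor counts. The function name and the nearest-rank rule are my assumptions, and the tool's own bucketing and column layout may differ.

```python
def fanout_stats(neighbor_counts, nbuckets=10):
    """Return (min, mean, max, percentiles) for one layer's neighbor counts.

    percentiles[i] is the nearest-rank value at (i + 1) / nbuckets of the nodes.
    """
    counts = sorted(neighbor_counts)
    n = len(counts)
    if n == 0:
        raise ValueError("layer has no nodes")
    mean = sum(counts) / n
    percentiles = []
    for b in range(1, nbuckets + 1):
        # Integer ceiling of n * b / nbuckets, clamped to at least rank 1.
        rank = max(1, (n * b + nbuckets - 1) // nbuckets)
        percentiles.append(counts[rank - 1])
    return counts[0], mean, counts[-1], percentiles


# Made-up neighbor counts for a tiny layer.
lo, mean, hi, pct = fanout_stats([3, 1, 2, 4, 2], nbuckets=5)
print(f"Fanout min={lo}, mean={mean:.2f}, max={hi}")
print("percentiles:", pct)
```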

mikemccand commented 9 months ago

Cool, thanks @benwtrent! I think we should merge https://github.com/mikemccand/luceneutil/pull/236 too? It's draft now, I'm not sure why ... progress not perfection!

mikemccand / luceneutil

Add better graph stats outputs & new input option #253