Instrument and benchmark MongoDB using cluster collections

andre-senna commented 3 months ago

(1) Add N nodes of type "Concept", each named with W words of length L drawn from an alphabet of K lowercase letters (2 <= K <= 6). For instance, with W = 3, L = 4 and K = 5, possible nodes would be:

(Concept "aabb abed bbbb") (Concept "abcd bcde aaee") (Concept "bbbe edcb eeaa") ...

Make sure each name is used only once.
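A minimal Python sketch of step (1), assuming that rejection sampling into a set is an acceptable way to guarantee uniqueness. The function name `generate_concept_names` and the `seed` parameter are hypothetical and not taken from the DAS codebase. Only K^(W*L) distinct names exist, so the sketch refuses any N larger than that.

```python
import random
import string


def generate_concept_names(n, w, l, k, seed=None):
    """Return n distinct names, each made of w words of length l
    over the first k lowercase letters."""
    if not 2 <= k <= 6:
        raise ValueError("K must satisfy 2 <= K <= 6")
    capacity = k ** (w * l)  # number of distinct names that exist
    if n > capacity:
        raise ValueError(f"only {capacity} distinct names exist, cannot create {n}")

    rng = random.Random(seed)
    alphabet = string.ascii_lowercase[:k]
    names = set()  # the set silently drops duplicates
    while len(names) < n:
        words = (
            "".join(rng.choice(alphabet) for _ in range(l))
            for _ in range(w)
        )
        names.add(" ".join(words))
    return sorted(names)
```

Each returned string would become the name of one `Concept` node. Sampling gets slow when N is close to the number of possible names; enumerating all names and shuffling would be an alternative in that case.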

(2) Add a link of type TokenSimilarity between every pair of nodes whose names share at least one word. Each link should also have a "strength" field, a float computed as (number of common words) / W.

For instance, with the same parameters as above, "abcd abab aaab" and "abab cbba cccc" would have such a link with strength = 1/3, because abab appears in both names.
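The TokenSimilarity rule can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that a word repeated inside one name counts only once and that both names have the same W; the issue does not say how repeated words should be counted.

```python
from itertools import combinations


def token_similarity_strength(name_a, name_b):
    """Strength = (number of common words) / W, with W taken from name_a."""
    words_a = name_a.split()
    words_b = name_b.split()
    # Assumption: a word repeated inside one name is counted once.
    common = set(words_a) & set(words_b)
    return len(common) / len(words_a)


def token_similarity_links(names):
    """Yield (name_a, name_b, strength) for every pair sharing at least one word."""
    for name_a, name_b in combinations(names, 2):
        strength = token_similarity_strength(name_a, name_b)
        if strength > 0:
            yield name_a, name_b, strength
```

For the two names in the example above, `token_similarity_strength` returns 1/3. Comparing every pair is quadratic in N, which may matter for the larger values of N.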

(3) Add a link of type Similarity between every pair of nodes whose names share at least one letter in the same position. Each link should also have a "strength" field, a float computed as (number of common letters) / (W * L).

For instance, with the same parameters as above, "abcd abcd abcd" and "ebab aeee bacd" would have such a link with strength 4 / 12, because there are 4 matches, as shown below:

abcd abcd abcd
ebab aeee bacd
 ^   ^      ^^
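A short Python sketch of the positional comparison used for Similarity links. The function name is hypothetical, and the sketch assumes both names have the same W and L, so letters are compared word by word and position by position.

```python
def similarity_strength(name_a, name_b):
    """Strength = (letters equal at the same position) / (W * L)."""
    words_a = name_a.split()
    words_b = name_b.split()
    total_letters = sum(len(word) for word in words_a)  # equals W * L
    matches = sum(
        char_a == char_b
        for word_a, word_b in zip(words_a, words_b)
        for char_a, char_b in zip(word_a, word_b)
    )
    return matches / total_letters
```

For "abcd abcd abcd" and "ebab aeee bacd" this returns 4 / 12, matching the marked positions. A Similarity link would be created only when the result is greater than zero.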

(4) We should have a test script that instruments the calls to measure the execution time of queries.

(5) In that test script, we need to implement different types of queries that exercise the different types of indexes we have in MongoDB.

(6) Each query should be called a number of times with a randomized set of parameters, inside a series of nested loops in which all parameters vary over different ranges. For instance:

NUM_TESTS = 10

for N = {100, 1000, 10000}
    for W = {2, 3, 5, 10}
        for L = {2, 5, 10}
            for K = {2, 4, 6}
                create nodes
                create links
                for i = 1 to NUM_TESTS
                    randomly select a valid parameter for query 1
                    run query 1 
                    randomly select a valid parameter for query 2
                    run query 2
                    randomly select a valid parameter for query ...
                    run query ...

(7) Collect the execution time of each query call and report the averages properly.
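Steps (6) and (7) could be sketched in Python roughly as below. This is only an outline under stated assumptions: `setup` and the entries of `queries` are hypothetical callables that the real script would implement against MongoDB, and nothing here talks to a database. The sketch also skips parameter combinations where fewer than N distinct names exist (for example N = 10000 with W = 2, L = 2, K = 2 allows only 16 names), which the loop in the issue does not address.

```python
import itertools
import random
import statistics
import time

NUM_TESTS = 10


def run_benchmark(setup, queries, grid, num_tests=NUM_TESTS, seed=0):
    """Time every query over the parameter grid and return average seconds.

    setup(n, w, l, k) -> context (hypothetical: creates nodes and links)
    queries: {name: (pick_param, run)} where pick_param(context, rng)
             selects a valid random parameter and run(context, param)
             executes the query (both hypothetical).
    """
    rng = random.Random(seed)
    report = {}
    for n, w, l, k in itertools.product(grid["N"], grid["W"], grid["L"], grid["K"]):
        if n > k ** (w * l):
            continue  # fewer than N distinct names exist for this combination
        context = setup(n, w, l, k)
        timings = {name: [] for name in queries}
        for _ in range(num_tests):
            for name, (pick_param, run) in queries.items():
                param = pick_param(context, rng)
                start = time.perf_counter()  # time only the query call itself
                run(context, param)
                timings[name].append(time.perf_counter() - start)
        for name, samples in timings.items():
            report[(n, w, l, k, name)] = statistics.mean(samples)
    return report
```

The grid from the example would be passed as `{"N": [100, 1000, 10000], "W": [2, 3, 5, 10], "L": [2, 5, 10], "K": [2, 4, 6]}`. Keying the report by `(N, W, L, K, query name)` keeps one average per query and parameter combination; reporting the median or standard deviation as well would be a small extension.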

andre-senna commented 2 months ago

Required by https://github.com/singnet/das/issues/91

eddiebrissow commented 2 months ago

Done https://github.com/singnet/das-query-engine/pull/302 https://github.com/singnet/das-query-engine/pull/300

singnet / das-atom-db

Instrument and benchmark MongoDB using cluster collections #150