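(1) Add N nodes of type "Concept". Each node is named using W words of length L over an alphabet of K lowercase letters (2 <= K <= 6). For instance, with W = 3, L = 4 and K = 5, possible nodes would be:
    (Concept "aabb abed bbbb")
    (Concept "abcd bcde aaee")
    (Concept "bbbe edcb eeaa")
    ...
Make sure a given name is used only once.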
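One way to generate such names is sketched below in Python. The function name generate_names and its rng parameter are illustrative, and the sketch assumes the alphabet is the first K lowercase letters, which matches the letters a to e in the K = 5 case. Only K^(W*L) distinct names exist, so the sketch rejects requests it cannot satisfy; with W = 2, L = 2 and K = 2, for example, only 16 names are possible.

```python
import random
import string


def generate_names(n, w, l, k, rng=None):
    """Return n distinct names, each with w words of l letters.

    Assumption: the alphabet is the first k lowercase ASCII letters.
    """
    if not 2 <= k <= 6:
        raise ValueError("K must satisfy 2 <= K <= 6")
    capacity = k ** (w * l)  # how many distinct names exist at all
    if n > capacity:
        raise ValueError(f"only {capacity} distinct names exist, cannot create {n}")
    rng = rng or random.Random()
    alphabet = string.ascii_lowercase[:k]
    names = set()  # a set enforces "a given name is used only once"
    while len(names) < n:
        words = ("".join(rng.choice(alphabet) for _ in range(l)) for _ in range(w))
        names.add(" ".join(words))
    # Sorting makes the result reproducible for a seeded rng.
    return sorted(names)
```

Rejection sampling like this is simple but slows down when N approaches K^(W*L); enumerating all names and sampling from them may be preferable in that regime.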
(2) Add a link of type TokenSimilarity between every pair of nodes that share at least one common word in their names. Each link should also have a field "strength", a float calculated as (number of common words) / W
For instance, with the same parameters as above, "abcd abab aaab" and "abab cbba cccc" would have such a link with strength = 1/3, because abab appears in both names.
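The strength computation for TokenSimilarity could look roughly like the Python sketch below. The function name is hypothetical, and it assumes that "common words" means distinct words present in both names regardless of position, which is consistent with the abab example (second word in one name, first word in the other). A result of 0.0 would mean no link is created.

```python
def token_similarity_strength(name_a, name_b):
    """Return (number of distinct common words) / W for two node names."""
    words_a = name_a.split()
    words_b = name_b.split()
    if len(words_a) != len(words_b):
        raise ValueError("both names must contain the same number of words (W)")
    # Assumption: a word counts once, however often it repeats in a name.
    common = set(words_a) & set(words_b)
    return len(common) / len(words_a)


strength = token_similarity_strength("abcd abab aaab", "abab cbba cccc")
```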
(3) Add a link of type Similarity between every pair of nodes that share at least one common letter in the same position in their names. Each link should also have a field "strength", a float calculated as (number of common letters) / (W * L)
For instance, with the same parameters as above, "abcd abcd abcd" and "ebab aeee bacd" would have such a link with strength 4 / 12, because there are 4 matches, as shown below:
    abcd abcd abcd
    ebab aeee bacd
     ^   ^      ^^
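The positional comparison can be sketched in Python as follows. The function name is illustrative, and the sketch assumes every word has the same length L, so dropping the spaces keeps letter positions aligned between the two names.

```python
def similarity_strength(name_a, name_b):
    """Return (number of letters equal at the same position) / (W * L)."""
    letters_a = name_a.replace(" ", "")
    letters_b = name_b.replace(" ", "")
    if len(letters_a) != len(letters_b):
        raise ValueError("both names must have the same W and L")
    # Compare letter by letter; True counts as 1 in the sum.
    matches = sum(x == y for x, y in zip(letters_a, letters_b))
    return matches / len(letters_a)


strength = similarity_strength("abcd abcd abcd", "ebab aeee bacd")
```

For the two names above the sketch counts 4 matches out of 12 letters.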
(4) We should have a test script that instruments the calls to measure the execution time of queries.
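A minimal way to instrument a call in Python is sketched below. The helper name timed_call is hypothetical; it wraps any callable, so the query functions themselves stay free of timing code.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Usage with a stand-in for a real query function.
result, elapsed = timed_call(sum, [1, 2, 3])
```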
(5) In this test script, we need to implement different types of queries so that they exercise the different types of indexes available in MongoDB.
(6) Each query should be called a number of times with a randomized set of parameters, inside a series of nested loops in which all parameters also vary over different ranges. For instance:
    NUM_TESTS = 10
    for N = {100, 1000, 10000}
        for W = {2, 3, 5, 10}
            for L = {2, 5, 10}
                for K = {2, 4, 6}
                    create nodes
                    create links
                    for i = 1 to NUM_TESTS
                        randomly select a valid parameter for query 1
                        run query 1
                        randomly select a valid parameter for query 2
                        run query 2
                        randomly select a valid parameter for query ...
                        run query ...
(7) Collect execution time of each query call and report the averages properly.
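The loop structure and the averaging could be organized along the lines of the Python sketch below. Everything named here (run_benchmark, setup, pick_args, the demo functions) is hypothetical, and the demo uses an in-memory list as a stand-in for the MongoDB collections. The setup callback may return None to skip a combination, which is needed because some combinations cannot yield N unique names (for example N = 100 with W = 2, L = 2, K = 2 allows only 16).

```python
import itertools
import random
import statistics
import time
from collections import defaultdict


def run_benchmark(grid, setup, queries, num_tests=10, rng=None):
    """Time every query num_tests times per parameter combination.

    grid:    dict of parameter name -> list of values to loop over.
    setup:   setup(params) -> context (nodes and links created), or None to skip.
    queries: dict of name -> (pick_args, run); pick_args(context, rng) returns a
             tuple of arguments and run(context, *args) executes the query.
    Returns {(query_name, parameter_values): mean seconds per call}.
    """
    rng = rng or random.Random()
    keys = list(grid)
    samples = defaultdict(list)
    for values in itertools.product(*(grid[k] for k in keys)):
        context = setup(dict(zip(keys, values)))
        if context is None:
            continue  # infeasible combination, nothing to measure
        for _ in range(num_tests):
            for name, (pick_args, run) in queries.items():
                args = pick_args(context, rng)  # selection is not timed
                start = time.perf_counter()
                run(context, *args)
                samples[(name, values)].append(time.perf_counter() - start)
    return {key: statistics.mean(times) for key, times in samples.items()}


def demo_setup(params):
    # Skip combinations where fewer than N unique names exist.
    if params["K"] ** (params["W"] * params["L"]) < params["N"]:
        return None
    return list(range(params["N"]))  # stand-in for the created nodes


demo_queries = {
    "lookup": (lambda ctx, rng: (rng.choice(ctx),), lambda ctx, x: x in ctx),
}
averages = run_benchmark(
    {"N": [10, 100], "W": [2], "L": [2], "K": [2, 4]},
    demo_setup, demo_queries, num_tests=3, rng=random.Random(1),
)
```

Keying the averages by both query name and parameter combination keeps the report able to show how each query scales with N, W, L and K, instead of a single global mean.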