Closed lapplislazuli closed 4 years ago
Jaccard similarity would be an easy option, but it seems rather weak. It is defined as:
Size(Intersection of words) / Size(Union of words)
Cosine similarity (https://medium.com/@sumn2u/cosine-similarity-between-two-sentences-8f6630b0ebb7) would be much better.
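To make the two candidates concrete, here is a minimal sketch of both measures in Python. It assumes plain whitespace tokenization and raw word counts as vectors; the names `tokenize`, `jaccard_similarity`, and `cosine_similarity` are made up for illustration and are not part of the existing code.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercased whitespace splitting is good enough here.
    return sentence.lower().split()


def jaccard_similarity(a, b):
    # Size(Intersection of words) / Size(Union of words)
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    union = set_a | set_b
    if not union:
        return 0.0  # two empty sentences: treat as not similar
    return len(set_a & set_b) / len(union)


def cosine_similarity(a, b):
    # Cosine of the angle between the word-count vectors of both sentences.
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)
```

Unlike Jaccard, the cosine variant takes into account how often a word occurs, which is probably why it tends to behave better on longer sentences.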
To pick the most redundant but also distinct sentences, I need to compare every candidate sentence to every sentence that has already been chosen.
Proposed Solution: There should be at least one function that returns the distance between two paths/sentences.
Then there should be a function that weights the metric score by the distance to the already chosen sentences.
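One possible shape for these two functions is a greedy selection, sketched below in Python. Everything here is an assumption for illustration: the names `word_distance`, `weighted_score`, and `pick_sentences`, the choice of 1 - Jaccard as the distance, and the idea of multiplying the metric score by the smallest distance to any already chosen sentence. The distance function is passed in as a parameter, so it could be swapped for a cosine-based one.

```python
def word_distance(a, b):
    # Assumed distance: 1 - Jaccard similarity over whitespace-split words.
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def weighted_score(score, candidate, chosen, distance=word_distance):
    # Scale the metric score by the distance to the closest chosen sentence,
    # so near-duplicates of earlier picks are pushed down.
    if not chosen:
        return score
    return score * min(distance(candidate, c) for c in chosen)


def pick_sentences(scored, k, distance=word_distance):
    # scored: mapping (or list of pairs) of sentence -> metric score.
    remaining = dict(scored)
    chosen = []
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda s: weighted_score(remaining[s], s, chosen, distance),
        )
        chosen.append(best)
        del remaining[best]
    return chosen
```

For example, with scores `{"the cat sat": 0.9, "the cat sat down": 0.8, "dogs bark loudly": 0.5}` and `k=2`, this sketch picks "the cat sat" first and then prefers "dogs bark loudly" over the higher-scored but nearly identical "the cat sat down". Since every step compares each candidate to every chosen sentence, the cost grows with both the number of candidates and `k`.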
Possible Problems: It may be hard to find a nice, functional solution for this.
Related Issues: This is a subtask for #11
Additional Context: There are many ways to compare sentence similarity. One Example Article