Sentence-Similarity

lapplislazuli commented 4 years ago

To pick the most redundant but still distinct sentences, I need to compare every candidate sentence to every sentence that has already been chosen.

Proposed Solution: There should be at least one function that returns the distance between two paths/sentences.

Then there should be a function that weights each sentence's metric score by its distance to the already chosen sentences, so that a high-scoring sentence close to an earlier pick is ranked lower.
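A minimal sketch of how these two functions could fit together, in Python for illustration only. Everything here is an assumption and not existing project code: the names `word_overlap_distance`, `weighted_score` and `pick_sentences` are hypothetical, the distance is a simple word-overlap placeholder, and multiplying the score by the smallest distance to any chosen sentence is just one possible weighting.

```python
from typing import Callable, Dict, List

Distance = Callable[[str, str], float]


def word_overlap_distance(a: str, b: str) -> float:
    """Placeholder distance in [0, 1]: 0 = same word set, 1 = no shared words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0  # assumption: two empty sentences are treated as identical
    return 1.0 - len(words_a & words_b) / len(union)


def weighted_score(candidate: str, score: float, chosen: List[str],
                   distance: Distance = word_overlap_distance) -> float:
    """Scale the metric score by the distance to the closest chosen sentence."""
    if not chosen:
        return score  # nothing chosen yet, so the raw score applies
    return score * min(distance(candidate, c) for c in chosen)


def pick_sentences(scores: Dict[str, float], n: int,
                   distance: Distance = word_overlap_distance) -> List[str]:
    """Greedily pick up to n sentences, re-weighting after every pick."""
    chosen: List[str] = []
    remaining = dict(scores)
    while remaining and len(chosen) < n:
        best = max(remaining,
                   key=lambda s: weighted_score(s, remaining[s], chosen, distance))
        chosen.append(best)
        del remaining[best]
    return chosen


scores = {
    "the cat sat on the mat": 0.9,
    "the cat sat on a mat": 0.85,
    "dogs bark loudly": 0.5,
}
picked = pick_sentences(scores, 2)
```

With these made-up scores, the near-duplicate second sentence loses to the lower-scored but unrelated third one, which is the behaviour the weighting is meant to produce. The distance function is passed in as a parameter, so a different similarity measure could be swapped in without touching the selection loop.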

Possible Problems: It may be hard to find a clean, functional solution for this.

Related Issues: This is a subtask for #11

Additional Context: There are many ways to measure sentence similarity. One Example Article

lapplislazuli commented 4 years ago

Jaccard similarity would be an easy option, but it seems rather weak. It is defined as:

Size(Intersection of words) / Size(Union of Words)
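As a rough illustration, the formula above could look like this in Python. The function name `jaccard_similarity`, the lowercasing, and the whitespace tokenisation are assumptions made for the sketch; the issue does not say how words should be split or normalised.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Size(intersection of words) / size(union of words)."""
    # Assumption: words are lowercased and split on whitespace only.
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 1.0  # assumption: two empty sentences count as identical
    return len(words_a & words_b) / len(union)


similarity = jaccard_similarity("the cat sat", "the cat ran")
```

Here the two sentences share 2 of 4 distinct words, so the similarity is 0.5. A distance for the comparison function can be taken as `1 - jaccard_similarity(a, b)`. Part of the weakness is visible in the sketch: word order and word frequency are ignored entirely.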

lapplislazuli commented 4 years ago

lapplislazuli / Hopinosis