ulf1 / kshingle

Split strings into (character-based) k-shingles
Apache License 2.0

add containment similarity metric #17

Closed: ulf1 closed this issue 3 years ago

ulf1 commented 3 years ago
Jaccard similarity: J(A, B) = |A ∩ B| / |A ∪ B|
Containment: C(A, B) = |A ∩ B| / |A|

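Assume A = Q is the shingleset that we want to query against a database of shinglesets x_i (i.e. B = x_i for database entry i). Unlike J, C is asymmetric: it measures the share of Q's shingles that also occur in x_i, regardless of how large x_i is.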
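
A small sketch of how the two metrics differ when a short query is compared against a longer database entry. The helper `char_shingles` and the example strings are illustrative assumptions, not part of kshingle's API.

```python
def char_shingles(text: str, k: int) -> set:
    """Hypothetical helper: all character substrings of length k."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(A: set, B: set) -> float:
    # |A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty
    union = A | B
    return len(A & B) / len(union) if union else 0.0


def containment(A: set, B: set) -> float:
    # |A ∩ B| / |A|, defined as 0.0 when A is empty
    return len(A & B) / len(A) if A else 0.0


Q = char_shingles("shingle", 2)         # short query, 6 bigrams
x = char_shingles("kshingle sets", 2)   # longer database entry, 12 bigrams

print(jaccard(Q, x))       # 0.5  (penalised by the extra shingles in x)
print(containment(Q, x))   # 1.0  (every query shingle occurs in x)
print(containment(x, Q))   # 0.5  (asymmetric: swapping arguments changes the result)
```

In this example the query is fully contained in the entry, which containment reports as 1.0, while the Jaccard score drops to 0.5 only because the entry is longer.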

Literature:

ulf1 commented 3 years ago
def containment(Q: set, X: set) -> float:
    # Share of the query shingles in Q that also occur in X.
    if not Q:
        return 0.0  # empty query: avoid division by zero
    return len(Q & X) / len(Q)
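
A possible way to use this for the query scenario described above: score every database shingleset against Q and sort by containment. The function name `rank_by_containment` and the toy database are assumptions for illustration only, not something that exists in kshingle.

```python
def containment(Q: set, X: set) -> float:
    # Share of the query shingles in Q that also occur in X.
    return len(Q & X) / len(Q) if Q else 0.0


def rank_by_containment(Q: set, database: list) -> list:
    """Hypothetical helper: (index, score) pairs, best match first."""
    scores = [(i, containment(Q, x_i)) for i, x_i in enumerate(database)]
    # sorted() is stable, so ties keep their database order
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy database of shinglesets x_i (made-up data)
database = [
    {"ab", "bc", "cd"},
    {"ab", "xy"},
    {"zz"},
]
Q = {"ab", "bc"}

print(rank_by_containment(Q, database))
# [(0, 1.0), (1, 0.5), (2, 0.0)]
```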
ulf1 commented 3 years ago

why?