Open logannc opened 3 years ago
Example of a random python Smith-Waterman alignment:
>>> sw.align("what about supergreatfantastic theatre", "superduperfantastic theater about what").dump()
Query: 1 superdupe-rfantastic theater 27
||||| ..| .||||||||||||||| |
Ref : 12 super-greatfantastic theat-r 37
Score: 38
Matches: 22 (78.6%)
This seems clearly superior to the existing alignment algorithm.
In particular when comparing the following:
>>> sw.align("what about superduperfantastic", "superduperfantastic about what").dump()
Query: 1 superduperfantastic 19
|||||||||||||||||||
Ref : 12 superduperfantastic 30
Score: 38
Matches: 19 (100.0%)
Mismatches: 0
CIGAR: 19M
>>> partial_ratio("what about superduperfantastic", "superduperfantastic about what")
63
See https://github.com/maxbachmann/rapidfuzz/issues/13 for generous details from another fuzzywuzzy compatible project author.
Essentially,
partial_ratio
attempts to align strings optimally, then take the ratio of the aligned string subsets. The method of alignment used in fuzzywuzzy is.. pretty bad, actually.Implement Smith-Waterman for sequence alignment to power
partial_ratio
. Legacypartial_ratio
behaviour will be relegated to a compatibility function (which, if there are more, might be put into a compatibility module).