luozhouyang / python-string-similarity

A library implementing different string similarity and distance measures using Python.
MIT License
991 stars, 127 forks

added regex prior to profiles calculation #3

Closed: ghost closed this 5 years ago

ghost commented 5 years ago

This fixes issue https://github.com/luozhouyang/python-string-similarity/issues/1. The ZeroDivisionError was raised when a profile was created from a string shorter than the n-gram size. The cause was that empty spaces were removed after the string lengths had already been compared with the n-gram size (L44 in cosine.py), so a string could pass the length check and still end up too short to produce any n-grams.
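To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the library's actual code: the names `profile`, `norm`, `cosine_unsafe`, and `cosine_safe` are hypothetical, and the sketch assumes that "removing empty spaces" means collapsing runs of whitespace into a single space. It shows how checking lengths before normalizing can leave an empty profile, and how normalizing first avoids the division by zero.

```python
import math
import re
from collections import Counter

# Hypothetical names for illustration only; not the library's real API.
_WHITESPACE = re.compile(r"\s+")


def profile(text, k):
    """Count the k-character shingles of text after collapsing whitespace."""
    normalized = _WHITESPACE.sub(" ", text)
    return Counter(normalized[i:i + k] for i in range(len(normalized) - k + 1))


def norm(prof):
    """Euclidean norm of a shingle-count profile."""
    return math.sqrt(sum(count * count for count in prof.values()))


def _cosine(p0, p1):
    dot = sum(p0[gram] * p1[gram] for gram in p0.keys() & p1.keys())
    return dot / (norm(p0) * norm(p1))


def cosine_unsafe(s0, s1, k):
    # The length check runs on the raw strings, before whitespace is collapsed,
    # so a string can pass the check and still yield an empty profile.
    if len(s0) < k or len(s1) < k:
        return 0.0
    return _cosine(profile(s0, k), profile(s1, k))


def cosine_safe(s0, s1, k):
    # Collapse whitespace first, then compare lengths against k.
    s0 = _WHITESPACE.sub(" ", s0)
    s1 = _WHITESPACE.sub(" ", s1)
    if len(s0) < k or len(s1) < k:
        return 0.0
    return _cosine(profile(s0, k), profile(s1, k))
```

With `k = 4`, the raw string `"a    b"` has six characters and passes the check in `cosine_unsafe`, but it collapses to `"a b"` (three characters), so its profile is empty, its norm is zero, and the division raises `ZeroDivisionError`. `cosine_safe` returns `0.0` for the same input.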

luozhouyang commented 5 years ago

Hi @pmalgorzata, I think we should keep spaces when calculating string similarity (or distance). Removing spaces makes no sense.
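As a rough illustration of why spaces matter, the sketch below compares two strings with a simple bigram Jaccard similarity, once with spaces kept and once with them stripped. The `shingles` and `jaccard` helpers are hypothetical and written only for this example; they are not taken from the library.

```python
def shingles(text, k):
    """Return the set of k-character shingles in text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b, k=2):
    """Jaccard similarity of the k-shingle sets of a and b."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two strings with no shingles are treated as identical
    return len(sa & sb) / len(sa | sb)


# Spaces kept: the strings differ, so the similarity is below 1.
kept = jaccard("new york", "newyork")

# Spaces stripped: the difference disappears and the strings look identical.
stripped = jaccard("new york".replace(" ", ""), "newyork".replace(" ", ""))
```

Here `kept` is below 1 while `stripped` is exactly 1, so stripping spaces hides a real difference between the inputs.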