luozhouyang / python-string-similarity

A library implementing different string similarity and distance measures using Python.
MIT License
992 stars · 127 forks

Added New similarity algorithm using Gensim Library #8

Closed: thepylot closed this 5 years ago

thepylot commented 5 years ago

Gensim is billed as a natural language processing package that does ‘Topic Modeling for Humans’, but in practice it is much more than that. It is a leading, state-of-the-art package for processing text, working with word vector models (such as Word2Vec and FastText), and building topic models.

- Updated README.md
- Updated requirements.txt

luozhouyang commented 5 years ago

Good idea! But could you please organize your code into a class, something like GensimSimilarity?
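As a rough illustration of the shape such a class might take, here is a minimal sketch. Only the name `GensimSimilarity` comes from the request above; the `similarity(s0, s1)` method name is an assumption, and the body is a plain bag-of-words cosine similarity used as a stand-in. It does not call gensim and is not the code from this pull request.

```python
import math
from collections import Counter


class GensimSimilarity:
    """Hypothetical class layout; the scoring below is a stand-in, not gensim."""

    def _vectorize(self, text):
        # Lowercase, split on whitespace, and count token occurrences.
        return Counter(text.lower().split())

    def similarity(self, s0, s1):
        # Cosine similarity between the two token-count vectors.
        v0, v1 = self._vectorize(s0), self._vectorize(s1)
        dot = sum(v0[t] * v1[t] for t in v0.keys() & v1.keys())
        norm0 = math.sqrt(sum(c * c for c in v0.values()))
        norm1 = math.sqrt(sum(c * c for c in v1.values()))
        if norm0 == 0 or norm1 == 0:
            # An empty string has no tokens; treat it as dissimilar to everything.
            return 0.0
        return dot / (norm0 * norm1)
```

Wrapping the logic in a class keeps any model or corpus state in one place, so callers only need `GensimSimilarity().similarity(a, b)`.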

thepylot commented 5 years ago

> Good idea! But could you please organize your code to a class, something like GensimSimilarity?

I updated the file; you can check it now :)

luozhouyang commented 5 years ago

Here are the exceptions:

[image: exception]

thepylot commented 5 years ago

The workdir folder should be located in the same directory as the program file. Please create a new folder named "workdir" next to the program file.
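One way to avoid this manual step would be to create the folder from code before it is used. The following is a minimal sketch under that assumption; the helper name `ensure_workdir` is hypothetical and is not part of this pull request.

```python
from pathlib import Path


def ensure_workdir(base_dir, name="workdir"):
    """Create the folder `name` under `base_dir` if it is missing; return its path."""
    workdir = Path(base_dir) / name
    # exist_ok=True makes repeated calls safe when the folder already exists.
    workdir.mkdir(parents=True, exist_ok=True)
    return workdir
```

Calling `ensure_workdir(Path(__file__).resolve().parent)` from the program file would place the folder next to that file, regardless of the directory the script is launched from.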

On Sun, Oct 6, 2019 at 9:42 AM luozhouyang notifications@github.com wrote:

> Here are the exceptions: [image: exception] https://user-images.githubusercontent.com/34032031/66270096-1fe4ed80-e882-11e9-95a5-bc2939077491.png


luozhouyang commented 5 years ago

It seems that integration with gensim is not that easy.

Both gensim and nltk are heavy libraries, and are often used in deep learning based NLP tasks.

I would prefer to keep this library simple and small. Let it complement these frameworks, not integrate with them.