dschwilk closed this issue 10 years ago
Oops, I forgot the results using fuzzy_match( ... quick=True):
print(a3.timeit(number=3))
# 21.6809160709
But that first pass with difflib could be used in front of any matching algorithm.
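To illustrate how a difflib first pass can sit in front of another matcher, here is a minimal sketch. The helper names (`levenshtein`, `first_pass`, `match_with_first_pass`), the cutoff, and the distance threshold are assumptions made up for illustration; this is not the repository's `fuzzy_match`, and the pure-Python edit distance only stands in for whatever matcher runs after the filter.

```python
import difflib


def levenshtein(a, b):
    """Plain dynamic-programming edit distance (hypothetical stand-in matcher)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def first_pass(name, candidates, cutoff=0.8):
    """Keep candidates whose cheap difflib upper bounds reach the cutoff.

    real_quick_ratio() and quick_ratio() are upper bounds on ratio(), so a
    candidate that fails either one cannot reach the cutoff under ratio().
    """
    matcher = difflib.SequenceMatcher()
    matcher.set_seq2(name)  # difflib caches information about seq2
    kept = []
    for candidate in candidates:
        matcher.set_seq1(candidate)
        if matcher.real_quick_ratio() >= cutoff and matcher.quick_ratio() >= cutoff:
            kept.append(candidate)
    return kept


def match_with_first_pass(name, candidates, max_dist=2, cutoff=0.8):
    """Run the difflib filter, then the edit-distance check on the survivors."""
    survivors = first_pass(name, candidates, cutoff)
    return [c for c in survivors if levenshtein(name, c) <= max_dist]


if __name__ == "__main__":
    names = ["Quercus alba", "Quercus albus", "Pinus ponderosa", "Quercus rubra"]
    print(match_with_first_pass("Quercus alba", names))
```

Note that the difflib cutoff and the edit-distance threshold measure different things, so the filter is a heuristic: a cutoff that is too strict could drop a name that the distance check would have accepted.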
This is great to read. Thanks!
Will
Ok, so I will start on this using the original Levenshtein distance code for matching. I'm working on code in a local branch right now and should have something for a pull request soon.
Note that the difflib step adds time when used with the fast Levenshtein code, since difflib is 20 times slower than the Levenshtein distance calculation. So I plan on removing that "quick" option.
-D
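For anyone who wants to check the relative cost of the two steps on their own machine, a rough timing harness could look like the sketch below. The sample pairs and the function names are made up for illustration, and `Levenshtein.distance` comes from the third-party python-Levenshtein package, which is treated as optional here. Timings will vary by machine and input, so no numbers are claimed.

```python
import difflib
import timeit

try:
    import Levenshtein  # third-party C extension; optional in this sketch
except ImportError:
    Levenshtein = None

# Hypothetical sample data, not taken from the repository.
PAIRS = [
    ("Quercus alba", "Quercus albus"),
    ("Pinus ponderosa", "Pinus ponderosae"),
    ("Acer rubrum", "Acer rubra"),
]


def difflib_pass():
    """Similarity ratio for every pair using difflib."""
    return [difflib.SequenceMatcher(None, a, b).ratio() for a, b in PAIRS]


def levenshtein_pass():
    """Edit distance for every pair using the C extension."""
    return [Levenshtein.distance(a, b) for a, b in PAIRS]


def compare(number=1000):
    """Return total seconds per approach; skips Levenshtein if not installed."""
    timings = {"difflib": timeit.timeit(difflib_pass, number=number)}
    if Levenshtein is not None:
        timings["Levenshtein"] = timeit.timeit(levenshtein_pass, number=number)
    return timings


if __name__ == "__main__":
    for label, seconds in compare().items():
        print(label, seconds)
```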
...I hadn't checked the two yet, I'd simply put in some code I'd written next to yours. Your method is so much faster than mine, I don't even see any point keeping it!
Will
There is still room for optimization, but mainly in the area of fast automata construction. Closed with d601a0f.
Computing pure Levenshtein distances (import Levenshtein) is much faster than using fuzzywuzzy. That makes sense: fuzzywuzzy is built on top of it.
So if we want the actual strings returned, this function
is much faster