fuzzy matching - Githubissues

Hi,

so here is my shot at fuzzy search. I hope I don't embarrass myself... :)

The idea is similar to what you know from the quick panel in ST. Here is an example: The queries nor20, n2013, NordAmid, and norld3frs all match the string Nordland 2013 - Amid Fears of Releases. There are three criteria for a match:

all the letters of the query are in the string
the letters in the query are in the same order in the string
there is at least one run of 3 consecutive, correctly matched letters (the user can change this minimum with the seq argument)
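
To make the three criteria concrete, here is a minimal sketch of a match test. It is not the code from this issue: the names is_subsequence and is_fuzzy_match are hypothetical, matching is assumed to be case-insensitive (the example queries above suggest this), and queries shorter than seq are assumed never to match.

```python
def is_subsequence(needle, haystack):
    """Return True if the characters of needle occur in haystack in order."""
    remaining = iter(haystack)
    # "ch in remaining" consumes the iterator, so order is enforced.
    return all(ch in remaining for ch in needle)


def is_fuzzy_match(query, text, seq=3):
    """Check the three criteria: all letters present, in order, and at
    least one run of `seq` query letters that is contiguous in text."""
    query, text = query.lower(), text.lower()  # assumption: ignore case
    if len(query) < seq:
        return False  # assumption: too-short queries never match
    for i in range(len(query) - seq + 1):
        chunk = query[i:i + seq]
        pos = text.find(chunk)
        while pos != -1:
            # The chunk is contiguous at pos; the rest of the query must
            # fit, in order, before and after it.
            if (is_subsequence(query[:i], text[:pos])
                    and is_subsequence(query[i + seq:], text[pos + seq:])):
                return True
            pos = text.find(chunk, pos + 1)
    return False


target = "Nordland 2013 - Amid Fears of Releases"
results = [is_fuzzy_match(q, target)
           for q in ("nor20", "n2013", "NordAmid", "norld3frs")]
```

With this sketch all four example queries match the target, while a query such as ndr2 does not, because it has no run of 3 contiguous letters.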

Matches are ranked based on two criteria:

contiguous sequences of characters are better (e.g. for query nor, nor is better than nxoxr)
earlier matches are better (e.g. for query nor, xnor is better than xxnor)
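
One way these two ranking criteria could be combined is sketched below. This is only an illustration under assumptions, not the scoring used in the actual code: match_positions and rank are hypothetical names, the alignment is a simple greedy left-to-right one, the seq requirement is left out, and the number of gaps is assumed to take precedence over the start position.

```python
def match_positions(query, text):
    """Greedy left-to-right alignment. Returns the index in text of each
    query character, or None if the query is not a subsequence."""
    lowered = text.lower()
    positions = []
    start = 0
    for ch in query.lower():
        idx = lowered.find(ch, start)
        if idx == -1:
            return None
        positions.append(idx)
        start = idx + 1
    return positions


def rank(query, candidates):
    """Sort matching candidates: fewer gaps first, then earlier start."""
    scored = []
    for text in candidates:
        positions = match_positions(query, text)
        if positions:  # skips non-matches and empty queries
            span = positions[-1] - positions[0] + 1
            gaps = span - len(positions)  # unmatched chars inside the match
            scored.append((gaps, positions[0], text))
    scored.sort()
    return [text for _, _, text in scored]


ranked = rank("nor", ["xxnor", "nxoxr", "xnor", "nor", "abc"])
# ranked == ["nor", "xnor", "xxnor", "nxoxr"]
```

The greedy alignment does not always find the tightest possible match, so a real implementation would probably need something smarter here.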

The important function is fuzzy_search, which takes two required and three optional arguments.

query: search string
elements: list of strings, dictionaries, tuples, or lists
(optional) key: function to access string element in dictionaries, tuples, or lists
(optional) rank: rank the elements in the return list by quality of match (currently not supported)
(optional) seq: minimum sequence of characters to match

fuzzy_search returns a ranked list of the elements that match the query.

key has to be specified if elements is not a list of strings, and key(elements[i]) has to return a string for every element in elements.

I am sure this can be optimized performance-wise, but it's pretty fast in my tests with a list of over 2000 elements and seq=3. There is also a small example at the bottom of the code (commented out). By the way, you can also use this to directly filter a list of feedback dictionaries with key = lambda x: '%s - %s' % (x['title'], x['subtitle']) (in this case the search would be based on a string 'title - subtitle').
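
To illustrate the key contract, here is a small sketch of how a key function turns non-string elements into the strings that get searched. The helper searchable_strings and the sample feedback data are hypothetical and only for illustration; they are not part of fuzzy_search or alp. The sketch is written for Python 3, where str is the text type.

```python
def searchable_strings(elements, key=None):
    """Apply key to every element and verify that it yields a string."""
    strings = []
    for element in elements:
        value = element if key is None else key(element)
        if not isinstance(value, str):
            raise TypeError("key must return a string, got %r" % (value,))
        strings.append(value)
    return strings


# Hypothetical feedback dictionaries with title and subtitle entries.
feedback = [
    {"title": "Nordland 2013", "subtitle": "Amid Fears of Releases"},
    {"title": "Some other title", "subtitle": "Some subtitle"},
]

# Same key as in the text: search on the string 'title - subtitle'.
title_and_subtitle = lambda x: '%s - %s' % (x['title'], x['subtitle'])

strings = searchable_strings(feedback, key=title_and_subtitle)
# strings[0] == "Nordland 2013 - Amid Fears of Releases"
```

Without a key, a list of dictionaries or tuples would fail the string check, which is why key is required in that case.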

Another thing is that you have to pass a random uid with each feedback item to preserve the ranking. I think it would be great to add an option random to alp.feedback that assigns a random uid, and to set this option to False by default. Here are details about this.

Let me know if you have questions!

phyllisstein / alp

fuzzy matching #8