seamusabshere / fuzzy_match

Find a needle (a document or record) in a haystack using string similarity and (optionally) regular expression rules. Uses Dice's Coefficient (aka Pair Similarity) and Levenshtein Distance internally.
MIT License

Feature request - returning array of multiple possible matches #3

Closed: brycesenz closed this issue 11 years ago

brycesenz commented 11 years ago

First, thanks so much for the helpful module - it's been incredibly useful on past applications. Currently though, I have a need to do some fuzzy matching with multiple possible results. It would be great to add that functionality to this module (with an optional "threshold" parameter).

Example use case - say a customer searches for a "shover", you could return valid options of "shovel" or "shaver" and allow him/her to specify what was meant.

I know Sphinx and other search options exist which do this, but I don't want to go the full route of new search functionality for something so close to what this module already does.

seamusabshere commented 11 years ago

hey, check out https://github.com/seamusabshere/fuzzy_match/tree/find_all_with_score

gem 'fuzzy_match', github: 'seamusabshere/fuzzy_match', branch: 'find_all_with_score'

you get 2 scores for every record (because sometimes pair distance aka dice's coefficient can't tell things apart)
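To illustrate why two scores can be useful, here is a rough pure-Ruby sketch of both measures. The helper names `bigrams`, `dice_coefficient` and `levenshtein` are made up for this example and are not the gem's internals; the gem's own scoring may differ in detail. The last candidate, "versho", is a contrived string that happens to share the same number of bigrams with the needle as "shovel" does.

```ruby
# Illustrative helpers only; these are NOT fuzzy_match's internal methods.

# Character bigrams of a string, e.g. "shovel" -> ["sh", "ho", "ov", "ve", "el"].
def bigrams(str)
  str.downcase.each_char.each_cons(2).map(&:join)
end

# Dice's coefficient over bigrams: 2 * shared / total, in 0.0..1.0.
def dice_coefficient(a, b)
  pairs_a = bigrams(a)
  pairs_b = bigrams(b)
  total = pairs_a.size + pairs_b.size
  return 0.0 if total.zero?

  # Count shared bigrams as a multiset: each bigram in b is used at most once.
  remaining = pairs_b.dup
  shared = pairs_a.count do |pair|
    idx = remaining.index(pair)
    remaining.delete_at(idx) if idx
  end
  2.0 * shared / total
end

# Levenshtein distance: minimum number of single-character edits.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

%w[shovel shaver versho].each do |candidate|
  dice = dice_coefficient('shover', candidate)
  leven = levenshtein('shover', candidate)
  puts "#{candidate}: dice=#{dice} levenshtein=#{leven}"
end
```

In this sketch "shovel" and "versho" get the same Dice score against "shover" (0.8), so only the edit distance separates them, while "shovel" and "shaver" are both one edit away and only the Dice score (0.8 vs 0.6) separates them.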

fz = FuzzyMatch.new [...]
fz.find_all_with_score('foobar').each do |record, dice_similar, leven_similar|
  [...]
end

brycesenz commented 11 years ago

Hey Seamus, I'll try it out as soon as I can - I'm a bit busy with the holidays, but I'll give it a shot after New Year's. Thanks for the prompt response!

seamusabshere commented 11 years ago

yo @brycesenz, how's this going?

seamusabshere commented 11 years ago

hey @brycesenz check out 8e11cfe0628c15b309a1f8a3137f5ba8544ed51d with :threshold option
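For readers who just want the idea behind a threshold, here is a minimal pure-Ruby sketch of filtering candidates by a minimum similarity score. It does not use the gem: `dice_similarity` and `matches_above` are hypothetical helpers written for this example, and how the gem itself applies `:threshold` is defined by the commit above, not by this code.

```ruby
# Hypothetical helper, not part of fuzzy_match: bigram-based Dice similarity.
def dice_similarity(a, b)
  pairs = ->(s) { s.downcase.each_char.each_cons(2).map(&:join) }
  pa = pairs.call(a)
  pb = pairs.call(b)
  return 0.0 if pa.empty? || pb.empty?

  # Shared bigrams, counting repeats at most as often as they occur in both.
  shared = (pa & pb).sum { |p| [pa.count(p), pb.count(p)].min }
  2.0 * shared / (pa.size + pb.size)
end

# Hypothetical helper: every record scoring at least `threshold`, best first.
def matches_above(needle, haystack, threshold)
  haystack
    .map { |record| [record, dice_similarity(needle, record)] }
    .select { |_, score| score >= threshold }
    .sort_by { |_, score| -score }
end

catalog = %w[shovel shaver hammer]
matches_above('shover', catalog, 0.5).each do |record, score|
  puts format('%-8s %.2f', record, score)
end
```

With the "shover" example from the original request, a cutoff of 0.5 keeps "shovel" and "shaver" and drops "hammer", which is the kind of short list a customer could then choose from.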

brycesenz commented 11 years ago

Hey Seamus, sorry about the lack of response. I am visiting my sister abroad at the moment, and while I thought that I'd have time to play around with this in my spare time, I just haven't. I'll be back to work next week and happy to look at it then. Sorry I've been out of touch.

brycesenz commented 11 years ago

@seamusabshere - apologies for the delay. The project that I was working on dropped the fuzzy search feature, and I only now got back to testing it on my own. The find_all_with_score option is working great for me - is there any plan to port that to the master branch?

seamusabshere commented 11 years ago

@brycesenz it's released!