leeoniya / uFuzzy

A tiny, efficient fuzzy search that doesn't suck
MIT License
2.62k stars, 47 forks

How to match addresses? #57

Closed: 1ngfraga closed this issue 7 months ago

1ngfraga commented 7 months ago

I have this list of addresses:

[
  "27 Glen Road, Caldercruix, Airdrie, Scotland, ML6 7PZ",
  "93 Chudleigh Road, Twickenham, United Kingdom, TW2 7QY",
  "15 Bromar Road, London, England, SE5 8DL",
  "8 Langland Gardens, London, NW3 6PY",
  "Marvan Court, 1 Waldegrave Road, Teddington, England, TW11 8LZ",
  "Ebble Edge Blandford Road, Coombe Bissett, Salisbury, England, SP5 4LH",
  "27 Glen Road, Caldercruix, Airdrie, Scotland, ML6 7PZ",
  "Basement Floor, 459 Finchley Road, London, United Kingdom, NW3 6HN",
  "Hazlemere, 70 Chorley New Road, Bolton, Greater Manchester, England, BL1 4BY",
  "55 East Budleigh Road, Budleigh Salterton, England, EX9 6EW",
  "Ladysmith House, High Street, Sidmouth, United Kingdom, EX10 8LN",
  "6/7 West Street, Farnham, Surrey, England, GU9 7DN",
  "59 Kingsbury Road, London, United Kingdom, NW9 7HU",
  "Unit B10 Kestrel Court Harbour Road, Portishead, Bristol, England, BS20 7AN",
  "20 Oak Drive, Nuthall, Nottingham, England, NG16 1FJ",
  "Lynn Garth, Gillinggate, Kendal, Cumbria, LA9 4JB",
  "19 Grange Road, Aveley, South Ockendon, England, RM15 4ER",
  "5b Earls Gate, Bothwell, Glasgow, Scotland, G71 8BP",
  "55 East Budleigh Road, Budleigh Salterton, United Kingdom, EX9 6EW",
  "217 Luckwell Road, Bristol, England, BS3 3HD",
]

This is my needle:

"Basement Studio, 459 Finchley Rd, London NW3 6HN, United Kingdom"

Still, I get no match.

What setting am I missing to match this one: "Basement Floor, 459 Finchley Road, London, United Kingdom, NW3 6HN"?

Thanks.

leeoniya commented 7 months ago

uFuzzy isn't really meant for this type of needle.

You can find what you need using something shorter but specific. For example, "finch", "459", or "6HN" would have matched this. If you plan on searching full sentences with extra words and different spellings/abbreviations, you probably want something closer to a full-text search with an index, not uFuzzy.
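To illustrate the index-based alternative, here is a minimal sketch (not uFuzzy's API) of an inverted index: each normalized token maps to the addresses that contain it, and addresses are ranked by how many distinct needle tokens they share. The names `tokenize`, `buildIndex`, `searchIndex`, and the tiny `ABBREVIATIONS` table are hypothetical, and the normalization rules (lowercase, split on non-alphanumerics, expand "rd"/"st") are assumptions chosen for this address list.

```typescript
// Illustrative abbreviation table (assumption): expands short forms to the spelling used in the data.
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const ABBREVIATIONS = new Map<string, string>([
  ["rd", "road"],
  ["st", "street"],
]);

// Lowercase, split on anything that is not a letter or digit, and expand known abbreviations.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0)
    .map((t) => ABBREVIATIONS.get(t) ?? t);
}

// Inverted index: token -> set of haystack indices whose entry contains that token.
function buildIndex(haystack: string[]): Map<string, Set<number>> {
  const index = new Map<string, Set<number>>();
  haystack.forEach((entry, i) => {
    for (const token of tokenize(entry)) {
      let postings = index.get(token);
      if (!postings) {
        postings = new Set<number>();
        index.set(token, postings);
      }
      postings.add(i);
    }
  });
  return index;
}

// Rank entries by how many distinct needle tokens they contain (most first, ties by index).
function searchIndex(
  index: Map<string, Set<number>>,
  needle: string,
): { idx: number; hits: number }[] {
  const counts = new Map<number, number>();
  for (const token of new Set(tokenize(needle))) {
    for (const i of index.get(token) ?? []) {
      counts.set(i, (counts.get(i) ?? 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([idx, n]) => ({ idx, hits: n }))
    .sort((a, b) => b.hits - a.hits || a.idx - b.idx);
}

// Usage with three addresses from the question.
const sample = [
  "15 Bromar Road, London, England, SE5 8DL",
  "Basement Floor, 459 Finchley Road, London, United Kingdom, NW3 6HN",
  "59 Kingsbury Road, London, United Kingdom, NW9 7HU",
];
const ranked = searchIndex(
  buildIndex(sample),
  "Basement Studio, 459 Finchley Rd, London NW3 6HN, United Kingdom",
);
// ranked[0] is { idx: 1, hits: 9 }: "rd" expands to "road", while "studio" still misses.
```

Because the index is built once, each search only touches the postings for the needle's tokens instead of rescanning every address.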

I would not recommend uFuzzy for needles with more than 5 or 6 terms.

leeoniya commented 7 months ago

Addresses can often be segmented logically, so it makes a lot more sense to build a smarter index that can recognize postal codes, countries, street names, house numbers, etc.
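As a minimal sketch of one piece of that idea, assuming UK addresses: pull the postcode out of each address with a simplified regex and index addresses by it, so a needle's postcode narrows the candidates to an exact bucket. The pattern is an assumption and not a full validator (it ignores special forms), and `extractPostcode`, `buildPostcodeIndex`, and `searchByPostcode` are hypothetical helpers, not part of uFuzzy.

```typescript
// Simplified UK postcode pattern (assumption): an outward code such as "NW3", "EX10" or "G71",
// optional whitespace, then an inward code such as "6HN". Input is uppercased before matching.
const UK_POSTCODE = /\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b/;

// Return the normalized postcode ("NW3 6HN"), or null if the text contains none.
function extractPostcode(address: string): string | null {
  const m = address.toUpperCase().match(UK_POSTCODE);
  return m ? `${m[1]} ${m[2]}` : null;
}

// Map each normalized postcode to the indices of the addresses that contain it.
function buildPostcodeIndex(haystack: string[]): Map<string, number[]> {
  const index = new Map<string, number[]>();
  haystack.forEach((address, i) => {
    const pc = extractPostcode(address);
    if (pc === null) return;
    const bucket = index.get(pc);
    if (bucket) bucket.push(i);
    else index.set(pc, [i]);
  });
  return index;
}

// Look up candidates by the needle's postcode; returns [] when the needle has none.
function searchByPostcode(index: Map<string, number[]>, needle: string): number[] {
  const pc = extractPostcode(needle);
  return pc === null ? [] : (index.get(pc) ?? []);
}

// Usage with four addresses from the question.
const addresses = [
  "15 Bromar Road, London, England, SE5 8DL",
  "8 Langland Gardens, London, NW3 6PY",
  "Basement Floor, 459 Finchley Road, London, United Kingdom, NW3 6HN",
  "5b Earls Gate, Bothwell, Glasgow, Scotland, G71 8BP",
];
const postcodeIndex = buildPostcodeIndex(addresses);
const hits = searchByPostcode(
  postcodeIndex,
  "Basement Studio, 459 Finchley Rd, London NW3 6HN, United Kingdom",
);
// hits is [2]: only the Finchley Road address has postcode NW3 6HN.
```

Other segments (house number, street, country) could be extracted the same way and used to filter or rank the postcode bucket; that part is left out here.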

1ngfraga commented 7 months ago

For now, I think I will just split the needle into words, match word by word, and consider it a match if more than 90% of the words match. Thanks.
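The word-by-word plan above can be sketched as a plain linear scan (not a uFuzzy feature). Assumptions: a "word" is a lowercase run of letters and digits, the score is matched needle words divided by all needle words, and the cutoff is a `threshold` parameter compared with `>=`. `splitWords`, `wordMatchRatio`, and `matchByWords` are hypothetical names.

```typescript
// Lowercase and split on anything that is not a letter or digit.
function splitWords(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
}

// Fraction of the needle's words that also appear in the entry.
function wordMatchRatio(needle: string, entry: string): number {
  const needleWords = splitWords(needle);
  if (needleWords.length === 0) return 0;
  const entryWords = new Set(splitWords(entry));
  let matched = 0;
  for (const w of needleWords) {
    if (entryWords.has(w)) matched++;
  }
  return matched / needleWords.length;
}

// Keep entries whose ratio is at least `threshold`, best first.
function matchByWords(
  haystack: string[],
  needle: string,
  threshold = 0.9,
): { idx: number; ratio: number }[] {
  return haystack
    .map((entry, idx) => ({ idx, ratio: wordMatchRatio(needle, entry) }))
    .filter((r) => r.ratio >= threshold)
    .sort((a, b) => b.ratio - a.ratio);
}

// Usage with three addresses from the question.
const candidates = [
  "15 Bromar Road, London, England, SE5 8DL",
  "Basement Floor, 459 Finchley Road, London, United Kingdom, NW3 6HN",
  "59 Kingsbury Road, London, United Kingdom, NW9 7HU",
];
const query = "Basement Studio, 459 Finchley Rd, London NW3 6HN, United Kingdom";
const strict = matchByWords(candidates, query, 0.9); // []: "studio" and "rd" are missing, 8 of 10 words match
const loose = matchByWords(candidates, query, 0.75); // one result: idx 1 with ratio 0.8
```

With the needle from the question, the Finchley Road address scores 0.8, so a 90% cutoff still rejects it; normalizing abbreviations such as "Rd" or lowering the threshold are the obvious knobs.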