intuit / fuzzy-matcher

A Java library to determine probability of objects being similar.
Apache License 2.0
226 stars, 69 forks

Address matching: street containing hyphens #47

Closed ashakirin closed 3 years ago

ashakirin commented 3 years ago

Currently these addresses do not match at all:

Case 1:

Case 2:

Is there any option to improve the algorithm so that spaces and hyphens are treated as equivalent separators?

Regards, Andrei.

manishobhatia commented 3 years ago

Hi Andrei,

Yes, this is possible by overriding the pre-processing function so that it replaces custom separators in the data. Here is an example:

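// Sample value that uses hyphens as separators
String value = "123-XYZ-Ltd-st, TX";
// Replace hyphens with spaces first, then run the default address pre-processing
Function<String, String> customPreProcessing = (str -> str.replaceAll("-", " "));
customPreProcessing = customPreProcessing.andThen(PreProcessFunction.addressPreprocessing());

// Pass the combined function to the builder when creating the element
Element element = new Element.Builder().setType(ADDRESS)
    .setPreProcessingFunction(customPreProcessing)
    .setValue(value)
    .createElement();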

So when you create the Element object, pass in the customPreProcessing function. It will standardize the data before the algorithm considers it for a match.
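To see the effect of the composition without pulling in the library, here is a minimal standalone sketch. It only illustrates how `andThen` chains the hyphen replacement with a later normalization step. The class `SeparatorNormalizer`, the stand-in step (which merely trims, lower-cases, and collapses whitespace), and the sample addresses are assumptions made up for this illustration. The stand-in is not the library's `PreProcessFunction.addressPreprocessing()`, and the addresses are not the ones from the original report.

```java
import java.util.Locale;
import java.util.function.Function;

// Hypothetical helper class, for illustration only (not part of fuzzy-matcher).
final class SeparatorNormalizer {

    // Step 1: treat hyphens as spaces, as in the custom step shown above.
    static final Function<String, String> HYPHEN_TO_SPACE =
            str -> str.replaceAll("-", " ");

    // Step 2: ASSUMED stand-in for the library's address pre-processing.
    // It only trims, lower-cases, and collapses runs of whitespace.
    static final Function<String, String> STAND_IN_ADDRESS_PREPROCESSING =
            str -> str.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");

    // andThen applies the hyphen replacement first, then the stand-in step.
    static final Function<String, String> CUSTOM_PRE_PROCESSING =
            HYPHEN_TO_SPACE.andThen(STAND_IN_ADDRESS_PREPROCESSING);

    static String normalize(String address) {
        return CUSTOM_PRE_PROCESSING.apply(address);
    }

    public static void main(String[] args) {
        // Made-up sample addresses.
        String hyphenated = "12 Karl-Marx-Strasse, Berlin";
        String spaced = "12 Karl Marx Strasse, Berlin";

        System.out.println(normalize(hyphenated));
        System.out.println(normalize(spaced));
        // Both variants reduce to the same string after pre-processing.
        System.out.println(normalize(hyphenated).equals(normalize(spaced)));
    }
}
```

With the real library, the same composed function would be passed to `setPreProcessingFunction` so that both spellings are standardized before matching.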

Hope this helps

Thanks, Manish