intuit / fuzzy-matcher

A Java library to determine probability of objects being similar.
Apache License 2.0

Comparing two strings of different lengths #50

Closed giorgionegro closed 3 years ago

giorgionegro commented 3 years ago

I have two strings to compare, but one is too long, so I get no match: {"jojo","le-bizzarre-avventure-di-jojo-diamond-is-unbreakable"}

I'm calling matchService.applyMatchByGroups(documentList) with documents built like this:

for (String str : input) {
    document2 = new Document.Builder(str)
            .addElement(new Element.Builder<String>().setValue(str).setType(ElementType.NAME).createElement())
            .setThreshold(0)
            .createDocument();
    documentList.add(document2);
}

manishobhatia commented 3 years ago

Hi,

The match is not showing up mainly because the strings are delimited by "-" instead of spaces. I added a pre-processing function that modifies the input before matching.

String[] input = {"jojo","le-bizzarre-avventure-di-jojo-diamond-is-unbreakable"};

// Replace hyphens with spaces so each word is seen separately
Function<String, String> customPreProcessing = (str -> str.replaceAll("-", " "));
List<Document> documentList = new ArrayList<>();
for (String str : input) {
    Document document2 = new Document.Builder(str)
            .addElement(
                    new Element.Builder<String>()
                            .setValue(str)
                            .setType(ElementType.NAME)
                            .setPreProcessingFunction(customPreProcessing)
                            .setThreshold(0)
                            .createElement())
            .setThreshold(0)
            .createDocument();
    documentList.add(document2);
}
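
To illustrate why the delimiter matters, here is a minimal plain-Java sketch that does not use the library at all. It assumes a simple whitespace tokenizer as a stand-in for word-based matching; the class name DelimiterDemo and the helpers tokens and sharedTokens are hypothetical names made up for this example and are not part of fuzzy-matcher. It only shows that the two inputs share no word until the hyphens are replaced with spaces.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class DelimiterDemo {
    // Same lambda as in the snippet above: turn hyphens into spaces.
    static final Function<String, String> CUSTOM_PRE_PROCESSING =
            str -> str.replaceAll("-", " ");

    // Assumption: a plain whitespace split, used here only as an illustration
    // of word-level comparison, not the library's actual tokenizer.
    static List<String> tokens(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    // Words that appear in both inputs.
    static Set<String> sharedTokens(String a, String b) {
        Set<String> shared = new HashSet<>(tokens(a));
        shared.retainAll(tokens(b));
        return shared;
    }

    public static void main(String[] args) {
        String shortName = "jojo";
        String longName = "le-bizzarre-avventure-di-jojo-diamond-is-unbreakable";

        // Without pre-processing the long string is a single token.
        System.out.println(sharedTokens(shortName, longName)); // prints []

        // After replacing "-" with " " the word "jojo" is shared.
        System.out.println(sharedTokens(
                CUSTOM_PRE_PROCESSING.apply(shortName),
                CUSTOM_PRE_PROCESSING.apply(longName))); // prints [jojo]
    }
}
```
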
giorgionegro commented 3 years ago

Thank you, now it works.