Closed markairwallex closed 3 years ago
Hi,
Yes, the name and address dictionaries can be provided externally. Both are used to pre-process the data before running a match.
To do that, you need to create a custom pre-processing function and pass it in when creating the Element. Here are the steps, with examples you can use:
- For names, it is convenient to remove titles, salutations, prefixes and suffixes. This example builds a Java HashMap with those words mapped to an empty string. (The map can be read from a file if you prefer.)
    Map<String, String> newNameDict = new HashMap<String, String>() {{
        put("Queen", "");
        put("Third", "");
        put("III", "");
    }};
- Create a custom function that applies this mapping to any input
    Function<String, String> newNamePreProcessing = (str) -> {
        return Arrays.stream(str.split("\\s+"))
                // Replace dictionary words, keep everything else as-is
                .map(d -> newNameDict.getOrDefault(d, d))
                // Drop removed words so no stray spaces are left behind
                .filter(d -> !d.isEmpty())
                .collect(Collectors.joining(" "));
    };
- Override the pre-processing function when creating an element
    String[][] input = {
            {"1", "Victoria Third"},
            {"2", "Queen Victoria III"},
    };
    // NAME here is ElementType.NAME (statically imported)
    List<Document> documentList = Arrays.stream(input).map(contact -> {
        return new Document.Builder(contact[0])
                .addElement(new Element.Builder<String>().setValue(contact[1]).setType(NAME)
                        // Set the custom function
                        .setPreProcessingFunction(newNamePreProcessing)
                        .createElement())
                .createDocument();
    }).collect(Collectors.toList());
Now, if this is fed to the MatchService, the default name-dictionary.txt is overridden for these elements and your custom function is used to pre-process the data instead:
    Map<Document, List<Match<Document>>> result = matchService.applyMatch(documentList);
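To see what the pre-processing step does on its own, here is a minimal standalone sketch that uses only the JDK and no matcher classes. The class name `NamePreProcessingDemo` and the constant names are made up for illustration. It applies the same dictionary to the two sample names, and also drops any empty tokens left after removal so the result has no stray spaces.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the library.
class NamePreProcessingDemo {

    // Words to strip from names, mapped to an empty replacement
    // (same entries as in the steps above).
    static final Map<String, String> NAME_DICT = new HashMap<>();
    static {
        NAME_DICT.put("Queen", "");
        NAME_DICT.put("Third", "");
        NAME_DICT.put("III", "");
    }

    // Split on whitespace, replace dictionary words, drop the empty
    // leftovers, and join the remaining words with single spaces.
    static final Function<String, String> PRE_PROCESS = str ->
            Arrays.stream(str.trim().split("\\s+"))
                    .map(word -> NAME_DICT.getOrDefault(word, word))
                    .filter(word -> !word.isEmpty())
                    .collect(Collectors.joining(" "));

    public static void main(String[] args) {
        System.out.println(PRE_PROCESS.apply("Victoria Third"));     // prints "Victoria"
        System.out.println(PRE_PROCESS.apply("Queen Victoria III")); // prints "Victoria"
    }
}
```

Both sample names reduce to the same string, which is what lets the two documents be compared on the name alone.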
Thanks for your reply. Yes, overriding the pre-processing function can achieve this. But is there a simpler, more direct way to just override the dictionary file, e.g. by providing a file path on the element? Or is there a plan to support this? I think it would be a useful feature for custom pre-processing. Thanks a lot in advance!
Unfortunately, there is no easier way, but we will take this as an enhancement for our next release. Hopefully the above method will unblock you for your immediate needs.
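In the meantime, a file-based dictionary can be approximated by building the pre-processing function from a file yourself. The following is only a sketch under stated assumptions: the helper names `loadDictionary` and `fromFile`, the class name, and the one-entry-per-line `word:replacement` format are all hypothetical choices for this example, not something the library defines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class; not part of the library.
class FileDictionaryPreProcessing {

    // Reads "word:replacement" lines into a map. The line format is an
    // assumption made for this sketch. A line without a replacement
    // (e.g. "Queen:" or just "Queen") maps the word to an empty string.
    static Map<String, String> loadDictionary(Path file) throws IOException {
        Map<String, String> dict = new HashMap<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split(":", 2);
            String replacement = parts.length > 1 ? parts[1].trim() : "";
            dict.put(parts[0].trim(), replacement);
        }
        return dict;
    }

    // Builds a pre-processing function backed by the dictionary file.
    static Function<String, String> fromFile(Path file) throws IOException {
        Map<String, String> dict = loadDictionary(file);
        return str -> Arrays.stream(str.trim().split("\\s+"))
                .map(word -> dict.getOrDefault(word, word))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample dictionary to a temporary file
        Path file = Files.createTempFile("name-dict", ".txt");
        Files.write(file, Arrays.asList("Queen:", "Third:", "III:", "Bob:Robert"),
                StandardCharsets.UTF_8);

        Function<String, String> preProcess = fromFile(file);
        System.out.println(preProcess.apply("Queen Victoria III")); // prints "Victoria"
        System.out.println(preProcess.apply("Bob Smith"));          // prints "Robert Smith"

        Files.delete(file);
    }
}
```

The function returned by `fromFile` has the same `Function<String, String>` shape as the one passed to `setPreProcessingFunction` in the steps above, so it should be usable in its place.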
OK, cool. Really looking forward to the next release. Thanks!
I noticed that we have a name-dictionary.txt. Is there any way I can override it or add some mapping config to this dictionary?