marklogic-community / smart-mastering-core

Smart Mastering services and libraries for MarkLogic. Documentation: https://marklogic-community.github.io/smart-mastering-core/

Unwanted notification when records are merged #232

Open xbonnamy-marklogic opened 6 years ago

xbonnamy-marklogic commented 6 years ago

Summary

When running process-records:process-match-and-merge, records that are merged should not also trigger notifications based on a lower score threshold than the one defined for merging.

Description

I have a case where I run process-records:process-match-and-merge on my records and get a merge for 2 records. This is the expected behavior, as the match score is above the threshold defined for merging. My thresholds are defined like this:

  <thresholds>
    <threshold above="9" label="Possible Match" action="notify"/>
    <threshold above="14" label="Likely Match" action="notify"/>    
    <threshold above="19" label="Definitive Match" action="merge"/>
  </thresholds>

This is also confirmed by checking the match score with matcher:find-document-matches-by-options-name for one of the candidate records for merging, as shown below:

<result uri="/customer/13923aa4-10ca-4974-97b8-02151a244b58" index="1" score="23" threshold="Definitive Match" action="merge">
  <matches>
    <match>fn:doc("/customer/13923aa4-10ca-4974-97b8-02151a244b58")/envelope/instance/Customer/text("firstname")</match>
    <match>fn:doc("/customer/13923aa4-10ca-4974-97b8-02151a244b58")/envelope/instance/Customer/text("email")</match>
  </matches>
</result>
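To make the expected outcome explicit, here is a minimal Python sketch of how a score could be mapped to the highest threshold it passes. It is not Smart Mastering's implementation: the `Threshold` type and `classify` function are hypothetical names, and treating `above` as a strict "greater than" comparison is an assumption. For the score of 23 reported above, either reading gives the same result.

```python
from typing import NamedTuple, Optional, Sequence


class Threshold(NamedTuple):
    """Hypothetical stand-in for one <threshold> element."""
    above: float
    label: str
    action: str


# Values copied from the <thresholds> configuration in this issue.
THRESHOLDS = [
    Threshold(9, "Possible Match", "notify"),
    Threshold(14, "Likely Match", "notify"),
    Threshold(19, "Definitive Match", "merge"),
]


def classify(score: float,
             thresholds: Sequence[Threshold] = THRESHOLDS) -> Optional[Threshold]:
    """Return the highest threshold the score passes, or None if it passes none.

    Assumption: "above" is read as strictly greater than.
    """
    passed = [t for t in thresholds if score > t.above]
    return max(passed, key=lambda t: t.above) if passed else None


result = classify(23)
print(result.label, result.action)
```

Under this reading, a pair scoring 23 belongs only to the "Definitive Match" threshold, so only the merge action would be expected for it.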

But as part of processMatchAndMerge, I also get a "Likely Match" notification for the records being merged, as shown below:

<sm:notification xmlns:sm="http://marklogic.com/smart-mastering">
  ...
  <sm:threshold-label>Likely Match</sm:threshold-label>
  <sm:document-uris>
    <sm:document-uri>/customer/13923aa4-10ca-4974-97b8-02151a244b58</sm:document-uri>
    <sm:document-uri>/customer/536</sm:document-uri>
  </sm:document-uris>
</sm:notification>

In this case, as part of the process-records:process-match-and-merge there should not be a notification for the records that are merged.
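The behavior I would expect can be sketched in Python as a filter applied after merging. This only illustrates the expectation and is not the library's code: the function name, the dictionary shape of a notification, and the URI `/customer/999` are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def drop_notifications_for_merged(
    notifications: Iterable[Dict],
    merge_groups: Iterable[Set[str]],
) -> List[Dict]:
    """Keep only notifications that are not fully covered by a single merge.

    A notification is dropped when every URI it lists was merged into the
    same merged document (i.e. its URIs are a subset of one merge group).
    """
    groups = [set(g) for g in merge_groups]
    kept = []
    for note in notifications:
        uris = set(note["document-uris"])
        if any(uris <= group for group in groups):
            continue  # both records were merged together, so no notification
        kept.append(note)
    return kept


merged = [{"/customer/536", "/customer/13923aa4-10ca-4974-97b8-02151a244b58"}]
notes = [
    {   # the unwanted notification from this issue
        "threshold-label": "Likely Match",
        "document-uris": [
            "/customer/13923aa4-10ca-4974-97b8-02151a244b58",
            "/customer/536",
        ],
    },
    {   # hypothetical notification involving a record that was not merged
        "threshold-label": "Possible Match",
        "document-uris": ["/customer/536", "/customer/999"],
    },
]
remaining = drop_notifications_for_merged(notes, merged)
print([n["threshold-label"] for n in remaining])
```

In this sketch, a notification that involves a record outside the merge is kept, so only the notification that duplicates the merge is suppressed.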

Technical information

Match Options

In this case, matching is based on 3 properties (firstname, lastname, and email), with 2 expand scoring entries as below:

    <expand property-name="firstname" algorithm-ref="thesaurus" weight="5">
      <thesaurus>/thesaurus/first-name-synonyms.xml</thesaurus>
      <distance-threshold>50</distance-threshold>
    </expand>
    <expand property-name="lastname" algorithm-ref="dbl-metaphone" weight="7">
      <dictionary>/dictionary/name.xml</dictionary>
      <distance-threshold>50</distance-threshold>
    </expand>

MarkLogic environment

MarkLogic 9.0-7, DHF 4.0.1, SmartMastering 1.1.1

dmcassel commented 6 years ago

You called process-match-and-merge on one document and it sounds like there are two other documents that were found to be similar. One of them hit the threshold for merging and one the threshold for notification. Since the original document (the one whose URI you passed to process-match-and-merge) got merged, the notification has the merged URI instead of the original one. Is that right?

If not, sample data would be useful.

xbonnamy-marklogic commented 6 years ago

For the /customer/536 document, /customer/13923aa4-10ca-4974-97b8-02151a244b58 is similar, and vice versa. The merged document has the following in its header:

    "merges": [
      { "document-uri": "/customer/536" },
      { "document-uri": "/customer/13923aa4-10ca-4974-97b8-02151a244b58" }
    ],

I called process-match-and-merge on my mdm-content collection, which includes both docs. I hope that clarifies.

ryanjdew commented 5 years ago

I believe this is fixed with the 1.2.1 release of Smart Mastering. If this could be verified with your original data and system, that would be appreciated.

xbonnamy-marklogic commented 5 years ago

Sorry for the late test. I gave it a try today, but the problem remains.