larsga / Duke

Duke is a fast and flexible deduplication engine written in Java
Apache License 2.0

Linking Json data source doesn't seem to work #260

Open uderline opened 5 years ago

uderline commented 5 years ago

Hi, deduplication works fine and the records are matched. However, linkage mode doesn't seem to work with the configuration given below.

<group>
  <data-source class="no.priv.garshol.duke.datasources.JsonDataSource">
    <param name="input-file" value="base.json"/>

    <column name="id" property="ID"/>
    <column name="first_name" property="first_name"/>
    <column name="last_name" property="last_name"/>
    <column name="birth" property="birth"/>
    ....
  </data-source>
</group>

<group>
  <data-source class="no.priv.garshol.duke.datasources.JsonDataSource">
    <param name="input-file" value="tolink.json"/>

    <column name="id" property="ID"/>
    <column name="first_name" property="first_name"/>
    <column name="last_name" property="last_name"/>
    <column name="birth" property="birth"/>
    ....
  </data-source>
</group>

Just in case, I put the same dataset in both sources. The data source is definitely instantiated, but afterwards no records are created from it.
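
For context, here is a minimal sketch of how the two groups would sit inside a complete Duke configuration file next to a schema. The threshold, the comparator choice, and the low/high probabilities are illustrative assumptions, not the values from my actual setup, and the elided columns are left out.

```xml
<duke>
  <schema>
    <!-- Assumed threshold; tune for real data -->
    <threshold>0.8</threshold>

    <!-- Identity property: used to identify records, not compared -->
    <property type="id">
      <name>ID</name>
    </property>

    <!-- Assumed comparator and probabilities, for illustration only -->
    <property>
      <name>first_name</name>
      <comparator>no.priv.garshol.duke.comparators.Levenshtein</comparator>
      <low>0.3</low>
      <high>0.9</high>
    </property>
  </schema>

  <!-- First group: the base records -->
  <group>
    <data-source class="no.priv.garshol.duke.datasources.JsonDataSource">
      <param name="input-file" value="base.json"/>
      <column name="id" property="ID"/>
      <column name="first_name" property="first_name"/>
    </data-source>
  </group>

  <!-- Second group: the records to link against the first group -->
  <group>
    <data-source class="no.priv.garshol.duke.datasources.JsonDataSource">
      <param name="input-file" value="tolink.json"/>
      <column name="id" property="ID"/>
      <column name="first_name" property="first_name"/>
    </data-source>
  </group>
</duke>
```

With two groups like this, records in one group are only compared against records in the other group, never against records in their own group.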