Closed Anishx closed 5 years ago
Hi! Have you tried the most common way?
<data-source ...>
<column name="col1" property="name_property" />
The entire xml file would look like:
<duke>
<schema>
<threshold></threshold>
<property></property>
</schema>
<data-source ...>
<param ... />
<column .... />
</data-source>
</duke>
@uderline I somehow got it working, but a JSON document can contain the same key several times at different nesting levels. How do I specify where to find the value? My JSON file (JSON.txt) looks like the data below; how do I select "id", "name" and the extension's "url", for example? There is no example here that demonstrates this.
I also tried something else: linking two JSON files. All I got was blank output in the command prompt. I used the XML file below (XML.txt).
The sample JSON I used: JSONSAMPLE.txt
It seems you cannot address a nested key such as extension.url, the way you can with the MongoDB source. If you absolutely need both url and the url inside extension, I would rename one of the keys.
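One way to rename the keys without touching the original file is to write a flattened copy for Duke to read. The following is only a rough sketch in Python, not part of Duke: the `flatten` helper, the sample record and the `_` separator are all hypothetical, and how the input file must be laid out for JsonDataSource is not stated in this thread.

```python
import json


def flatten(record, parent_key="", sep="_"):
    """Return a flat dict whose keys are the joined path to each value.

    Hypothetical helper: nested dicts and lists are expanded so that
    {"extension": {"url": ...}} becomes {"extension_url": ...}.
    """
    flat = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten(value, new_key, sep))
        elif isinstance(value, list):
            # List items get their index appended to the path.
            for index, item in enumerate(value):
                indexed_key = f"{new_key}{sep}{index}"
                if isinstance(item, dict):
                    flat.update(flatten(item, indexed_key, sep))
                else:
                    flat[indexed_key] = item
        else:
            flat[new_key] = value
    return flat


# Made-up record in which "url" appears at two nesting levels.
sample = {
    "id": "1",
    "name": "Alice",
    "url": "http://example.org/alice",
    "extension": {"url": "http://example.org/ext"},
}

flat_sample = flatten(sample)
print(json.dumps(flat_sample, indent=2))
```

After flattening, the two url values have distinct top-level names (url and extension_url), so each one can be given its own column name in the data source, assuming JsonDataSource matches column names against top-level keys as this thread suggests.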
For the config, make the properties (e.g. id, name and extension) in the <schema>. Then, make the columns in the <data-source>. These are described in the XML config file:
<schema>
<threshold>0.8</threshold>
<property type="id">
<name>ID</name>
</property>
<property>
<name>NAME</name>
<comparator>no.priv.garshol.duke.comparators.Levenshtein</comparator>
<low>0.09</low>
<high>0.93</high>
</property>
<property>
<name>URL</name>
<comparator>no.priv.garshol.duke.comparators.Levenshtein</comparator>
<low>0.04</low>
<high>0.73</high>
</property>
</schema>
<database class="no.priv.garshol.duke.databases.InMemoryDatabase">
</database>
<data-source class="no.priv.garshol.duke.datasources.JsonDataSource">
<param name="input-file" value="JSON.json" />
<column name="id" property="ID" />
<column name="name" property="NAME" />
<column name="url" property="URL" />
</data-source>
Hope that helps
@uderline But this JSON syntax is used by another, much larger application, so I suppose I may have to change the JSON file specifically for this purpose.
Anyway, I guess the scope of this issue is covered; it kind of works now. Thank you @uderline
Hi, can this be used for a continuous stream of JSON as well? How do we configure it in that case? Regards, Ashutosh
Hi @ashubitm, I guess not, because the dataset is saved in memory or indexed in a Lucene DB before the dedup/linkage process starts. That's why I wrote a plugin for Elasticsearch that links the ingested data. Combine that with Logstash (for the stream) and you'll have the perfect combo ;) But that's another issue/project.
How to write config.xml for a JSON data source
The above lines are ambiguous in the documentation. They would be well served by an example, but there are no examples that describe this. Kindly help.