larsga / Duke

Duke is a fast and flexible deduplication engine written in Java
Apache License 2.0

The config file for JSON Data source #256

Open cwichka opened 6 years ago

cwichka commented 6 years ago

Does anyone know the structure of the config file for a JSON data source in a deduplication project? My deduplication project works perfectly fine on a CSV file, but when I tried to do the same thing on a JSON file I found neither an example nor a solution.

Thanks in advance!

Anishx commented 5 years ago

https://github.com/larsga/Duke/issues/258

uderline commented 5 years ago

I would like to add that the file to parse is not a valid JSON document as a whole. The format (cf. the test file) is one JSON object per line:

{"entry": "1", "name": "John", "last_name": "Doe"}
{"entry": "2", "name": "John", "last_name": "Doe"}
{"entry": "3", "name": "John", "last_name": "Doe"}

There are no enclosing [ ] and no commas between the entries.
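
If your data is currently a regular JSON array, something like the following might help to convert it to the one-object-per-line layout shown above. This is only a minimal sketch in Python and not part of Duke; the function name `array_to_lines` is made up for illustration, and it assumes the input is a top-level array of objects.

```python
import json


def array_to_lines(array_text: str) -> str:
    """Turn a JSON array of objects into text with one object per line.

    Hypothetical helper, not part of Duke. The output has no enclosing
    brackets and no commas between entries.
    """
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps writes each record on a single line by default.
    return "\n".join(json.dumps(record) for record in records) + "\n"


if __name__ == "__main__":
    sample = '[{"entry": "1", "name": "John"}, {"entry": "2", "name": "Jane"}]'
    print(array_to_lines(sample), end="")
```

Write the returned text to a file and point the data source at that file instead of the original array.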

Note added to the data source wiki page.