scify / JedAIToolkit

An open source, high scalability toolkit in Java for Entity Resolution.
http://jedai.scify.org
Apache License 2.0
209 stars, 47 forks

Dirty datasets in CSV format #47

Closed by florisheijmans 2 years ago

florisheijmans commented 3 years ago

Hi, I was wondering whether the dirty datasets are available in CSV format. Otherwise I can write a quick script that reads the JSO files and converts them myself, but I figured there is no harm in asking first! Thanks in advance.

vromaniello commented 2 years ago

Hi @florisheijmans, I have a similar problem with JedAI-gui, as follows:

image

Maybe my .csv files (entity profile and gold-truth) are not well formatted. An example follows.

Entity profile: restaurant.csv

image

Gold-truth file: restaurant_gold.csv

image

I also tried inserting single quotes, etc., but with no success.

I can't find any dataset examples in .csv format. Can you help me? Thank you very much.

gpapadis commented 2 years ago

Hi!

@florisheijmans: somehow I missed your ticket. I apologize for not replying.

@vromaniello: your data looks fine. Are the rest of the CSV reader parameters properly configured? You can send me a sample of your data (~50 rows) so I can try it myself.
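For anyone preparing such a sample, here is a minimal Python sketch that takes the first ~50 records of a CSV file and flags records whose field count differs from the header. It is not part of JedAI: the function name `sample_and_check`, the assumption that the first row is a header, and the default comma separator are all illustrative and should be matched to whatever the CSV reader is actually configured with.

```python
import csv
import io


def sample_and_check(text, delimiter=",", max_rows=50):
    """Return (header, rows, problems) for the first max_rows data records.

    Hypothetical helper, not a JedAI API. Assumes the first record is a
    header row and that every data record should have the same field count.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return [], [], ["file is empty"]

    rows, problems = [], []
    # Record 1 is the header, so data records are numbered from 2.
    for record_no, row in enumerate(reader, start=2):
        if len(rows) >= max_rows:
            break
        if len(row) != len(header):
            problems.append(
                f"record {record_no}: expected {len(header)} fields, got {len(row)}"
            )
        rows.append(row)
    return header, rows, problems


if __name__ == "__main__":
    # Made-up data for illustration only; the second record is missing a field.
    demo = 'id,name,city\n1,"Joe\'s, Diner",Athens\n2,Cafe\n'
    header, rows, problems = sample_and_check(demo)
    print(header)
    print(problems)
```

If `problems` is non-empty, a likely cause is a separator or quoting mismatch between the file and the reader settings, though this check cannot confirm that by itself.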

vromaniello commented 2 years ago

Hi @gpapadis, my tests succeeded with JedAI's web app through Docker, following the "Abt-Buy" .csv examples. These are my files:

Entity profile

image

Gold-truth file

image

The CSV reader and the explore button work fine. Thank you.