larsga / Duke

Duke is a fast and flexible deduplication engine written in Java
Apache License 2.0

Unable to run Duke on Windows #221

Closed cyberwhicker closed 8 years ago

cyberwhicker commented 8 years ago

I want to run Duke to do record linkage on a MySQL database, but I am stuck at the starting point. This is what I did:

1) Set my classpath variables with the jars provided.
2) Ran the simple statement to do the matching:

    java no.priv.garshol.duke.Duke --showmatches dogfood-sparql.xml

3) It resulted in the following error:

    Exception in thread "main" java.lang.RuntimeException: java.net.UnknownHostException: data.semanticweb.org
        at no.priv.garshol.duke.utils.SparqlClient.getResponse(SparqlClient.java:54)
        at no.priv.garshol.duke.utils.SparqlClient.execute(SparqlClient.java:30)
        at no.priv.garshol.duke.datasources.SparqlDataSource.runQuery(SparqlDataSource.java:78)
        at no.priv.garshol.duke.datasources.SparqlDataSource$SparqlIterator.fetchNextPage(SparqlDataSource.java:119)
        at no.priv.garshol.duke.datasources.SparqlDataSource$SparqlIterator.<init>(SparqlDataSource.java:92)
        at no.priv.garshol.duke.datasources.SparqlDataSource$TripleModeIterator.<init>(SparqlDataSource.java:143)
        at no.priv.garshol.duke.datasources.SparqlDataSource.getRecords(SparqlDataSource.java:63)
        at no.priv.garshol.duke.Processor.deduplicate(Processor.java:199)
        at no.priv.garshol.duke.Duke.main_(Duke.java:166)
        at no.priv.garshol.duke.Duke.main(Duke.java:36)
    Caused by: java.net.UnknownHostException: data.semanticweb.org
        at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
        at java.net.PlainSocketImpl.connect(Unknown Source)
        at java.net.SocksSocketImpl.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at java.net.Socket.connect(Unknown Source)
        at sun.net.NetworkClient.doConnect(Unknown Source)
        at sun.net.www.http.HttpClient.openServer(Unknown Source)
        at sun.net.www.http.HttpClient.openServer(Unknown Source)
        at sun.net.www.http.HttpClient.<init>(Unknown Source)
        at sun.net.www.http.HttpClient.New(Unknown Source)
        at sun.net.www.http.HttpClient.New(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.connect(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
        at no.priv.garshol.duke.utils.SparqlClient.getResponse(SparqlClient.java:52)
        ... 9 more

Please help me to run the tool. If possible, please elaborate on stepwise usage.

larsga commented 8 years ago

It looks like you've taken one of the example config files and forgotten to remove the SPARQL data source, so that when you run Duke it's trying to download RDF data from data.semanticweb.org instead of going to your MySQL database. So look for a <sparql> element way down in the XML file.
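
For a long config file, a quick way to confirm whether a leftover SPARQL data source is still present is to parse the XML and count the <sparql> elements. The following is only a minimal sketch using the JDK's built-in DOM parser; the class name `FindSparqlSource` and the trimmed sample config are hypothetical and not part of Duke.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class FindSparqlSource {
    // Counts <sparql> elements anywhere in the given XML text.
    static int countSparqlElements(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("sparql").getLength();
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked-exception handling.
            throw new RuntimeException("Could not parse config XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, heavily trimmed stand-in for a config file;
        // pass a real file path as the first argument to check that instead.
        String xml = "<duke><sparql></sparql></duke>";
        if (args.length > 0) {
            xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        }
        System.out.println("sparql data sources found: " + countSparqlElements(xml));
    }
}
```

If the count is not zero for the file passed to Duke, the SPARQL data source is still configured and Duke will try to reach the remote endpoint.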

cyberwhicker commented 8 years ago

I tried modifying the XML config file, but now I get a different problem. I am a beginner with the tool, so please help me run it. This is the exception I get:

    java no.priv.garshol.duke.Duke --progress --linkfile=dogfood-test.txt dogfood-sparql.xml

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/analysis/core/KeywordAnalyzer
        at no.priv.garshol.duke.ConfigurationImpl.createDatabase(ConfigurationImpl.java:105)
        at no.priv.garshol.duke.Processor.<init>(Processor.java:55)
        at no.priv.garshol.duke.Duke.main_(Duke.java:93)
        at no.priv.garshol.duke.Duke.main(Duke.java:36)
    Caused by: java.lang.ClassNotFoundException: org.apache.lucene.analysis.core.KeywordAnalyzer
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 4 more

thanks.

larsga commented 8 years ago

You have to put the Lucene jar files on the classpath: both lucene-core-4.0.0.jar and lucene-analyzers-common-4.0.0.jar.
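
To check whether the jars are really visible to the JVM before launching Duke, a small probe along these lines may help. It is a sketch, not something shipped with Duke: the class name `ClasspathCheck` is made up, and the two class names it probes are simply the ones that appear in the stack traces above.

```java
public class ClasspathCheck {
    // Returns true if the named class can be located by this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError from missing dependencies
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack traces in this thread.
        String[] names = {
            "no.priv.garshol.duke.Duke",
            "org.apache.lucene.analysis.core.KeywordAnalyzer"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same classpath setting you use for Duke. Any class reported as MISSING means the jar that contains it is not on the classpath the JVM actually sees.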

larsga commented 8 years ago

Did you solve it?

larsga commented 8 years ago

Closing this as probably irrelevant by now.

aparna2494 commented 7 years ago

Hi, I am facing the same issue as the one mentioned by Ashish.

    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/document/Fieldable
        at no.priv.garshol.duke.Configuration.createDatabase(Configuration.java:106)
        at no.priv.garshol.duke.Processor.<init>(Processor.java:48)
        at no.priv.garshol.duke.Duke.main_(Duke.java:87)
        at no.priv.garshol.duke.Duke.main(Duke.java:38)
    Caused by: java.lang.ClassNotFoundException: org.apache.lucene.document.Fieldable
        at java.net.URLClassLoader.findClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        ... 4 more

I have added Lucene core and Lucene analyzers to my classpath.

Thanks, Aparna
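
This trace names a different missing class than the earlier one. As far as I know, org.apache.lucene.document.Fieldable belongs to the Lucene 3.x API and is no longer present in Lucene 4.x, so one possible cause (an assumption, not confirmed in this thread) is that the Lucene jars on the classpath do not match the version this Duke build was compiled against. A small sketch for listing which Lucene jars the JVM actually sees; the class name `LuceneJars` is hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LuceneJars {
    // Returns the classpath entries whose file name contains "lucene".
    static List<String> luceneEntries(String classpath, String separator) {
        List<String> hits = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            String fileName = new File(entry).getName().toLowerCase();
            if (fileName.contains("lucene")) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // java.class.path holds the effective classpath of the running JVM;
        // File.pathSeparator is ";" on Windows and ":" on Unix-like systems.
        String classpath = System.getProperty("java.class.path");
        for (String entry : luceneEntries(classpath, File.pathSeparator)) {
            System.out.println(entry);
        }
    }
}
```

The version numbers in the printed jar names show which Lucene release is being picked up, which can then be compared with the one your Duke download expects.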