sgottsch / eventkg

The EventKG is a novel multilingual resource incorporating event-centric information extracted from several large-scale knowledge graphs such as Wikidata, DBpedia and YAGO, as well as less structured sources such as the Wikipedia Current Events Portal and Wikipedia event lists in five languages. The EventKG is an extensible event-centric resource modeled in RDF. It relies on Open Data and best practices to make event data spread across different sources available through a common representation and reusable for a variety of novel algorithms and real-world applications.
http://eventkg.l3s.uni-hannover.de/

Data pipeline - Step 2 throwing an error + Readme clarification request #1

Open nishkalavallabhi opened 6 years ago

nishkalavallabhi commented 6 years ago

I am trying to follow the instructions. I am getting the following error at "Step 2: Data Download"

My data path in the config file was: /Users/username/Downloads/EventKG/data/ (I masked the username). My understanding of the instructions was that this step is where all the data gets downloaded. However, the exception makes me think I should already have a list of dump files?

This is how it looks after I created an executable jar by exporting the Pipeline class.


java -jar Pipeline.jar config.txt 1
Step 1: Download files.
java.io.FileNotFoundException: /Users/username/Downloads/EventKG/data/raw_data/wikipedia/en/dump_file_list.txt (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
    at java.io.PrintWriter.<init>(PrintWriter.java:184)
    at de.l3s.eventkg.util.FileLoader.getWriter(FileLoader.java:161)
    at de.l3s.eventkg.pipeline.RawDataDownLoader.downloadWikipediaFiles(RawDataDownLoader.java:177)
    at de.l3s.eventkg.pipeline.RawDataDownLoader.downloadFiles(RawDataDownLoader.java:159)
    at de.l3s.eventkg.pipeline.Pipeline.download(Pipeline.java:118)
    at de.l3s.eventkg.pipeline.Pipeline.main(Pipeline.java:71)
Exception in thread "main" java.lang.NullPointerException
    at de.l3s.eventkg.pipeline.RawDataDownLoader.downloadWikipediaFiles(RawDataDownLoader.java:202)
    at de.l3s.eventkg.pipeline.RawDataDownLoader.downloadFiles(RawDataDownLoader.java:159)
    at de.l3s.eventkg.pipeline.Pipeline.download(Pipeline.java:118)
    at de.l3s.eventkg.pipeline.Pipeline.main(Pipeline.java:71)


At that point, I manually created the path of /raw_data/wikipedia/en within the data folder [Please mention this in the readme clearly], and now I run into an error with jsoup.


Exception in thread "main" java.lang.NoClassDefFoundError: org/jsoup/Jsoup
    at de.l3s.eventkg.pipeline.RawDataDownLoader.downloadWikipediaFiles(RawDataDownLoader.java:179)
    at de.l3s.eventkg.pipeline.RawDataDownLoader.downloadFiles(RawDataDownLoader.java:159)
    at de.l3s.eventkg.pipeline.Pipeline.download(Pipeline.java:118)
    at de.l3s.eventkg.pipeline.Pipeline.main(Pipeline.java:71)
Caused by: java.lang.ClassNotFoundException: org.jsoup.Jsoup
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 4 more

sgottsch commented 6 years ago

I have identified the reason for your first problem. Unfortunately, the method responsible for creating the required folders was commented out. I have just updated the Pipeline class.
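For readers hitting the same FileNotFoundException on an older checkout: the general remedy is to create the parent folders before opening the writer. The following is only a minimal sketch of that idea, not the actual change made to the Pipeline class. The class name `SafeWriterSketch` and the method `openWriter` are hypothetical; only the relative path `raw_data/wikipedia/en/dump_file_list.txt` is taken from the stack trace above.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the EventKG code base.
public class SafeWriterSketch {

    // Opens a writer for the given file, creating missing parent folders first.
    static PrintWriter openWriter(Path file) throws IOException {
        Path parent = file.toAbsolutePath().getParent();
        if (parent != null) {
            // No-op if the folders already exist.
            Files.createDirectories(parent);
        }
        return new PrintWriter(Files.newBufferedWriter(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // A temporary folder stands in for the data path from the config file.
        Path dataRoot = Files.createTempDirectory("eventkg-data");
        Path listFile = dataRoot.resolve("raw_data").resolve("wikipedia")
                .resolve("en").resolve("dump_file_list.txt");

        try (PrintWriter writer = openWriter(listFile)) {
            writer.println("placeholder line");
        }
        System.out.println("Created " + listFile);
    }
}
```

With a helper like this, the manual `mkdir` workaround described in the first comment would not be needed, since the folder hierarchy is created on demand.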

Currently, I cannot reproduce the problem with Jsoup. I hope your workaround of downloading Jsoup separately solved this issue.

nishkalavallabhi commented 6 years ago

Thanks for the quick response! Yes, the first problem seems to be solved now. Downloading jsoup fixed the second issue. However, there seems to be another missing jar, for Apache Commons Compress; this throws an error after the downloads have run.
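A NoClassDefFoundError like the one above usually means a dependency jar is not on the classpath of the exported jar. One way to find all missing libraries up front, rather than one crash at a time, is a small check along these lines. This is a sketch, not part of EventKG: the class `ClasspathCheckSketch` is hypothetical, `org.jsoup.Jsoup` comes from the stack trace, and the Commons Compress class name is only an assumed example of a class from that library (which classes the pipeline actually uses is not stated in this thread).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic, not part of the EventKG code base.
public class ClasspathCheckSketch {

    // Returns the subset of the given class names that cannot be found on the classpath.
    static List<String> findMissing(String... classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ClasspathCheckSketch.class.getClassLoader();
        for (String name : classNames) {
            try {
                // 'false' = look the class up without running its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = findMissing(
                "org.jsoup.Jsoup", // from the stack trace in this issue
                // Assumed representative class of Apache Commons Compress:
                "org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream");

        if (missing.isEmpty()) {
            System.out.println("All checked classes are available.");
        } else {
            System.out.println("Missing from the classpath: " + missing);
        }
    }
}
```

Running this with the same classpath as the pipeline jar lists every checked library that still needs to be added.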

sgottsch commented 5 years ago

I have just run the first step of the Pipeline (after small updates to the code for decompressing and downloading schema files). Everything worked fine. I hope that the manual downloads of the missing jars helped. As I cannot reproduce your error and have not received similar complaints so far, I cannot work on that specific issue. But please let me know if you face more problems.

parthrohilla commented 5 years ago

I tried to follow the steps mentioned but ran into an error when executing step 2: java -jar Pipeline.jar path_to_config_file.txt 1

The following error message appears:

java.nio.file.NoSuchFileException: path_to_config_file.txt
    at sun.nio.fs.WindowsException.translateToIOException(Unknown Source)
    at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
    at sun.nio.fs.WindowsException.rethrowAsIOException(Unknown Source)
    at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(Unknown Source)
    at java.nio.file.Files.newByteChannel(Unknown Source)
    at java.nio.file.Files.newByteChannel(Unknown Source)
    at java.nio.file.spi.FileSystemProvider.newInputStream(Unknown Source)
    at java.nio.file.Files.newInputStream(Unknown Source)
    at java.nio.file.Files.newBufferedReader(Unknown Source)
    at java.nio.file.Files.readAllLines(Unknown Source)
    at de.l3s.eventkg.pipeline.Config.init(Config.java:57)
    at de.l3s.eventkg.pipeline.Pipeline.main(Pipeline.java:52)
Exception in thread "main" java.lang.NullPointerException
    at de.l3s.eventkg.pipeline.Pipeline.main(Pipeline.java:54)

Please help.

sgottsch commented 5 years ago

You need to replace "path_to_config_file.txt" with the actual path to the file containing the configuration data. The file content should be similar to what is shown in the "Configuration" section in the readme (https://github.com/sgottsch/eventkg).
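The trace above shows the NoSuchFileException followed by a NullPointerException, which hides the real cause. A check of the path argument before reading makes the failure easier to understand. The sketch below only illustrates that idea and is not the code in `Config.init`; the class `ConfigCheckSketch` and the method `readConfig` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of the EventKG code base.
public class ConfigCheckSketch {

    // Reads all lines of the configuration file, failing early with a clear message.
    static List<String> readConfig(String pathArgument) throws IOException {
        Path configFile = Paths.get(pathArgument);
        if (!Files.isRegularFile(configFile)) {
            throw new IllegalArgumentException(
                    "Configuration file not found: " + configFile.toAbsolutePath()
                    + ". Pass the real path to your configuration file as the first argument.");
        }
        return Files.readAllLines(configFile, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("Usage: java ConfigCheckSketch <config file>");
            return;
        }
        List<String> lines = readConfig(args[0]);
        System.out.println("Read " + lines.size() + " configuration lines.");
    }
}
```

Called with the literal placeholder `path_to_config_file.txt`, this reports the absolute path it tried to open, which makes it obvious that the placeholder was never replaced.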

parthrohilla commented 5 years ago

Hello Sir, thanks for the reply. There was a problem with the timestamp; it worked after I updated it to the latest timestamp (the older version has been moved or updated). But it still throws an error related to WIKINEWS, maybe because the source has been moved, so it is unable to download the WIKINEWS data. Could you suggest a way out of this problem?

Thanking You, Parth
