neosyon / SimpTextAlign

Repo for the simplified text alignment tools.
MIT License

Run the tool on Newsela #4

Closed by nalfear 5 years ago

nalfear commented 5 years ago

I am trying to run the tool on the Newsela dataset, but I think I have to change the baseDir in the code and rebuild the jar file. Is baseDir the folder that contains the articles? Could you give me an example, please?

neosyon commented 5 years ago

As mentioned on the main page, you can use the provided .jar and pass it the path to the Newsela dataset. See https://github.com/neosyon/SimpTextAlign#usage

nalfear commented 5 years ago

Thank you for your reply. I have tried it, but I got this error:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:567)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NullPointerException
    at simplifiedTextAlignment.DatasetAlignment.AlignNewselaDataset.main(AlignNewselaDataset.java:94)
    ... 5 more

neosyon commented 5 years ago

Can you please share with me the command that you executed?

nalfear commented 5 years ago

C:\Users.....\SimpTextAlign-master\jars>java -jar AlignNewselaDataset.jar -i inFolder -o outFolder

I left the other arguments at their defaults. I have two folders in the same directory: inFolder contains a sample of Newsela articles at different simplification levels, and outFolder is empty.

neosyon commented 5 years ago

Please try providing the inFolder and outFolder values as absolute paths, e.g.:

C:\Users.....\SimpTextAlign-master\jars>java -jar AlignNewselaDataset.jar -i C:/data/newsela/ -o C:/data/output

nalfear commented 5 years ago

I will try that. Just to make sure: the output folder is an empty folder, and the input folder contains text files of Newsela articles at different levels, right?

nalfear commented 5 years ago

I've tried it, but I still get the same error. I think I have to change the baseDir in the source code, but is that the folder where the Newsela articles are stored, or something else?

neosyon commented 5 years ago

Please read the whole usage section: https://github.com/neosyon/SimpTextAlign#usage

Several of the command-line arguments are mandatory, namely all the ones not enclosed in {}:

java -jar AlignNewselaDataset.jar -i inFolder -o outFolder -l language -s similarityStrategy -a alignmentLevel -t alignmentStrategy {-u SubLevelalignmentStrategy} {-e embeddingsTxtFile}

This could be a valid command:

java -jar AlignNewselaDataset.jar -i C:/data/newsela/ -o C:/data/output/ -l "en" -s "C3G" -a "sentence" -t "closestSimStrategy"
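The earlier command in this thread passed only -i and -o, and the error stopped once all six flags without {} were supplied. As a rough illustration, a pre-flight check like the one below could report missing flags before the aligner starts. This is only a sketch and not the tool's own code: the class name MandatoryArgsCheck and the helper missingFlags are hypothetical, and the flag list is copied from the usage line above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of SimpTextAlign.
class MandatoryArgsCheck {
    // Flags shown without {} in the usage line quoted in this thread.
    static final List<String> MANDATORY = List.of("-i", "-o", "-l", "-s", "-a", "-t");

    // Returns the mandatory flags that are absent from args
    // or that appear last with no value after them.
    static List<String> missingFlags(String[] args) {
        List<String> given = Arrays.asList(args);
        List<String> missing = new ArrayList<>();
        for (String flag : MANDATORY) {
            int pos = given.indexOf(flag);
            if (pos < 0 || pos == given.size() - 1) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingFlags(args);
        if (!missing.isEmpty()) {
            System.err.println("Missing mandatory arguments: " + missing);
            System.exit(1);
        }
        System.out.println("All mandatory arguments present.");
    }
}
```

Run with only -i inFolder -o outFolder, this sketch would list -l, -s, -a and -t as missing. It checks only that each flag is present and followed by something, not that the value is valid for the tool.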

nalfear commented 5 years ago

It works now. Thank you very much.