neosyon / SimpTextAlign

Repo for the simplified text alignment tools.
MIT License

java.lang.NullPointerException #2

chaojiang06 closed this issue 5 years ago

chaojiang06 commented 6 years ago

Hi, thanks for your useful tool!

However, when I run the tool, I hit a bug. The error log is:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.NullPointerException
    at simplifiedTextAlignment.DatasetAlignment.AlignNewselaDataset.main(AlignNewselaDataset.java:94)
    ... 5 more

My Java version is 8. Would you please take a look at this issue?

Thanks in advance!

neosyon commented 6 years ago

Thank you for using our tool. Can you please share more details? I don't think it's a Java 8 issue. I would like to see the full execution command line. Did you include the -i "decompressed newsela folder path" argument?

neosyon commented 6 years ago

Thank you for your comments and analysis. I found a bug in how the code handles English abbreviations when the sentence splitter has processed them correctly and they occur in the last sentence of the text. I just uploaded the fix.

I hope it works!

Best,

Marc

From: Chao6 notifications@github.com
Sent: Friday, 28 September 2018 23:37
To: neosyon/SimpTextAlign SimpTextAlign@noreply.github.com
Cc: Marc F. S. neosyon@gmail.com; Comment comment@noreply.github.com
Subject: Re: [neosyon/SimpTextAlign] java.lang.NullPointerException (#2)

Hi, thanks for the reply!

Would you please try using the latest version of the ComputeSimilarityBetweenTexts class to compute the similarity for the line below, with any similarity measure?

A1_Grand_Prix Nelson Piquet , Jr. won the inaugural race of the series for A1 Team Brazil . The Curitiba , Brazil in January 2006 was canceled . 0

This line follows the same format as Hwang's Standard Wikipedia to Simple Wikipedia alignments dataset. Each element is separated by '\t', like this:

A1_Grand_Prix\tNelson Piquet , Jr. won the inaugural race of the series for A1 Team Brazil .\tThe Curitiba , Brazil in January 2006 was canceled .\t0
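For anyone trying to reproduce this, here is a minimal sketch of splitting such a line into its tab-separated elements. The class and method names are hypothetical and not part of SimpTextAlign; it only illustrates the '\t' layout described above.

```java
import java.util.Arrays;
import java.util.List;

class HwangLineExample {

    // Hypothetical helper (not from SimpTextAlign): splits one line on tab characters.
    static List<String> splitFields(String line) {
        // The limit -1 keeps trailing empty fields instead of silently dropping them.
        return Arrays.asList(line.split("\t", -1));
    }

    public static void main(String[] args) {
        String line = "A1_Grand_Prix\t"
                + "Nelson Piquet , Jr. won the inaugural race of the series for A1 Team Brazil .\t"
                + "The Curitiba , Brazil in January 2006 was canceled .\t"
                + "0";
        List<String> fields = splitFields(line);
        System.out.println("Number of fields: " + fields.size());
        for (String field : fields) {
            System.out.println("[" + field + "]");
        }
    }
}
```

The example line yields four fields; the two middle ones are the sentences whose similarity is computed.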

The code reports a NullPointerException. The problem stems from the sentence splitter in the getSubtexts function in TextProcessingUtils. I understand that adjusting the pointer avoids wrong sentence splits after abbreviations such as Dr. or Prof., as suggested in this Stack Overflow thread: https://stackoverflow.com/questions/17159513/split-paragraph-into-sentences-with-titles-and-numbers?answertab=oldest#tab-top .

But in this case, if the only sentence contains an abbreviation, that sentence is filtered out, so the getSubtexts and getCleanText functions in the TextProcessingUtils class return an empty list. Then, in the getAlignmentsUsingClosestCosSim function in the VectorUtils class, closestIndex = -1, and cleanSubtexts1.get(closestIndex).getText() and sims[closestIndex][i] trigger a NullPointerException.

When calculating sentence-pair similarity, the input has already been split into sentences. So, to keep the code running, can I just replace if (! hasAbbreviation(sentence)) with if (true) in the getSubtexts function?

I think this change will not affect the correctness of the similarity score. Is this correct?
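To make the failure mode concrete, here is a minimal sketch of guarding the lookup when the list of subtexts is empty. It is not the code from VectorUtils: the class, the method names, and the plain double[][] similarity matrix are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class ClosestIndexGuard {

    // Hypothetical stand-in for the search: returns the row with the highest
    // similarity in column `col`, or -1 when there are no rows to compare.
    static int closestIndex(double[][] sims, int col) {
        int best = -1;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (int row = 0; row < sims.length; row++) {
            if (sims[row][col] > bestSim) {
                bestSim = sims[row][col];
                best = row;
            }
        }
        return best;
    }

    // Returns the closest subtext, or an empty Optional instead of indexing with -1.
    static Optional<String> closestText(List<String> subtexts, double[][] sims, int col) {
        int idx = closestIndex(sims, col);
        if (idx < 0 || idx >= subtexts.size()) {
            return Optional.empty();
        }
        return Optional.of(subtexts.get(idx));
    }

    public static void main(String[] args) {
        // Empty input, as when the only sentence was filtered out.
        List<String> none = new ArrayList<>();
        System.out.println(closestText(none, new double[0][0], 0).orElse("<no alignment>"));

        // Two candidate subtexts; the second has the higher similarity.
        List<String> two = Arrays.asList("first sentence .", "second sentence .");
        double[][] sims = {{0.2}, {0.9}};
        System.out.println(closestText(two, sims, 0).orElse("<no alignment>"));
    }
}
```

With a guard like this, an input whose only sentence was filtered out produces no alignment rather than an exception.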

Thanks!

BTW, the second-to-last main version (commit 38ab50d: https://github.com/neosyon/SimpTextAlign/commit/38ab50d5c23f1a1383eba0d71c3f2818799595b6 ), which uses the Stanford NLP DocumentPreprocessor, doesn't have this problem.

sabdul111 commented 5 years ago

Hi,

Thanks for your useful tool. However, I get the same error.

Calculating IDF...
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
Caused by: java.lang.IllegalStateException: basedir ../../data/Newsela/articles is not a directory
    at org.apache.tools.ant.DirectoryScanner.scan(DirectoryScanner.java:797)
    at simplifiedTextAlignment.Representations.NgramModel.buildNewselaNgramModel(NgramModel.java:42)
    at simplifiedTextAlignment.DatasetAlignment.AlignNewselaDataset.main(AlignNewselaDataset.java:107)
    ... 5 more

My Java version is:

openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

Could you please advise on how to resolve this?

Thank you in advance.
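The IllegalStateException in the trace above says that ../../data/Newsela/articles is not a directory. A relative path like this is resolved against the directory the program is started from, so one thing worth checking is where it actually points. The sketch below is a standalone diagnostic, not part of SimpTextAlign; the class and method names are hypothetical, and the default path is simply the one copied from the error message.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class BasedirCheck {

    // Resolves a possibly relative path against the current working directory.
    static Path resolve(String dir) {
        return Paths.get(dir).toAbsolutePath().normalize();
    }

    // True only if the resolved path exists and is a directory.
    static boolean isUsableDirectory(String dir) {
        return Files.isDirectory(resolve(dir));
    }

    public static void main(String[] args) {
        // Default taken from the error message; pass another path as the first argument.
        String dir = args.length > 0 ? args[0] : "../../data/Newsela/articles";
        System.out.println("Working directory: " + Paths.get("").toAbsolutePath());
        System.out.println("Resolved path:     " + resolve(dir));
        System.out.println("Is a directory:    " + isUsableDirectory(dir));
    }
}
```

If the last line prints false when run from the same directory as the tool, the data folder is probably not where the relative path expects it, or the tool is being started from a different working directory.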