inarahd / jwktl

Automatically exported from code.google.com/p/jwktl

NullPointerException when filtering by language #1

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Download the dump 'dewiktionary-20130818-pages-articles.xml.bz2'.
2. Parse it with 'JWKTL.parseWiktionaryDump(dumpFile, outputDirectory, true);'.
3. Run the snippet given as an example on 
http://code.google.com/p/jwktl/wiki/JWKTLUseCases

  IWiktionaryEdition wkt = JWKTL.openEdition(WIKTIONARY_DIRECTORY);
  WiktionaryEntryFilter filter = new WiktionaryEntryFilter();
  filter.setAllowedWordLanguages(Language.GERMAN);
  filter.setAllowedPartsOfSpeech(PartOfSpeech.ADJECTIVE);
  int deAdjectiveCount = 0;
  for (IWiktionaryEntry entry : wkt.getAllEntries(filter)) {
    System.out.println(entry.getWord());
    deAdjectiveCount++;
  }
  System.out.println("German adjectives: " + deAdjectiveCount);
  wkt.close();

What is the expected output? What do you see instead?

I expect all German adjectives to be printed, followed by their count.
Instead, I get a NullPointerException. Here is the stack trace:

Exception in thread "main" java.lang.NullPointerException
    at java.util.TreeMap.getEntry(TreeMap.java:342)
    at java.util.TreeMap.containsKey(TreeMap.java:227)
    at java.util.TreeSet.contains(TreeSet.java:234)
    at de.tudarmstadt.ukp.jwktl.api.filter.WiktionaryEntryFilter.acceptWordLanguage(WiktionaryEntryFilter.java:103)
    at de.tudarmstadt.ukp.jwktl.api.filter.WiktionaryEntryFilter.accept(WiktionaryEntryFilter.java:149)
    at de.tudarmstadt.ukp.jwktl.api.entry.WiktionaryEdition$1.fetchNext(WiktionaryEdition.java:97)
    at de.tudarmstadt.ukp.jwktl.api.entry.WiktionaryEdition$1.fetchNext(WiktionaryEdition.java:81)
    at de.tudarmstadt.ukp.jwktl.api.util.WiktionaryIterator.hasNext(WiktionaryIterator.java:43)
    at WiktionarySearcher.main(WiktionarySearcher.java:24)

What version of the product are you using? On what operating system?

I am using version 1.0.0 on Ubuntu.

Best

Abou

Original issue reported on code.google.com by abdoulay...@neofonie.de on 21 Aug 2013 at 3:06

GoogleCodeExporter commented 9 years ago
That's an error, indeed. Here's a quick fix for version 1.0.0:

{{{
IWiktionaryEdition wkt = JWKTL.openEdition(WIKTIONARY_DIRECTORY);
WiktionaryEntryFilter filter = new WiktionaryEntryFilter(){
  @Override
  protected boolean acceptWordLanguage(final IWiktionaryEntry entry) {
    if (entry.getWordLanguage() == null)
      return false;

    return super.acceptWordLanguage(entry);
  }
  @Override
  protected boolean acceptPartOfSpeech(final IWiktionaryEntry entry) {
    if (entry.getPartOfSpeech() == null)
      return false;

    return super.acceptPartOfSpeech(entry);
  }
};
filter.setAllowedWordLanguages(Language.GERMAN);
filter.setAllowedPartsOfSpeech(PartOfSpeech.ADJECTIVE);
int deAdjectiveCount = 0;
for (IWiktionaryEntry entry : wkt.getAllEntries(filter)) {
  System.out.println(entry.getWord());
  deAdjectiveCount++;
}
System.out.println("German adjectives: " + deAdjectiveCount);
wkt.close();
}}}

A proper fix is available in the current head revision.

Original comment by chmeyer.de on 21 Aug 2013 at 3:20

GoogleCodeExporter commented 9 years ago
A possible hint: some entries contain no information about the word 
language, that is to say, "entry.getWordLanguage() == null", and passing 
that null value to the filter's language lookup leads to the exception.
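
The stack trace in the report ends in TreeSet.contains, which fits this explanation: a TreeSet with natural ordering rejects null lookups. Below is a minimal, standalone sketch of that behavior and of the null guard used in the quick fix. It does not use JWKTL; the class NullLookupDemo and its two methods are hypothetical names, and plain strings stand in for the library's language objects (an assumption made only for illustration).

```java
import java.util.Set;
import java.util.TreeSet;

public class NullLookupDemo {

    // Returns true if looking up null in the given set throws a
    // NullPointerException, as a naturally ordered TreeSet does.
    public static boolean throwsOnNullLookup(Set<String> allowed) {
        try {
            allowed.contains(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Null-safe membership test: a missing (null) value is never accepted,
    // and the set is only consulted for non-null values.
    public static boolean acceptNullSafe(Set<String> allowed, String value) {
        return value != null && allowed.contains(value);
    }

    public static void main(String[] args) {
        // Stand-in for the filter's set of allowed languages (assumption).
        Set<String> allowed = new TreeSet<>();
        allowed.add("German");

        System.out.println("null lookup throws: " + throwsOnNullLookup(allowed));
        System.out.println("null accepted:      " + acceptNullSafe(allowed, null));
        System.out.println("German accepted:    " + acceptNullSafe(allowed, "German"));
    }
}
```

Checking for null before the lookup, as in acceptNullSafe, mirrors what the overridden acceptWordLanguage and acceptPartOfSpeech methods in the quick fix do: entries without the attribute are rejected instead of reaching the set.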

Original comment by abdoulay...@neofonie.de on 21 Aug 2013 at 3:21

GoogleCodeExporter commented 9 years ago
Thanks for the quick reply.
Checking out the fix would also be great, but one dependency cannot be 
found when I try to build the artifact using 'mvn install'. It is the 
following one:

xerces:xercesImpl:jar:2.9.1-lucene

Missing:
----------
1) xerces:xercesImpl:jar:2.9.1-lucene

  Try downloading the file manually from the project website.

Original comment by abdoulay...@neofonie.de on 21 Aug 2013 at 3:25

GoogleCodeExporter commented 9 years ago
Could you please open separate issues for different problems? Xerces should 
come directly via Maven or is available from 
http://xerces.apache.org/xerces2-j/.

Original comment by chmeyer.de on 21 Aug 2013 at 3:34

GoogleCodeExporter commented 9 years ago

Original comment by chmeyer.de on 21 Aug 2013 at 3:38

GoogleCodeExporter commented 9 years ago
Oh sorry, I missed that. I will open separate issues for different problems 
from now on.

Original comment by abdoulay...@neofonie.de on 21 Aug 2013 at 3:40