vnadgir / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl

ClearNLP 3 uses a lot of memory for parsing (is this behaviour expected?) #608

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run the ClearNLP parser demo with -Xmx3284m.
Sample demo:
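// The literal is split with + because a Java string literal cannot span lines.
String sentence = "Steve Kovach / Business Insider Nokia is releasing An "
        + "Android Phone On The Cusp Of Its $7 Billion Ac - Heres a real noodle-scratcher "
        + "for you.";
// posTagger, lemmaTagger and parser are ClearNLP components created elsewhere (not shown).
AbstractComponent[] components = {
        posTagger,
        lemmaTagger,
        parser};
DEPTree tree = new DEPTree(sentence);
// Run each component over the tree in order: POS tagging, lemmatization, parsing.
for (AbstractComponent component : components) {
    component.process(tree);
}
System.out.println(tree.toString());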

2. The same demo works perfectly with -Xmx3384m.

What is the expected output? What do you see instead?
With -Xmx3384m the demo works.
With -Xmx3284m or less it throws:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
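
To see how much heap the JVM actually has and roughly how much the models take, a small helper like the one below could be run with the same -Xmx setting. This is only a sketch: the class name HeapReport and its methods are made up for illustration and are not part of ClearNLP, and the model-loading step is left as a placeholder comment. Note that Runtime.maxMemory() can report somewhat less than the -Xmx value, depending on the garbage collector.

```java
// Hypothetical helper (not part of ClearNLP): reports JVM heap limits and usage.
class HeapReport {
    private static final long MIB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in MiB (driven by -Xmx). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    /** Heap currently in use (allocated minus free), in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());

        long before = usedHeapMiB();
        // Placeholder: create the ClearNLP components (posTagger, lemmaTagger,
        // parser) here and keep references to them so they are not collected.
        long after = usedHeapMiB();

        System.out.println("Heap used before loading (MiB): " + before);
        System.out.println("Heap used after loading (MiB):  " + after);
    }
}
```

The figures are approximate because garbage may not have been collected yet, but the difference between the two readings gives a rough idea of whether the models alone account for the roughly 3.3 GB needed here.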

What version of the product are you using? On what operating system?
ClearNLP 3.0.2
Java 1.8 (JRE 8), Eclipse Luna, Windows 8

Please provide any additional information below.

Original issue reported on code.google.com by amita...@gmail.com on 15 Apr 2015 at 11:15

GoogleCodeExporter commented 9 years ago
First let me say that we are not the developers of ClearNLP. We just integrate 
it with UIMA in DKPro Core, and DKPro Core is still on version 2 of 
ClearNLP.

The real ClearNLP homepage is here: https://github.com/clir/clearnlp
The mailing list is here: 
https://groups.google.com/forum/?fromgroups#!forum/clearnlp

You should re-post your question to the mailing list - I'm sure Jinho will 
eventually answer.

AFAIK, ClearNLP 2 used quite a bit of memory, so the effect you see is not 
unexpected. However, the release announcement [1] for ClearNLP 3 states: 

> The version 3.0.0 is written from the scratch. All components in this version 
> show significant speed-up over the previous ones (2-3 times), and the 
> statistical models consume less disk and memory space.

Closing this issue as invalid here. Deferred to upstream developers.

[1] https://groups.google.com/forum/?fromgroups#!topic/clearnlp/xR17k7iZkcY

Original comment by richard.eckart on 15 Apr 2015 at 11:43