ufal / nametag

NameTag: Named Entity Tagger
Mozilla Public License 2.0

Memory Leak in Java Binding #4

Closed: dedekj closed this issue 7 years ago

dedekj commented 7 years ago

When I run the following Java code (with the -Xmx100m option to limit the heap), the process quickly consumes more and more RAM: it starts at about 300 MB, reaches 2 GB in less than a minute, and keeps growing.

I'm quite sure that the Java code is OK, so the memory leak has to be in the native C++ code. Note also that the leaked memory cannot belong to the Java heap, because the -Xmx100m option caps the heap at 100 MB.
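One way to check the heap-versus-native claim is to print both numbers side by side while the loop runs. The following is only a minimal sketch using the plain JDK: the class name `MemoryProbe` and its methods are hypothetical helpers, not part of NameTag, and the resident-size reading assumes a Linux `/proc/self/status` file (on Windows it simply returns -1).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of NameTag: compares Java heap usage
// with the resident memory of the whole process.
public class MemoryProbe {
    /** Bytes currently in use on the Java heap (the part bounded by -Xmx). */
    static long heapUsedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Parses a "VmRSS:  12345 kB" line; returns -1 for any other line. */
    static long parseVmRssKb(String line) {
        if (!line.startsWith("VmRSS:")) return -1;
        String digits = line.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? -1 : Long.parseLong(digits);
    }

    /** Resident set size in kB, or -1 when /proc is unavailable (assumption: Linux only). */
    static long residentKb() {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) return -1;
        try {
            for (String line : Files.readAllLines(status)) {
                long kb = parseVmRssKb(line);
                if (kb >= 0) return kb;
            }
        } catch (IOException e) {
            // Fall through and report "unknown".
        }
        return -1;
    }

    /** One-line summary suitable for periodic logging. */
    static String report(int iteration) {
        return iteration + ": heap=" + (heapUsedBytes() / 1024) + " kB, rss=" + residentKb() + " kB";
    }

    public static void main(String[] args) {
        System.err.println(report(0));
    }
}
```

Calling `MemoryProbe.report(r)` at the point where the reproduction prints `r` would show whether the heap figure stays flat while the resident size keeps climbing, which is the pattern expected from a leak in native code.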

Tested on Windows, but similar behavior was observed on CentOS Linux as well.

NameTag version 1.1.1

I don't have a C++ toolchain ready, so I didn't test it directly (without Java), but I can add more details if needed.

import cz.cuni.mff.ufal.nametag.*;

public class RunNer {
    public static void main(String[] args) {
        Ner ner = Ner.load("target/models/czech-cnec2.0-140304.ner");

        Forms forms = new Forms();
        TokenRanges tokens = new TokenRanges();
        NamedEntities entities = new NamedEntities();
        Tokenizer tokenizer = ner.newTokenizer();

        for (int r = 0; r < 10000000; r++) {
            String text = "Václav Havel byl prezident České Republiky";
            tokenizer.setText(text);
            while (tokenizer.nextSentence(forms, tokens)) {
                ner.recognize(forms, entities);
            }
            if (r % 10000 == 0)
                System.err.println(r);
        }
    }
}

foxik commented 7 years ago

I can replicate the problem on current master with the provided Java example. However, the corresponding C++ code does not show any leak in valgrind (I have also been running the REST server for more than a year without interruption, with no sign of a memory leak). So I suspect the problem might be somewhere in the Java<->C++ bindings -- either I have an error in the SWIG interface file, or there is a problem in the Java part of SWIG itself. I will investigate more tomorrow or on Wednesday.

foxik commented 7 years ago

In the end the problem was in C++, but it was already fixed in April by c820aa7 -- which is why I could detect no leak in current master :-)

I released version 1.1.2 including the fix.