Closed jrtp closed 3 months ago
Hey @jrtp thanks for the issue, this was indeed not possible. However, it can now be done like this:
```java
LlamaIterator iterator = model.generate(params).iterator();
while (iterator.hasNext()) {
    LlamaOutput output = iterator.next();
    System.out.println(output);
    if (Math.random() < 0.5) {
        iterator.cancel();
    }
}
```
Note that there was a slight API change from `LlamaModel.Output` to `LlamaOutput`. Maven version 3.1.0 should soon be available.
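For anyone who wants to try the shape of this pattern without loading a model: below is a minimal standalone sketch of a cancellable iterator. `CancellableIterator` is a mock written purely for illustration, not the binding's real `LlamaIterator` (whose internals differ), but its `hasNext()` / `next()` / `cancel()` surface mirrors the snippet above.

```java
import java.util.Iterator;
import java.util.List;

// Mock of the cancellable-iterator pattern: once cancel() is called,
// hasNext() returns false and the loop ends cleanly.
class CancellableIterator implements Iterator<String> {
    private final Iterator<String> inner;
    private volatile boolean canceled = false;

    CancellableIterator(Iterator<String> inner) {
        this.inner = inner;
    }

    public void cancel() {
        canceled = true;
    }

    @Override
    public boolean hasNext() {
        return !canceled && inner.hasNext();
    }

    @Override
    public String next() {
        return inner.next();
    }
}

public class CancelDemo {
    public static void main(String[] args) {
        CancellableIterator it =
            new CancellableIterator(List.of("Hello", " world", "!").iterator());
        StringBuilder out = new StringBuilder();
        while (it.hasNext()) {
            out.append(it.next());
            it.cancel(); // stop after the first token
        }
        System.out.println(out); // prints "Hello"
    }
}
```

The volatile flag makes `cancel()` safe to call from another thread as well, which is the usual reason for wanting to interrupt a generation loop.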
Awesome, thanks for the quick implementation. I just tried it. After the update, model loading broke with the previous llama.cpp I had compiled with GPU support on Windows; after recompiling it seemed to work, but for some reason inference suddenly stops in the middle. Any ideas what it could be?
```java
LlamaIterator iterator = model.generate(inferParams).iterator();
while (iterator.hasNext()) {
    token = String.valueOf(iterator.next());
    Main.logLine(token);
    if (canceled) {
        iterator.cancel();
        break;
    }
}
```
Same behaviour: inference just stops with the shipped llama.cpp as well (so no `-Dde.kherud.llama.lib.path` set) - just a lot slower ;)
It's expected that the previous shared library doesn't work anymore, since I upgraded the binding to the latest available llama.cpp version in 3.1.0.
From the code you gave, it's hard to tell why it suddenly stops. If it's not done on purpose via `canceled`, maybe your `inferParams` are the reason.
If you can give more details, I can later try to reproduce the problem:

- `inferParams`
Weird, I just switched back and forth between dependencies and now it just flies without any obvious change - thanks again!
Ok, hopefully last question: now everything seems to work, except that exactly the same run.bat as before now throws this error: `Could not find or load main class .kherud.llama.lib.path=out`. The cmd option used is `-Dde.kherud.llama.lib.path=out/`. Without the option it works, but without GPU. Weirdly enough, it also works with GPU when started from IntelliJ with that option - any ideas how that could be affected? I triple-checked: with 3.0.2 this option just worked.
FYI - user error - for some reason IntelliJ had both 3.0.2 and 3.1.0 in the artifacts; I didn't know this isn't updated automatically when changing Maven deps.
Great, glad to hear everything works now!
Nice work :) I did not find a way to cleanly interrupt inference; I could only suppress output from the iterable loop. Is this somehow possible?