Closed jrtp closed 3 months ago
Hey @jrtp thanks for the issue, this was indeed not possible. However, it can now be done like this:
```java
LlamaIterator iterator = model.generate(params).iterator();
while (iterator.hasNext()) {
    LlamaOutput output = iterator.next();
    System.out.println(output);
    if (Math.random() < 0.5) {
        iterator.cancel();
    }
}
```
Note that there was a slight API change from `LlamaModel.Output` to `LlamaOutput`. Maven version 3.1.0 should soon be available.
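For anyone who wants to try the shape of this pattern without loading a model: below is a minimal standalone sketch of a cancellable iterator. `CancellableIterator` is a mock written purely for illustration, not the binding's real `LlamaIterator` (whose internals differ), but its `hasNext()` / `next()` / `cancel()` surface mirrors the snippet above.

```java
import java.util.Iterator;
import java.util.List;

// Mock of the cancellable-iterator pattern: once cancel() is called,
// hasNext() returns false and the loop ends cleanly.
class CancellableIterator implements Iterator<String> {
    private final Iterator<String> inner;
    private volatile boolean canceled = false;

    CancellableIterator(Iterator<String> inner) {
        this.inner = inner;
    }

    public void cancel() {
        canceled = true;
    }

    @Override
    public boolean hasNext() {
        return !canceled && inner.hasNext();
    }

    @Override
    public String next() {
        return inner.next();
    }
}

public class CancelDemo {
    public static void main(String[] args) {
        CancellableIterator it =
            new CancellableIterator(List.of("Hello", " world", "!").iterator());
        StringBuilder out = new StringBuilder();
        while (it.hasNext()) {
            out.append(it.next());
            it.cancel(); // stop after the first token
        }
        System.out.println(out); // prints "Hello"
    }
}
```

The volatile flag makes `cancel()` safe to call from another thread as well, which is the usual reason for wanting to interrupt a generation loop.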
Awesome, thanks for the quick implementation. I just tried it. After the update, model loading broke with the previous llama.cpp I had compiled with GPU support on Windows; after recompiling it seemed to work, but for some reason inference suddenly stops in the middle. Any ideas what it could be?
```java
LlamaIterator iterator = model.generate(inferParams).iterator();
while (iterator.hasNext()) {
    token = String.valueOf(iterator.next());
    Main.logLine(token);
    if (canceled) {
        iterator.cancel();
        break;
    }
}
```
Same behaviour: inference just stops with the shipped llama.cpp as well (so no `-Dde.kherud.llama.lib.path` set) - just a lot slower ;)
It's expected that the previous shared library doesn't work anymore, since I upgraded the binding to the latest available llama.cpp version in 3.1.0.
From the code you gave, it's hard to tell why it suddenly stops. If it's not done on purpose via `canceled`, maybe your `inferParams` are the reason.
If you can give more details, I can later try to reproduce the problem:

- `inferParams`
Weird, I just switched back and forth between dependencies and now it just flies without any obvious change - thanks again!
Ok, hopefully last question: now everything seems to work, except that exactly the same run.bat as before now throws this error: `Could not find or load main class .kherud.llama.lib.path=out`. The cmd option used is `-Dde.kherud.llama.lib.path=out/`. Without the option it works, but without GPU. Weirdly enough, it also works with GPU when started from IntelliJ with that option - any ideas how that could be affected? I triple-checked: with 3.0.2 this option just worked.
FYI - user error - for some reason IntelliJ had both 3.0.2 and 3.1.0 in the artifacts; I didn't know this isn't updated automatically when changing Maven deps.
Great, glad to hear everything works now!
Nice work :) I did not find a way to cleanly interrupt inference; I could only suppress output from the iterable loop. Is this somehow possible?