Closed flatsiedatsie closed 6 months ago
Seems like cpp code throws an exception, but it's unable to display the exception correctly. I'll have a look in the next days when I have more time.
Also, have you tried other parameters? For example, a lower context length, or the multi-thread / single-thread build?
I have literally tried both of those things :-)
Singular, spaced-out inference works fine. I believe it has something to do with running tasks one after the other in quick succession (summarizing a document in multiple chunks). Shortening the context also helped, but for summarization that defeats the point a bit.
There are more things I can try on this end. I'm trying to space tasks out more.
oh it just crashed again, darn.
The Brave tab had grown to 16 GB, on a 1 GB model. I think my code is restarting a bit too eagerly on a crash with that one.
I think the issue is in my code.
I've done some more testing and found the issue. As predicted, it was in my code.
I was setting only the model's context size (`n_ctx`), as that was the only variable of that nature that needed to be set in llama_cpp_wasm. But with Wllama, which offers much more low-level control, the `n_seq_max` and `n_batch` values also needed to be set explicitly. Setting all three to the same value (8192 in this case) solved the issue.
I'm catching a lot of errors like this:
While attempting to load: https://huggingface.co/bartowski/h2o-danube2-1.8b-chat-GGUF/resolve/main/h2o-danube2-1.8b-chat-Q5_0.gguf with an 8192 context.
I'm probably doing something wrong. From what I could find online, it may have something to do with improper callbacks?
Full log: