Updated llama.cpp (twice); the only changes needed were adding some fields to `ContextParams`.
`llama_init_from_file` returns a null pointer on failure, including when the model file is not found, but the Rust code silently ignored it and only segfaulted later, once it started running the model. It now checks for null and returns a `Result`.
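The fix follows the usual pattern for fallible FFI constructors: check the returned pointer before using it. A minimal sketch of the idea, using a stub in place of the real `llama_init_from_file` binding and a hypothetical `LoadError` type (names are illustrative, not the crate's actual API):

```rust
use std::ptr;

// Opaque stand-in for the llama_context the C library returns.
struct LlamaContext;

// Hypothetical error type; the crate's real error enum will differ.
#[derive(Debug)]
enum LoadError {
    InitFailed(String),
}

// Stub standing in for the unsafe FFI call; returns null to simulate
// a missing or unreadable model file.
fn llama_init_from_file_stub(_path: &str) -> *mut LlamaContext {
    ptr::null_mut()
}

// Instead of handing a possibly-null pointer onward, surface the
// failure immediately as an Err.
fn load_model(path: &str) -> Result<*mut LlamaContext, LoadError> {
    let ctx = llama_init_from_file_stub(path);
    if ctx.is_null() {
        return Err(LoadError::InitFailed(format!(
            "llama_init_from_file returned null for {path}"
        )));
    }
    Ok(ctx)
}

fn main() {
    // A missing file now produces an Err instead of a latent segfault.
    assert!(load_model("missing.bin").is_err());
    println!("null context surfaced as Err");
}
```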
When I used llama.cpp it would sometimes freeze the whole thread when the LLM produced a lot of text. The cause was the mpsc channel's limit of 100 segments: when inference is accelerated (e.g. by CUBLAS), tokens are produced faster than they are consumed, the buffer fills up, and `.send()` blocks inside the library (not in user code), effectively deadlocking the thread. I replaced `mpsc::channel(100)` with `mpsc::unbounded_channel()` so it can buffer without limit; if unbounded growth is a performance concern, raising the bound (e.g. from 100 to 1000) would be an alternative.
So far it works fine running 33B Vicuna, at least on my end.