sobelio / llm-chain

`llm-chain` is a powerful Rust crate for building chains in large language models, allowing you to summarise text and complete complex tasks.
https://llm-chain.xyz
MIT License

llama: update llama.cpp to latest version #244

Closed · danbev closed this 8 months ago

danbev commented 9 months ago

This commit updates llama.cpp to a more recent version.

The motivation is that the current version of llama.cpp is somewhat outdated, and there have been changes to both the llama.cpp API and the model format. In particular, it is currently not possible to use the new GGUF format, and since many of the available models are now published in GGUF, this can make the crate challenging to use at the moment.

The following changes have been made:


This is a work in progress, but I wanted to open a draft pull request sooner rather than later to get some visibility and feedback.

So far I've been able to run the simple, few_shot, and stream examples successfully. ~~The map_reduce_llama example is not working as of this writing, which I'll look into further.~~

williamhogman commented 9 months ago

<3

Juzov commented 9 months ago

There's a clause that ignores MaxTokens if MaxTokens == 0 (or rather the reverse), so setting MaxTokens equal to MaxContextSize in the examples is redundant. If you want, and if it's possible, you could change the option to default to the size of the context window and remove the clause.
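
For illustration, here is a minimal Rust sketch of the kind of fallback clause described above; the function and parameter names are assumptions for this example, not the crate's actual API:

```rust
fn effective_max_tokens(max_tokens: usize, max_context_size: usize) -> usize {
    // A MaxTokens of 0 is treated as "unset": fall back to the context window.
    if max_tokens == 0 {
        max_context_size
    } else {
        max_tokens
    }
}

fn main() {
    // Explicitly setting MaxTokens equal to MaxContextSize is redundant,
    // since the fallback produces the same value anyway.
    assert_eq!(effective_max_tokens(0, 2048), 2048);
    assert_eq!(effective_max_tokens(2048, 2048), 2048);
}
```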

danbev commented 9 months ago

> So setting MaxTokens equal to MaxContextSize in the examples is redundant.

I added a MaxBatchSize option in https://github.com/sobelio/llm-chain/pull/244/commits/452ac2c6d282a95c0c1a038432ec52490c98fd1a with a default value of 512, which matches the default in llama.cpp, and I have now removed the options from the examples (apart from simple_llama).
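
As a rough sketch of how such a default might be wired up (the struct and field names below are hypothetical, for illustration only, and not necessarily how the crate exposes its options):

```rust
// Default batch size matching llama.cpp's n_batch default of 512.
const DEFAULT_MAX_BATCH_SIZE: usize = 512;

// Hypothetical options struct; the real crate exposes configuration
// through its own options types.
struct LlamaOptions {
    max_batch_size: usize,
}

impl Default for LlamaOptions {
    fn default() -> Self {
        LlamaOptions {
            max_batch_size: DEFAULT_MAX_BATCH_SIZE,
        }
    }
}

fn main() {
    let opts = LlamaOptions::default();
    // With a sensible default in place, the examples no longer need to set
    // this option explicitly.
    println!("max_batch_size = {}", opts.max_batch_size);
}
```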

> If you want, and if it's possible, you could change the option to default to the size of the context window and remove the clause.

I'm planning on taking a closer look at the model options today, and I'll also take another look at the context options and your suggestion. Thanks!