The bug
When using models.LlamaCpp, the selected tokenizer is always gpt2 (this can be seen in the output when the verbose=True arg is set). I have pasted the dumped KV metadata keys.
Is there something else that is required to properly set the tokenizer? Note that I am using locally downloaded Llama 3.1 8B GGUF weights.
To Reproduce
Give a full working code snippet that can be pasted into a notebook cell or python file. Make sure to include the LLM load step so we know which model you are using.
from guidance import models, gen

llama3 = models.LlamaCpp(
    model_path,
    n_gpu_layers=NUM_LAYERS_13B,
    n_batch=512,
    n_ctx=N_CONTEXT,
    echo=False,
    temperature=0.5,
    verbose=True,  # set to True to check whether GPU offloading is happening properly
    llama_cpp_kwargs={
        "tokenizer": tokenizer,  # key must be a string literal
    },
)

llama3 + 'Do you want a joke or a poem? ' + gen(stop='.')
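To separate a guidance problem from a model-file problem, it can help to read the tokenizer keys straight out of the GGUF header, independent of guidance or llama-cpp-python. The sketch below is a minimal stand-alone GGUF metadata reader (the key name tokenizer.ggml.model comes from the GGUF spec; note that llama.cpp reportedly records BPE tokenizers such as Llama 3's under the model name "gpt2", so that value may in fact be expected):

```python
# Minimal GGUF KV-metadata reader (a sketch based on the public GGUF spec;
# no guidance or llama-cpp-python required). All values are little-endian.
import struct

# Scalar GGUF value types -> (struct format char, byte size)
_SCALARS = {
    0: ("B", 1), 1: ("b", 1), 2: ("H", 2), 3: ("h", 2),
    4: ("I", 4), 5: ("i", 4), 6: ("f", 4), 7: ("?", 1),
    10: ("Q", 8), 11: ("q", 8), 12: ("d", 8),
}

def _read_string(f):
    # GGUF string: uint64 length followed by UTF-8 bytes
    (n,) = struct.unpack("<Q", f.read(8))
    return f.read(n).decode("utf-8", errors="replace")

def _read_value(f, vtype):
    if vtype in _SCALARS:
        fmt, size = _SCALARS[vtype]
        return struct.unpack("<" + fmt, f.read(size))[0]
    if vtype == 8:  # string
        return _read_string(f)
    if vtype == 9:  # array: uint32 element type, uint64 count, then elements
        (etype,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        return [_read_value(f, etype) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")

def read_gguf_metadata(path):
    """Return the KV metadata dict from a GGUF file header."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        _version, _tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
        meta = {}
        for _ in range(kv_count):
            key = _read_string(f)
            (vtype,) = struct.unpack("<I", f.read(4))
            meta[key] = _read_value(f, vtype)
        return meta
```

For example, meta = read_gguf_metadata(model_path) followed by print(meta.get("tokenizer.ggml.model")) shows what tokenizer family the file itself declares, which can then be compared with what guidance reports when verbose=True.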
System info (please complete the following information):
guidance.__version__: 0.1.15