Open Taikono-Himazin opened 5 months ago

I'm using Docker's latest-aio-gpu-nvidia-cuda-12 image with multiple GPUs. I would like to change the settings of llama-cpp in detail, but which file should I change?

I am modifying aio/gpu-8g/text-to-text.yaml, but I'm not sure whether the configuration items described at https://localai.io/advanced/#full-config-model-file-reference are sufficient.

Specifically, I'm running into https://github.com/ggerganov/llama.cpp/issues/6603. The suggested fix involves setting --ctx-size, but I don't know whether context_size in the YAML has the same meaning. There are also other settings I may want to change.
To change the settings of llama-cpp, you will need to modify the text-to-text.yaml file under the aio/gpu-8g/ directory. Here's how you can do it:

1. Open the text-to-text.yaml file in a text editor.
2. Find the model section, which contains the various parameters of the llama-cpp model.
3. To set the --ctx-size flag, look for the context_size parameter. Note that in the YAML file, this parameter might be present under model.llama.ctx_size or model.llama_cpp.ctx_size.
4. If you want to change other settings as well, look for the relevant parameters in the model section and modify them accordingly. Make sure you understand the effects of each parameter before modifying it.

Regarding the context_size parameter, you can check the llama-cpp documentation or the GitHub issue you mentioned (https://github.com/ggerganov/llama.cpp/issues/6603) to see whether context_size in the YAML file corresponds to the --ctx-size flag. If there's any difference, follow the instructions provided in the issue for setting the appropriate parameter.

Remember to test your model after making any changes to ensure that everything is working as expected.
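For illustration, here is a minimal sketch of what setting the context size might look like in a LocalAI model file. The full config reference at https://localai.io/advanced/#full-config-model-file-reference documents context_size as a top-level field; the nested model.llama.ctx_size / model.llama_cpp.ctx_size paths mentioned above are unverified, and the other values here are placeholders:

```yaml
# Hypothetical excerpt of aio/gpu-8g/text-to-text.yaml; verify field names
# against the LocalAI model-file reference for your version.
name: gpt-4                # name the model is served under (placeholder)
context_size: 8192         # intended to correspond to llama.cpp's --ctx-size
parameters:
  model: some-model.gguf   # placeholder GGUF file name
```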
@localai-bot Where did the information for model.llama_cpp.ctx_size come from? Is it mentioned in the documentation?
The model.llama_cpp.ctx_size parameter likely came from the documentation or configuration examples provided by the llama-cpp developers. However, it might also be derived from best practices shared by users within the community, based on their own experimentation. To be sure of its origin, or to find the most up-to-date information, I suggest checking the official llama-cpp documentation or the GitHub repository (https://github.com/ggerganov/llama.cpp). It's also a good idea to look into any specific configuration examples provided by LocalAI for the text-to-text.yaml file you are using, as they may have documented this parameter's source.

If you can't find the information you need, feel free to ask directly in the llama-cpp community or on the LocalAI forum for further assistance.
context_size in text-to-text.yaml doesn't seem to be working. Both look the same to me.

The text-to-text.yaml settings don't seem to be reflected correctly. The version is v2.12.4.
Very sorry, the configuration file I was changing was located elsewhere. When I changed the setting in the correct location, the change was reflected correctly.

However, I would still like to know how to change settings other than context_size. It would be nice if you could specify the arguments to pass when starting the llama-cpp backend.

I would like to pass parameters such as -ts 4,4 -sm row -ns 2 -ctk q8_0 -ctv q8_0 -fa to llama.cpp. Is this still possible?
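For reference, a hedged sketch of how those flags might map onto LocalAI model-file fields. tensor_split and main_gpu appear in the full config reference; the flash-attention and KV-cache-type fields are assumptions based on newer LocalAI releases and may not exist in v2.12.4, and -sm row / -ns 2 may have no YAML equivalent at all:

```yaml
# Hypothetical mapping; check the config reference for your LocalAI version:
# https://localai.io/advanced/#full-config-model-file-reference
name: my-model             # placeholder
context_size: 8192         # --ctx-size
tensor_split: "4,4"        # -ts 4,4, split weights across two GPUs
main_gpu: "0"              # often set alongside tensor_split
flash_attention: true      # -fa (assumption: added in releases after v2.12.4)
cache_type_k: "q8_0"       # -ctk q8_0 (assumption, may not be exposed)
cache_type_v: "q8_0"       # -ctv q8_0 (assumption, may not be exposed)
# -sm row and -ns 2: no known model-file equivalent; these may require
# changes to how the llama.cpp backend itself is invoked.
```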