Closed: LaptopDev closed this issue 10 months ago.
It's very hard for me to understand you, so forgive me if I'm wrong, but I'm going to assume you're asking how to load models when using --nowebui, as well as how to interact with the API once it's running.
For loading a model, you can use llama.cpp flags as well as the rope-scaling flags. For example, the command might look something like this:

python server.py --nowebui --api --model phind-codellama-34b-v2.Q4_K_M.gguf --n_ctx 4096 --n-gpu-layers 30 --rope_freq_base 1000000

Obviously, you would replace the values with the correct ones for your use case.
As for how to use the API, you would need a front end that is OpenAI-API compatible; SillyTavern is a very popular one that gets updated frequently.
Can I just send requests to the API instead of using a front end?
I don't know the request format off the top of my head, but here's a link to the API's wiki page. If you're building something that calls the API and processes the output, I can't really help much because I'm not a programmer. If you just want to use it from the terminal, plain llama.cpp would be the better fit.
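That said, here's a minimal sketch of what a raw request might look like, assuming the server was started with --api and the OpenAI-compatible endpoint is listening on the default port 5000. The port, endpoint path, and the extra "mode" parameter are assumptions; check the wiki page above to confirm them:

```python
import requests

# Assumption: --api exposes an OpenAI-compatible endpoint on port 5000.
URL = "http://127.0.0.1:5000/v1/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "Write a haiku about llamas."}
    ],
    # "mode" is an extra, non-OpenAI parameter that text-generation-webui
    # is said to accept; treat it as an assumption and verify in the wiki.
    "mode": "chat-instruct",
    "max_tokens": 200,
}

resp = requests.post(URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```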
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
I guess what HiroseKoichi is describing is how to load/switch the chat models through the API instead of from the UI or from code.
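For what it's worth, the project's API reportedly includes internal model-management endpoints, so switching models without the UI might look like the sketch below. The endpoint path and payload shape are assumptions to verify against the API wiki page:

```python
import requests

# Assumption: an internal endpoint for loading/switching models exists
# at /v1/internal/model/load; confirm the path and payload in the wiki.
URL = "http://127.0.0.1:5000/v1/internal/model/load"

payload = {
    "model_name": "phind-codellama-34b-v2.Q4_K_M.gguf",
    # Loader arguments mirroring the CLI flags from the earlier command.
    "args": {
        "n_ctx": 4096,
        "n_gpu_layers": 30,
        "rope_freq_base": 1000000,
    },
}

resp = requests.post(URL, json=payload, timeout=600)
resp.raise_for_status()
print("Model loaded")
```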
Is it possible to load the phind-codellama-34b-v2.Q4_K_M.gguf model in API mode, using flags together with config.json and config-user.yaml, and then send POST/GET requests to the API in chat-instruct mode?

Tell me if I'm wrong, but without running and loading a model in the GUI, I don't have the two files config-user.yaml and config.json in my models/ directory; I believe they are created when setting/saving configurations in the UI. So config-user.yaml stores model configurations from the model tab during GUI interaction (for example, it has all my configurations for my Phind model in it). I suspect, then, that config.json (mine just has one line saying 'llama') stores the loader used to load the model configured in config-user.yaml. (By the way, how would these files look if I saved multiple different models and their configurations? See the sketch at the end of the thread.)
Do I use these two files, along with the --api flag and the other flags, to load the model as an API? And once my model is loaded as an API, how do I configure it to accept RESTful API requests and return responses in chat-instruct mode?
Forgive me for any misunderstandings. I appreciate any support I can get.
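Not a definitive answer, but a rough sketch of the config-user.yaml question: the file appears to be keyed by model name, one top-level entry per model, so saving multiple models should simply add more entries. The exact keys below are assumptions; compare them against a file the UI actually generates:

```yaml
# Hypothetical sketch of models/config-user.yaml with two saved models.
# Key names, the trailing $ (regex-style match), and the settings shown
# are assumptions; compare with a file generated by the UI.
phind-codellama-34b-v2.Q4_K_M.gguf$:
  loader: llama.cpp
  n_ctx: 4096
  n_gpu_layers: 30
  rope_freq_base: 1000000
another-model.Q5_K_M.gguf$:
  loader: llama.cpp
  n_ctx: 8192
  n_gpu_layers: 20
```

As for chat-instruct mode: if the API accepts the extra "mode": "chat-instruct" field shown in the request sketch earlier in the thread, then once the model is loaded (via startup flags or the internal load endpoint) no additional server-side configuration should be needed. Again, verify that parameter against the API wiki page.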