rlleshi opened this issue 6 months ago
Updates
It was pointed out to me on Slack that it'd be good to provide the config file as well:
```yaml
model_type: llm
# base_model: meta-llama/Llama-2-7b-hf
base_model: meta-llama/Llama-2-13b-hf
model_parameters:
  trust_remote_code: true
backend:
  type: local
  cache_dir: ./ludwig_cache
input_features:
  - name: input
    type: text
    preprocessing:
      max_sequence_length: 326
output_features:
  - name: output
    type: text
    preprocessing:
      max_sequence_length: 64
prompt:
  template: >-
    ### User: {input}
    ### Assistant:
generation:
  temperature: 0.1
  max_new_tokens: 32
  repetition_penalty: 1.0
  # remove_invalid_values: true
adapter:
  type: lora
  dropout: 0.05
  r: 8
quantization:
  bits: 4
preprocessing:
  global_max_sequence_length: 326
  split:
    type: fixed
trainer:
  type: finetune
  epochs: 9
  batch_size: 1
  eval_batch_size: 2
  gradient_accumulation_steps: 16
  learning_rate: 0.0004
  learning_rate_scheduler:
    warmup_fraction: 0.03
```
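For reference, a config like this is used with the standard Ludwig train command; the config and dataset file names below are placeholders:

```bash
# Fine-tune using the config above (file and dataset names are placeholders).
ludwig train --config config.yaml --dataset train.csv --output_directory ./results
```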
The serving command as shown in the docs: `ludwig serve --model_path ./results/experiment_run/model`
By default, this serves the model on my GPU. My question is simply how to serve it exclusively on the CPU.
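Once the model is served, the REST endpoint can be queried roughly like this (default host/port shown; the form field name matches the input feature, here `input`, and the prompt template is applied server-side):

```bash
# Send a prediction request to the served model.
# Adjust host/port if you changed them when launching ludwig serve.
curl http://0.0.0.0:8000/predict -X POST -F "input=Hello, how are you?"
```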
@rlleshi What output/error messages are you getting? Thank you.
@alexsherstinsky thanks for getting back to me.
I'm not getting any errors; I just want to know how to run it on a CPU device, so it's just a question. I couldn't find anything in the Ludwig docs pertaining to this.
From the docs, Ludwig spawns a REST API for inference, and by default this runs on the GPU.
However, is there any option to run inference using the CPU only?
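For what it's worth, one generic workaround for PyTorch-based tools (not a documented Ludwig option) is to hide the GPUs from the serving process via `CUDA_VISIBLE_DEVICES`, so the model falls back to CPU:

```bash
# Hide all GPUs from the serving process so PyTorch falls back to CPU.
# Generic PyTorch-level workaround, not confirmed as the official Ludwig way.
CUDA_VISIBLE_DEVICES="" ludwig serve --model_path ./results/experiment_run/model
```

Note that the 4-bit quantization in the config relies on bitsandbytes, which may require a GPU, so a CPU-only load might need the `quantization` section removed.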