meta-llama / llama-stack

Model components of the Llama Stack APIs

serve model other than downloaded via `llama download ...` #277

Open maswadkar opened 5 hours ago

maswadkar commented 5 hours ago

Hi, I have successfully completed the cycle of:

```
llama download --source huggingface --model-id Llama3.1-8B-Instruct --hf-token <HF_TOKEN>
llama stack build
llama stack configure testerx
llama stack run testerx
```

Now my next assignment is to:

- get the model via `tune download meta-llama/Meta-Llama-3.1-8B-Instruct`
- LoRA fine-tune it using
  `tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device`
  (see the sketch after this list for controlling where the output lands)
- and then serve it using
  `llama stack build`, `llama stack configure testerx`, `llama stack run testerx`
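
For context, torchtune accepts `key=value` overrides appended to `tune run`, so the fine-tune input and output locations can be set directly on the command line. A minimal sketch, assuming the `checkpointer.*` keys from the `llama3_1/8B_lora_single_device` recipe config; both paths are hypothetical placeholders:

```sh
# Override where torchtune reads the base model and writes the LoRA output.
# checkpointer.checkpoint_dir / checkpointer.output_dir follow the recipe's
# config structure; the /data/checkpoints paths are examples only.
tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device \
  checkpointer.checkpoint_dir=/data/checkpoints/Meta-Llama-3.1-8B-Instruct \
  checkpointer.output_dir=/data/checkpoints/my-lora-finetune
```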

My question is: how can I point the checkpoint path at my custom downloaded model?

maswadkar commented 4 hours ago

I think that's the holy grail of open-source LLMs.

One should be able to download -> fine-tune -> serve via a REST endpoint (API), all within a single toolchain.

ashwinb commented 3 hours ago

If you use our meta-reference inference provider, you can now do that using this: https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/impls/meta_reference/inference/config.py#L34. We don't quite support or advertise this super well, though, because you could technically put any random checkpoint in there (with params.json not being correct, weights being wrong, etc.)
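
To make that concrete: assuming the linked line is the provider's `checkpoint_dir` field, the run config generated by `llama stack configure` could be edited to point at a local fine-tune. A rough sketch; the surrounding YAML nesting and the path are illustrative, not an exact schema:

```yaml
# Sketch of the inference provider section of a llama-stack run config.
# checkpoint_dir follows the config.py linked above; the exact YAML
# layout, the path, and the other field values are assumptions.
inference:
  provider_type: meta-reference
  config:
    model: Llama3.1-8B-Instruct
    checkpoint_dir: /data/checkpoints/my-lora-finetune
    max_seq_len: 4096
    max_batch_size: 1
```

With that in place, `llama stack run testerx` should load the custom weights, provided the directory contains the usual consolidated checkpoint files plus a correct params.json and tokenizer, which is exactly the caveat above.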