Instructor streaming structured output with llama-cpp

ahuang11 commented 9 months ago

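import instructor
import llama_cpp
import llama_cpp.llama_speculative  # explicit import so the draft-model class below always resolves
import panel as pn
from huggingface_hub import hf_hub_download
from pydantic import BaseModel

pn.extension()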

class Translations(BaseModel):
    chinese: str
    french: str
    spanish: str

model_path = hf_hub_download(
    "TheBloke/OpenHermes-2.5-Mistral-7B-GGUF",
    "openhermes-2.5-mistral-7b.Q4_K_M.gguf",
)
llama = llama_cpp.Llama(
    model_path=model_path,
    n_gpu_layers=-1,
    chat_format="chatml",
    n_ctx=2048,
    draft_model=llama_cpp.llama_speculative.LlamaPromptLookupDecoding(
        num_pred_tokens=2
    ),  # speculative decoding via prompt lookup
    logits_all=True,
    verbose=False,
)
create = instructor.patch(
    create=llama.create_chat_completion_openai_v1,
    mode=instructor.Mode.JSON_SCHEMA,  # constrain the output to the response model's JSON schema
)
message = {"role": "user", "content": "Teach me how to say `Hello` in three languages!"}
extraction_stream = create(
    response_model=instructor.Partial[Translations],  # yield partially filled objects while streaming
    messages=[message],
    stream=True,
)
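json_pane = pn.pane.JSON()
display(json_pane)  # `display` is the notebook built-in; run this in Jupyter
# Each item is a Translations object with the fields received so far.
for extraction in extraction_stream:
    json_pane.object = extraction.model_dump()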
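
To make the streaming loop above concrete: with `instructor.Partial[...]`, every item from the stream is a `Translations` object whose fields start out as `None` and fill in as tokens arrive. The sketch below imitates that behaviour using only the standard library. `FAKE_CHUNKS`, `parse_partial`, and `stream_partials` are hypothetical names, and this is not how instructor parses partial JSON internally; it only shows the shape of what the loop receives.

```python
import json
from typing import Dict, Iterable, Iterator, Optional

FIELDS = ("chinese", "french", "spanish")

# Hypothetical model output, split at arbitrary token boundaries.
FAKE_CHUNKS = ['{"chinese": "你', '好", "fren', 'ch": "Bonjour", ', '"spanish": "Hola"}']

# Suffixes that may turn a truncated flat JSON object into a valid one.
_CLOSERS = ("", "}", '"}', '": null}', ": null}", " null}")


def parse_partial(buffer: str) -> Dict[str, str]:
    """Best-effort parse of an incomplete flat JSON object of string values."""
    trimmed = buffer.rstrip().rstrip(",")  # drop a dangling comma, if any
    for candidate in (buffer, trimmed):
        for closer in _CLOSERS:
            try:
                parsed = json.loads(candidate + closer)
            except json.JSONDecodeError:
                continue
            if isinstance(parsed, dict):
                # Keep only known fields that already hold a string.
                return {
                    key: value
                    for key, value in parsed.items()
                    if key in FIELDS and isinstance(value, str)
                }
    return {}


def stream_partials(chunks: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one snapshot per chunk; missing fields stay None."""
    state: Dict[str, Optional[str]] = dict.fromkeys(FIELDS)
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        state.update(parse_partial(buffer))
        yield dict(state)


snapshots = list(stream_partials(FAKE_CHUNKS))
for snapshot in snapshots:
    print(snapshot)
```

Each printed snapshot corresponds to one update of `json_pane.object` in the real loop, which is why the pane appears to fill in field by field.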

ahuang11 commented 9 months ago

The equivalent in funcchain:

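import panel as pn
from funcchain import chain, settings
from funcchain.model.patches.llamacpp import ChatLlamaCpp
from huggingface_hub import hf_hub_download
from pydantic import BaseModel

pn.extension()

class Translations(BaseModel):
    chinese: str
    french: str
    spanish: str

def create_translations(text: str) -> Translations:
    """
    Translate the given text into three languages.
    """
    return chain()

model_path = hf_hub_download(
    "TheBloke/OpenHermes-2.5-Mistral-7B-GGUF",
    "openhermes-2.5-mistral-7b.Q4_K_M.gguf",
)
llama = ChatLlamaCpp(
    model_path=model_path,
    n_gpu_layers=-1,
    model_kwargs=dict(chat_format="chatml"),
    n_ctx=2048,
    verbose=False,
)
# Assumption: funcchain reads its model from `settings.llm`; without this
# assignment the `llama` instance above is never used.
settings.llm = llama

json_pane = pn.pane.JSON()
display(json_pane)  # notebook display, as in the instructor example
# Not streamed: chain() returns the finished Translations object.
translations = create_translations("Teach me how to say `Hello` in three languages!")
json_pane.object = translations.model_dump()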
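
As far as I understand it, funcchain builds the request from the function itself: the docstring becomes the instruction, the arguments become the inputs, and the return annotation becomes the output schema. The sketch below illustrates that idea with the standard library only. It is not funcchain's implementation; `describe_call` is a hypothetical helper, and a dataclass stands in for the pydantic model so the snippet has no third-party dependencies.

```python
import inspect
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict, get_type_hints


@dataclass
class Translations:
    chinese: str
    french: str
    spanish: str


def create_translations(text: str) -> Translations:
    """
    Translate the given text into three languages.
    """
    raise NotImplementedError  # a real chain would call the model here


def describe_call(func: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Collect what a docstring-driven chain could derive from a function."""
    return_type = get_type_hints(func)["return"]
    return {
        "instruction": inspect.getdoc(func),  # cleaned docstring
        "inputs": kwargs,  # arguments passed by the caller
        "output_fields": [f.name for f in fields(return_type)],
    }


request = describe_call(create_translations, text="Hello")
print(request)
```

This is why the function body in the funcchain version is only `return chain()`: everything the model needs is already in the signature and the docstring.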

MarcSkovMadsen commented 9 months ago

Would we be able to deploy the local model to Hugging Face?

It would be nice to have live versions of the apps, not just the source code.

ahuang11 commented 9 months ago

Potentially? I'm not sure whether Hugging Face's CPUs and memory are sufficient to load local llama models. Once Panel 1.4.0 is released, I want to rework many of these examples to reflect best practices.
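
A rough way to sanity-check the memory question is a back-of-the-envelope estimate. All numbers below are approximations I am assuming for a Mistral-7B model at Q4_K_M (about 7.24 billion parameters, roughly 4.85 bits per weight, 32 layers, 8 KV heads of dimension 128, 16-bit KV cache); `estimate_memory_gib` is a hypothetical helper. It ignores compute buffers and the Panel app itself, so treat the result as a lower bound to compare against whatever RAM the hosting tier provides.

```python
from typing import Dict


def estimate_memory_gib(
    n_params: float,
    bits_per_weight: float,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    n_ctx: int,
    kv_bytes: int = 2,
) -> Dict[str, float]:
    """Estimate weight and KV-cache memory in GiB (lower bound)."""
    gib = 2**30
    weights = n_params * bits_per_weight / 8
    # Keys and values: one vector per layer, per position, per KV head.
    kv_cache = 2 * n_layers * n_ctx * n_kv_heads * head_dim * kv_bytes
    return {
        "weights_gib": weights / gib,
        "kv_cache_gib": kv_cache / gib,
        "total_gib": (weights + kv_cache) / gib,
    }


# Assumed values for OpenHermes-2.5-Mistral-7B at Q4_K_M with n_ctx=2048.
estimate = estimate_memory_gib(
    n_params=7.24e9,
    bits_per_weight=4.85,
    n_layers=32,
    n_kv_heads=8,
    head_dim=128,
    n_ctx=2048,
)
print({name: round(value, 2) for name, value in estimate.items()})
```

Under these assumptions the model needs a little over 4 GiB before any runtime overhead, so the bigger concern on CPU-only hardware is probably generation speed rather than fitting in memory.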

holoviz-topics / panel-chat-examples

Instructor streaming structured output with llama-cpp #127