Is HuggingFaceH4/starchat-beta supported?

laurids-reichardt commented 1 year ago

Running TheBloke/starchat-beta-GGML with llm-cli does not seem to produce any useful output:

❯ target/release/llm gpt2 infer \
    --model-path models/TheBloke/starchat-beta-GGML/starchat-beta.ggmlv3.q4_0.bin \
    --prompt-file starchat_prompt_template.txt \
    --prompt "How do I sort a list in Python?"
✓ Loaded 485 tensors (10.7 GB) after 1838ms
<fim_prefix><|system|> Below is a conversation between a human user and a helpful AI coding assistant. <|end|>
<|user|> How do I sort a list in Python? <|end|>
<|assistant|>О<|end|>,ร en -
 the: for's
 " y of on it) los.
 from which to
<issue_start><issue_comment>username_
<|system|><fim_prefix><fim_suffix>a = ["red",
      },
    {
        {"source": "@context" : ""}
  ]
]

204. A  # The following is a berthesefettyeam: "https://www.wikidata.org/entity/Q1357698",
       <|assistant|>
<|system|>
<|user|>
<|end|>
<|end|>
<|user|>
<|end|>^C

Contents of starchat_prompt_template.txt:

<|system|> Below is a conversation between a human user and a helpful AI coding assistant. <|end|>
<|user|> {{PROMPT}} <|end|>
<|assistant|>
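
For reference, the expanded prompt at the top of the output above is consistent with a literal substitution of `{{PROMPT}}` into this template. A minimal sketch of that substitution, assuming plain string replacement (the `fill_template` helper is hypothetical and is not part of llm-cli):

```python
# Template text copied from starchat_prompt_template.txt above.
TEMPLATE = (
    "<|system|> Below is a conversation between a human user "
    "and a helpful AI coding assistant. <|end|>\n"
    "<|user|> {{PROMPT}} <|end|>\n"
    "<|assistant|>"
)


def fill_template(template: str, prompt: str) -> str:
    """Hypothetical helper: replace the {{PROMPT}} placeholder with the user prompt.

    Assumption: the CLI performs a plain textual substitution like this one.
    """
    return template.replace("{{PROMPT}}", prompt)


filled = fill_template(TEMPLATE, "How do I sort a list in Python?")
print(filled)
```

The text ends with `<|assistant|>` so that the model's continuation is the assistant turn. The leading `<fim_prefix>` in the actual output is not part of the template.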

Commit: 4b6366c3641217c96e7304edd5478b9e6743997b
OS: macOS Ventura
Processor: M1 Max 32GiB

LLukas22 commented 1 year ago

No, it isn't supported yet. We would need to port the bigcode example over from the ggml repo. However, we are currently working on GPU support for all models, which will cause some structural changes, so we should probably port the model after GPU support is implemented.

jondot commented 1 year ago

Does anyone know if this works yet? The Hugging Face model page says llm is supported, but I'm not sure after reading this thread. My other option is to use candle, but it requires a lot of boilerplate code at the moment.

LLukas22 commented 1 year ago

@jondot Theoretically, you should be able to run it with the gpt2 architecture, but I haven't tested that yet. If you want, give it a try and let me know if it works. Make sure you are using the external tokenizer, as the integrated one will probably produce gibberish.

jondot commented 1 year ago

I did try it, and I have 2 issues:

performance (7+ minutes on an M1 Pro vs. 4s with ggml, see this)
gibberish: not binary symbols or strange stuff, just words that don't make sense, probably due to the tokenizer issue you mentioned

rustformers/llm, issue #304