Open txhno opened 2 months ago
@txhno Sorry about that random weird comment...removed your reply too since it had a quote of the link in it, hope that's OK!
On topic -- exploring Ollama support is a really good idea. My understanding is that they just use llama.cpp under the hood and manage GGUF files, right? If we can figure out where the GGUFs live on the local file system, then we can use our llama.cpp infrastructure to make it easy to load Ollama models in guidance.
We can put this on our backlog to investigate, but if you (or anyone reading this!) have some knowledge about how Ollama works, I'd be happy to tag-team and support a PR here.
@riedgar-ms @nking-1 for awareness
Hi! I’ve implemented a thin wrapper for Ollama support in my fork. Can you give it a shot before I submit a PR? Thanks!
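A rough sketch of what such a wrapper can look like: resolve an Ollama model name to the local GGUF blob it points at, then hand that path to guidance's existing llama.cpp loader. This assumes Ollama's default storage layout under ~/.ollama/models (a manifests/ tree plus sha256-named blobs); the actual fork may do it differently:

```python
import json
from pathlib import Path

from guidance import models


def ollama_gguf_path(model: str, tag: str = "latest",
                     ollama_home: Path = Path.home() / ".ollama" / "models") -> Path:
    """Resolve an Ollama model name to the GGUF blob it references.

    Assumes the default layout: a manifest file at
    manifests/registry.ollama.ai/library/<model>/<tag> whose "layers" list
    contains one entry with mediaType application/vnd.ollama.image.model.
    """
    manifest_file = (ollama_home / "manifests" / "registry.ollama.ai"
                     / "library" / model / tag)
    manifest = json.loads(manifest_file.read_text())
    for layer in manifest["layers"]:
        if layer["mediaType"] == "application/vnd.ollama.image.model":
            # Blob files are named sha256-<digest>, as in the Modelfile below.
            return ollama_home / "blobs" / layer["digest"].replace(":", "-")
    raise FileNotFoundError(f"no GGUF layer found for {model}:{tag}")


# Load the same weights Ollama serves, via guidance's llama.cpp backend.
lm = models.LlamaCpp(str(ollama_gguf_path("phi3")), n_gpu_layers=-1)
```

Note that loading the weights this way still leaves the chat-template question discussed below.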
When will Ollama support be available? I'm trying to use it with a forked Ollama, but I'm getting {G|Number|G}, as if it's not being handled correctly. The Ollama initialization seems to be working fine, but guidance doesn't seem to be producing the right result.
This is likely because the model's chat template did not load. See the comment here.
From what I know, for a model to work with guidance, it needs to provide guidance with role start and role end tags, e.g., <|user|>\n and <|assistant|>\n for Phi3 Small and Medium. See here.
Currently, guidance uses a model's chat template string as the key to look up the corresponding chat template class; if no class is implemented for the model in use, it falls back to the predefined ChatMLTemplate. However, using ChatMLTemplate's default tags with a model that expects different ones may leave guidance unconstrained and generate unexpected outputs.
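To make the dependence on role tags concrete, here is roughly what such a chat template class looks like, following the pattern of the classes in guidance/chat.py referenced above (class and attribute names here are illustrative and may differ between guidance versions):

```python
from guidance.chat import ChatTemplate  # module per guidance/chat.py referenced above


class Phi3OllamaChatTemplate(ChatTemplate):
    # Template string used as the lookup key; here it is the Go-style template
    # that Ollama reports for phi3 (shown in the /api/show output below).
    template_str = (
        "{{ if .System }}<|system|>\n{{ .System }}<|end|>\n{{ end }}"
        "{{ if .Prompt }}<|user|>\n{{ .Prompt }}<|end|>\n{{ end }}"
        "<|assistant|>\n{{ .Response }}<|end|>"
    )

    def get_role_start(self, role_name):
        # Phi-3 uses the same tag pattern for system/user/assistant.
        return f"<|{role_name}|>\n"

    def get_role_end(self, role_name=None):
        return "<|end|>\n"
```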
Ollama uses llama.cpp as its backend, and the models that Ollama serves include a template and a Modelfile; see the output of the Ollama API /api/show -d '{"name": "phi3"}':
{
"license": "Microsoft.\nCopyright (c) Microsoft Corporation.\n\nMIT License\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.",
"modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this, replace FROM with:\n# FROM phi3:latest\n\nFROM D:\\ollama_models\\blobs\\sha256-633fc5be925f9a484b61d6f9b9a78021eeb462100bd557309f01ba84cac26adf\nTEMPLATE \"{{ if .System }}<|system|>\n{{ .System }}<|end|>\n{{ end }}{{ if .Prompt }}<|user|>\n{{ .Prompt }}<|end|>\n{{ end }}<|assistant|>\n{{ .Response }}<|end|>\"\nPARAMETER stop <|end|>\nPARAMETER stop <|user|>\nPARAMETER stop <|assistant|>\nLICENSE \"\"\"Microsoft.\nCopyright (c) Microsoft Corporation.\n\nMIT License\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\"\"\"\n",
"parameters": "stop \"<|end|>\"\nstop \"<|user|>\"\nstop \"<|assistant|>\"",
"template": "{{ if .System }}<|system|>\n{{ .System }}<|end|>\n{{ end }}{{ if .Prompt }}<|user|>\n{{ .Prompt }}<|end|>\n{{ end }}<|assistant|>\n{{ .Response }}<|end|>",
"details": {
"parent_model": "",
"format": "gguf",
"family": "phi3",
"families": [
"phi3"
],
"parameter_size": "3.8B",
"quantization_level": "Q4_0"
},
"model_info": {
"general.architecture": "phi3",
"general.basename": "Phi-3",
"general.file_type": 2,
"general.finetune": "128k-instruct",
"general.languages": [
"en"
],
"general.license": "mit",
"general.license.link": "https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/resolve/main/LICENSE",
"general.parameter_count": 3821079648,
"general.quantization_version": 2,
"general.size_label": "mini",
"general.tags": [
"nlp",
"code",
"text-generation"
],
"general.type": "model",
"phi3.attention.head_count": 32,
"phi3.attention.head_count_kv": 32,
"phi3.attention.layer_norm_rms_epsilon": 0.00001,
"phi3.attention.sliding_window": 262144,
"phi3.block_count": 32,
"phi3.context_length": 131072,
"phi3.embedding_length": 3072,
"phi3.feed_forward_length": 8192,
"phi3.rope.dimension_count": 96,
"phi3.rope.freq_base": 10000,
"phi3.rope.scaling.attn_factor": 1.1902381,
"phi3.rope.scaling.original_context_length": 4096,
"tokenizer.ggml.add_bos_token": false,
"tokenizer.ggml.add_eos_token": false,
"tokenizer.ggml.bos_token_id": 1,
"tokenizer.ggml.eos_token_id": 32000,
"tokenizer.ggml.model": "llama",
"tokenizer.ggml.padding_token_id": 32000,
"tokenizer.ggml.pre": "default",
"tokenizer.ggml.scores": null,
"tokenizer.ggml.token_type": null,
"tokenizer.ggml.tokens": null,
"tokenizer.ggml.unknown_token_id": 0
},
"modified_at": "2024-11-06T12:04:14+08:00"
}
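That output can also be fetched programmatically, which is handy if a wrapper wants to read the template at load time. A minimal sketch, assuming the default local Ollama endpoint on port 11434:

```python
import requests

# Ask Ollama for the metadata of a served model (same request as shown above).
resp = requests.post(
    "http://localhost:11434/api/show",
    json={"name": "phi3"},
    timeout=10,
)
resp.raise_for_status()
info = resp.json()

template = info["template"]   # the Go-style prompt template shown above
details = info["details"]     # family, parameter size, quantization level, ...
```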
Supposedly, if the model Ollama serves contains a chat template and the corresponding chat template class is implemented in guidance, guidance will work fine. But for all of Ollama's models to fully work, the fork needs a way to locate each model's role tags. One approach is to implement chat templates for all Ollama models in guidance/chat.py, but this is somewhat cumbersome and labour-intensive.
I am not sure if there are any other ways to automatically retrieve the role tags based on the model information provided by Ollama (a crude heuristic is sketched below). If I have misunderstood anything, please correct me.
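A rough sketch of that heuristic: pull the literal text that sits in front of the {{ .System }}, {{ .Prompt }} and {{ .Response }} placeholders out of the template returned by /api/show and treat it as the role start tag. This assumes the common Ollama template shape shown above and will certainly not cover every model:

```python
import re


def role_tags_from_ollama_template(template: str) -> dict:
    """Heuristically extract role start tags from an Ollama Go-style template.

    Assumes the tag (e.g. "<|user|>\\n") sits directly between a closing "}}"
    and the {{ .System }} / {{ .Prompt }} / {{ .Response }} placeholder.
    """
    placeholders = {"system": ".System", "user": ".Prompt", "assistant": ".Response"}
    tags = {}
    for role, field in placeholders.items():
        match = re.search(r"\}\}([^{]*)\{\{\s*" + re.escape(field) + r"\s*\}\}", template)
        if match:
            tags[role] = match.group(1)
    return tags


# For the phi3 template shown above this returns
# {'system': '<|system|>\n', 'user': '<|user|>\n', 'assistant': '<|assistant|>\n'}
```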
Fan
I am not sure if there are any other ways to automatically retrieve the role tags based on the model information provided by Ollama. If I have misunderstood anything, please correct me.
You are spot on. There is #947, which attempts to extract a ChatTemplate from a HuggingFace transformers tokenizer.
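For anyone unfamiliar with that approach, the basic idea (sketched here with the public transformers API; this is not the actual #947 code) is to render the tokenizer's own chat template around a sentinel message and read the role tags off the result:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

# Render the tokenizer's built-in chat template around a sentinel message;
# whatever surrounds the sentinel is the role start/end text.
sentinel = "XSENTINELX"
rendered = tok.apply_chat_template(
    [{"role": "user", "content": sentinel}],
    tokenize=False,
    add_generation_prompt=True,  # also appends the assistant role start
)
user_start, remainder = rendered.split(sentinel)
print(repr(user_start))   # roughly '<|user|>\n' for Phi-3
print(repr(remainder))    # user role end followed by the assistant role start
```

The Go-style templates Ollama exposes are not Jinja, so they cannot be fed to this machinery directly; that is where a heuristic like the one sketched above would come in.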
Is your feature request related to a problem? Please describe.
I would like to reuse the models that I have already downloaded with Ollama.

Describe the solution you'd like
Being able to use models.ollama(model_name_or_path).

Describe alternatives you've considered
llama.cpp works as of now, but Ollama support would make the process of using this app a lot more user friendly, with downloads automated and models stored centrally.

Additional context
None