**Open** — @oppenheimer- opened this issue 1 month ago
Would it be possible to add a configurable base path for Ollama, please?
Maybe similar to Smart Second Brain or other plugins that use Ollama.
The route `/api/tags` delivers all the models to populate the list.
Originally posted by @oppenheimer- in https://github.com/jcollingj/caret/issues/5#issuecomment-2408209674
The full API docs are here: [Ollama API docs](https://github.com/ollama/ollama/blob/main/docs/api.md)
1) The `GET /api/tags` endpoint provides basic information, e.g.:
```json
{
  "models": [
    {
      "name": "qwen2.5-coder:7b-instruct",
      "model": "qwen2.5-coder:7b-instruct",
      "modified_at": "2024-10-08T08:59:00+02:00",
      "size": 4683087590,
      "digest": "87098ba7390d43e0f8d615776bc7c4372c9e568c436bc1933f93832f9cf09b84",
      "details": {
        "parent_model": "",
        "format": "gguf",
        "family": "qwen2",
        "families": ["qwen2"],
        "parameter_size": "7.6B",
        "quantization_level": "Q4_K_M"
      }
    }
  ]
}
```
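As a sketch of how a plugin could turn that `/api/tags` payload into entries for its model dropdown (the interface and helper names here are hypothetical, not from caret's codebase — only the response shape comes from the example above):

```typescript
// Minimal shape of the GET /api/tags response (trimmed to the fields used here).
interface TagsResponse {
  models: { name: string; model: string; details: { family: string } }[];
}

// Hypothetical helper: extract the model names to populate a dropdown list.
function listModelNames(tags: TagsResponse): string[] {
  return tags.models.map((m) => m.name);
}

// Sample payload matching the /api/tags example above.
const sampleTags: TagsResponse = {
  models: [
    {
      name: "qwen2.5-coder:7b-instruct",
      model: "qwen2.5-coder:7b-instruct",
      details: { family: "qwen2" },
    },
  ],
};

console.log(listModelNames(sampleTags)); // → ["qwen2.5-coder:7b-instruct"]
```

In a live plugin the payload would come from `fetch(baseUrl + "/api/tags")` against the user-configured base path; the parsing itself is independent of transport.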
2) A subsequent POST request, e.g. `curl http://localhost:11434/api/show -d '{ "name": "llama3.2" }'`, reveals the required information:
```jsonc
{
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llava:latest\n\nFROM /Users/matt/.ollama/models/blobs/sha256:200765e1283640ffbd013184bf496e261032fa75b99498a9613be4e94d63ad52\nTEMPLATE \"\"\"{{ .System }}\nUSER: {{ .Prompt }}\nASSISTANT: \"\"\"\nPARAMETER num_ctx 4096\nPARAMETER stop \"</s>\"\nPARAMETER stop \"USER:\"\nPARAMETER stop \"ASSISTANT:\"",
  "parameters": "num_keep 24\nstop \"<|start_header_id|>\"\nstop \"<|end_header_id|>\"\nstop \"<|eot_id|>\"",
  "template": "{{ if .System }}<|start_header_id|>system<|end_header_id|>\n\n{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n\n{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>\n\n{{ .Response }}<|eot_id|>",
  "details": {
    "parent_model": "",
    "format": "gguf",
    "family": "llama",
    "families": ["llama"],
    "parameter_size": "8.0B",
    "quantization_level": "Q4_0"
  },
  "model_info": {
    "general.architecture": "llama",
    "general.file_type": 2,
    "general.parameter_count": 8030261248,
    "general.quantization_version": 2,
    "llama.attention.head_count": 32,
    "llama.attention.head_count_kv": 8,
    "llama.attention.layer_norm_rms_epsilon": 0.00001,
    "llama.block_count": 32,
    "llama.context_length": 8192, // this is what you were looking for
    "llama.embedding_length": 4096,
    "llama.feed_forward_length": 14336,
    "llama.rope.dimension_count": 128,
    "llama.rope.freq_base": 500000,
    "llama.vocab_size": 128256,
    "tokenizer.ggml.bos_token_id": 128000,
    "tokenizer.ggml.eos_token_id": 128009,
    "tokenizer.ggml.merges": [], // populates if `verbose=true`
    "tokenizer.ggml.model": "gpt2",
    "tokenizer.ggml.pre": "llama-bpe",
    "tokenizer.ggml.token_type": [], // populates if `verbose=true`
    "tokenizer.ggml.tokens": [] // populates if `verbose=true`
  }
}
```
If I'm not mistaken, the context length can be read from the `model_info` key prefixed with the model's family name, e.g. `llama.context_length` for the `llama` family.
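That family-prefixed lookup can be sketched like this (hypothetical helper, assuming the `details.family` and `model_info` fields shown in the `/api/show` response above):

```typescript
// Minimal shape of the POST /api/show response (only the fields used here).
interface ShowResponse {
  details: { family: string };
  model_info: Record<string, unknown>;
}

// Hypothetical helper: look up "<family>.context_length" in model_info.
function getContextLength(show: ShowResponse): number | undefined {
  const key = `${show.details.family}.context_length`;
  const value = show.model_info[key];
  return typeof value === "number" ? value : undefined;
}

// Sample matching the /api/show response above.
const sampleShow: ShowResponse = {
  details: { family: "llama" },
  model_info: { "llama.context_length": 8192 },
};

console.log(getContextLength(sampleShow)); // → 8192
```

Returning `undefined` rather than throwing lets the plugin fall back to a default context size for models whose family key is missing.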