jcollingj / caret

Caret, an Obsidian Plugin
https://caretplugin.ai/
MIT License

Add a base path for ollama #25

Open · oppenheimer- opened 1 month ago

oppenheimer- commented 1 month ago
Would it be possible to add a base path for Ollama, please?

Maybe similar to Smart Second Brain or other plugins that use Ollama.


The route `/api/tags` delivers all the models needed to populate the list (see the sketch below).

Originally posted by @oppenheimer- in https://github.com/jcollingj/caret/issues/5#issuecomment-2408209674
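
For illustration, here is a minimal sketch of how a configurable base path could feed the model list; `ollamaBasePath` and `fetchOllamaModels` are hypothetical names, not Caret's actual settings or API:

```ts
// Sketch: populate the model dropdown from a user-configurable Ollama base path.
// `ollamaBasePath` is a hypothetical setting, e.g. "http://localhost:11434".
async function fetchOllamaModels(ollamaBasePath: string): Promise<string[]> {
    const base = ollamaBasePath.replace(/\/+$/, ""); // tolerate trailing slashes
    const res = await fetch(`${base}/api/tags`);
    if (!res.ok) {
        throw new Error(`Ollama /api/tags failed with status ${res.status}`);
    }
    const data = (await res.json()) as { models: { name: string }[] };
    return data.models.map((m) => m.name);
}
```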


The full API docs are here: [Ollama API docs](https://github.com/ollama/ollama/blob/main/docs/api.md)

1) The `GET /api/tags` endpoint provides basic model information (a typed sketch of this response follows the example), e.g.

{
   "models": [
      {
         "name": "qwen2.5-coder:7b-instruct",
         "model": "qwen2.5-coder:7b-instruct",
         "modified_at": "2024-10-08T08:59:00+02:00",
         "size": 4683087590,
         "digest": "87098ba7390d43e0f8d615776bc7c4372c9e568c436bc1933f93832f9cf09b84",
         "details": {
            "parent_model": "",
            "format": "gguf",
            "family": "qwen2",
            "families": [
               "qwen2"
            ],
            "parameter_size": "7.6B",
            "quantization_level": "Q4_K_M"
         }
      }
   ]
}
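
For reference, a TypeScript shape covering the fields shown above (the field list is inferred from this example response, not from an official schema):

```ts
// Inferred types for the /api/tags response fields shown above.
interface OllamaModelDetails {
    parent_model: string;
    format: string;
    family: string;
    families: string[];
    parameter_size: string;      // e.g. "7.6B"
    quantization_level: string;  // e.g. "Q4_K_M"
}

interface OllamaTagsResponse {
    models: {
        name: string;
        model: string;
        modified_at: string; // ISO 8601 timestamp
        size: number;        // bytes on disk
        digest: string;
        details: OllamaModelDetails;
    }[];
}
```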

2) A subsequent POST request, e.g. `curl http://localhost:11434/api/show -d '{ "name": "llama3.2" }'`, reveals the required information:

{
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llava:latest\n\nFROM /Users/matt/.ollama/models/blobs/sha256:200765e1283640ffbd013184bf496e261032fa75b99498a9613be4e94d63ad52\nTEMPLATE \"\"\"{{ .System }}\nUSER: {{ .Prompt }}\nASSISTANT: \"\"\"\nPARAMETER num_ctx 4096\nPARAMETER stop \"\u003c/s\u003e\"\nPARAMETER stop \"USER:\"\nPARAMETER stop \"ASSISTANT:\"",
  "parameters": "num_keep                       24\nstop                           \"<|start_header_id|>\"\nstop                           \"<|end_header_id|>\"\nstop                           \"<|eot_id|>\"",
  "template": "{{ if .System }}<|start_header_id|>system<|end_header_id|>\n\n{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n\n{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>\n\n{{ .Response }}<|eot_id|>",
  "details": {
    "parent_model": "",
    "format": "gguf",
    "family": "llama",
    "families": [
      "llama"
    ],
    "parameter_size": "8.0B",
    "quantization_level": "Q4_0"
  },
  "model_info": {
    "general.architecture": "llama",
    "general.file_type": 2,
    "general.parameter_count": 8030261248,
    "general.quantization_version": 2,
    "llama.attention.head_count": 32,
    "llama.attention.head_count_kv": 8,
    "llama.attention.layer_norm_rms_epsilon": 0.00001,
    "llama.block_count": 32,
    "llama.context_length": 8192, // this is what you were looking for
    "llama.embedding_length": 4096,
    "llama.feed_forward_length": 14336,
    "llama.rope.dimension_count": 128,
    "llama.rope.freq_base": 500000,
    "llama.vocab_size": 128256,
    "tokenizer.ggml.bos_token_id": 128000,
    "tokenizer.ggml.eos_token_id": 128009,
    "tokenizer.ggml.merges": [],            // populates if `verbose=true`
    "tokenizer.ggml.model": "gpt2",
    "tokenizer.ggml.pre": "llama-bpe",
    "tokenizer.ggml.token_type": [],        // populates if `verbose=true`
    "tokenizer.ggml.tokens": []             // populates if `verbose=true`
  }
}

If I'm not mistaken, the `context_length` can be acquired via the model family name: the key is prefixed with the architecture reported in `general.architecture`, so here it is `llama.context_length`. A sketch of that lookup follows.
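
Putting the two calls together, a minimal sketch of resolving a model's context window; `getContextLength` is a hypothetical helper, not part of Caret or any Ollama client library:

```ts
// Sketch: look up a model's context window via POST /api/show.
// The key is "<architecture>.context_length", e.g. "llama.context_length".
async function getContextLength(
    basePath: string,
    modelName: string
): Promise<number | undefined> {
    const base = basePath.replace(/\/+$/, "");
    const res = await fetch(`${base}/api/show`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ name: modelName }),
    });
    if (!res.ok) return undefined;
    const data = (await res.json()) as { model_info?: Record<string, unknown> };
    const arch = data.model_info?.["general.architecture"];
    if (typeof arch !== "string") return undefined;
    const ctx = data.model_info?.[`${arch}.context_length`];
    return typeof ctx === "number" ? ctx : undefined;
}
```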