sammcj / llamalink

Link your Ollama models to LM-Studio

Feature: two way syncing / lm-studio -> ollama syncing #5

Open sammcj opened 7 months ago

sammcj commented 7 months ago

This is a little more complicated as it will require creating an Ollama Modelfile / manifest in addition to linking the models.
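At its simplest, the lm-studio -> ollama direction is probably just writing a minimal Modelfile whose FROM line points at the existing GGUF and then shelling out to ollama create. A rough sketch of that idea (the model naming scheme, and any TEMPLATE/PARAMETER lines derived from the LM Studio preset, are still to be worked out):

package main

import (
    "fmt"
    "os"
    "os/exec"
    "path/filepath"
)

// importToOllama registers an LM Studio GGUF with Ollama by writing a
// minimal Modelfile and running "ollama create <name> -f <Modelfile>".
// A real implementation would also emit TEMPLATE/PARAMETER lines derived
// from the matching LM Studio preset.
func importToOllama(modelName, ggufPath string) error {
    dir, err := os.MkdirTemp("", "llamalink-import")
    if err != nil {
        return err
    }
    defer os.RemoveAll(dir)

    modelfile := filepath.Join(dir, "Modelfile")
    contents := fmt.Sprintf("FROM %s\n", ggufPath)
    if err := os.WriteFile(modelfile, []byte(contents), 0o644); err != nil {
        return err
    }

    cmd := exec.Command("ollama", "create", modelName, "-f", modelfile)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    return cmd.Run()
}

func main() {
    if len(os.Args) != 3 {
        fmt.Fprintln(os.Stderr, "usage: import <model-name> <path-to-gguf>")
        os.Exit(1)
    }
    if err := importToOllama(os.Args[1], os.Args[2]); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}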

taliesins commented 7 months ago

I guess we would need to map parameters from the Ollama Modelfile format (https://github.com/ollama/ollama/blob/main/docs/modelfile.md) to the LM Studio preset format, e.g. default_lm_studio_windows.preset.json.
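From a quick look, most of the sampler settings line up one-to-one between the two, with only a few renames (temperature -> temp, num_ctx -> n_ctx, num_predict -> n_predict) and stop mapping onto the antiprompt array. A first guess at the mapping, not yet verified against either project:

package preset

// ollamaToLMStudio maps Modelfile PARAMETER names to LM Studio preset keys.
// Everything here sits under inference_params except num_ctx, which lives in
// load_params; these pairings are guesses based on the two formats below.
var ollamaToLMStudio = map[string]string{
    "temperature":    "temp",
    "top_k":          "top_k",
    "top_p":          "top_p",
    "repeat_penalty": "repeat_penalty",
    "repeat_last_n":  "repeat_last_n",
    "num_predict":    "n_predict",
    "num_ctx":        "n_ctx",
    "seed":           "seed",
    "tfs_z":          "tfs_z",
    "mirostat":       "mirostat",
    "mirostat_tau":   "mirostat_tau",
    "mirostat_eta":   "mirostat_eta",
    // "stop" is repeatable in a Modelfile and maps onto the "antiprompt" list
}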

Ollama

The files we probably need to understand:

- https://github.com/ollama/ollama/blob/main/docs/modelfile.md
- https://github.com/ollama/ollama/blob/main/cmd/cmd.go

Once we have the import working correctly, we should be able to do the following:

For Gemma, running ollama show gemma:latest --modelfile gives:

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM gemma:latest

FROM /usr/share/ollama/.ollama/models/blobs/sha256-456402914e838a953e0cf80caa6adbe75383d9e63584a964f504a7bbb8f7aad9
TEMPLATE """<start_of_turn>user
{{ if .System }}{{ .System }} {{ end }}{{ .Prompt }}<end_of_turn>
<start_of_turn>model
{{ .Response }}<end_of_turn>
"""
PARAMETER repeat_penalty 1
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"

LM Studio

LM-Studio uses this file, C:\Users\Administrator\.cache\lm-studio\config-presets\config.map.json, to map model paths to preset files:

{
  "_formatVersion": "0.0.2",
  "preset_map": {
    "*/Llama-2-*B-*hat-*/*.*": "metaai_llama_2_chat.preset.json",
    "*/vicuna-*B-v*-16K-*/*.*": "vicuna_v1_5_16k.preset.json",
    "*/CodeLlama-*B-Python-*/*.*": "codellama_instruct.preset.json",
    "*/CodeLlama-*B-Instruct-*/*.*": "codellama_instruct.preset.json",
    "*/Phind-CodeLlama-34B-v*-GGUF/*/*.*": "phind_codellama.preset.json",
    "*/*istral-*B-Instruct-v*-GGUF/*.gguf": "mistral_instruct.preset.json",
    "*/OpenHermes-2*-Mistral-7B-GGUF/*.gguf": "chatml.preset.json",
    "*/dolphin-*-mistral-*B-GGUF/*.gguf": "chatml.preset.json",
    "*/*istral-*B-OpenOrca-GGUF/*.gguf": "chatml.preset.json",
    "*/zephyr-*-GGUF/*.gguf": "zephyr.preset.json",
    "*/stablelm-zephyr-*-GGUF/*.gguf": "zephyr.preset.json",
    "*/deepseek-*-instruct-GGUF/*.gguf": "deepseek_coder.preset.json",
    "*/Mixtral-8x7B-Instruct-v*-GGUF/*.gguf": "mistral_instruct.preset.json",
    "*/phi-2-GGUF/*.gguf": "phi_2.preset.json",
    "*/Qwen1.5-*-Chat-GGUF/*.gguf": "chatml.preset.json",
    "*/gemma-*b-it/*.gguf": "google_gemma_instruct.preset.json",
    "*/gemma-*b-it-GGUF/*.gguf": "google_gemma_instruct.preset.json",
    "*/Hermes-2-Pro-Mistral-7B-GGUF/*.gguf": "chatml.preset.json",
    "*/c4ai-command-r-v01-GGUF/*.gguf": "cohere_command_r.preset.json",
    "*/stable-code-instruct-3b*/*.gguf": "chatml.preset.json",
    "*/Starling-LM-7B-beta-GGUF/*.gguf": "openchat.preset.json"
  },
  "user_preset_map": {
    "TheBloke/dolphin-2.5-mixtral-8x7b-GGUF/dolphin-2.5-mixtral-8x7b.Q5_K_M.gguf": "mistral_instruct.preset.json",
    "TheBloke/dolphin-2.5-mixtral-8x7b-GGUF/dolphin-2.5-mixtral-8x7b.Q8_0.gguf": "mistral_instruct.preset.json",
    "mradermacher/Mixtral_AI_Cyber_MegaMind_3_0-GGUF/Mixtral_AI_Cyber_MegaMind_3_0.Q4_K_S.gguf": "mistral_instruct.preset.json",
    "qwp4w3hyb/deepseek-coder-7b-instruct-v1.5-iMat-GGUF/deepseek-coder-7b-instruct-v1.5-imat-Q8_0.gguf": "deepseek_coder.preset.json",
    "TheBloke/CodeFuse-CodeLlama-34B-GGUF/codefuse-codellama-34b.Q4_K_M.gguf": "codellama_instruct.preset.json",
    "dranger003/OpenCodeInterpreter-CL-70B-iMat.GGUF/ggml-opencodeinterpreter-cl-70b-iq2_xs.gguf": "OpenCodeInterpreter.preset.json"
  }
}
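If we generate a preset for a linked model we'd presumably also want to add it under user_preset_map so LM Studio actually picks it up. A sketch of what editing that file could look like (struct shape inferred from the JSON above; the exact path differs per OS and LM Studio version, so treat this as a guess):

package preset

import (
    "encoding/json"
    "os"
)

// ConfigMap mirrors config.map.json as shown above.
type ConfigMap struct {
    FormatVersion string            `json:"_formatVersion"`
    PresetMap     map[string]string `json:"preset_map"`
    UserPresetMap map[string]string `json:"user_preset_map"`
}

// RegisterPreset maps a model path pattern (relative to the LM Studio models
// directory, e.g. "publisher/repo/file.gguf") to a preset filename.
func RegisterPreset(configPath, modelPattern, presetFile string) error {
    data, err := os.ReadFile(configPath)
    if err != nil {
        return err
    }
    var cfg ConfigMap
    if err := json.Unmarshal(data, &cfg); err != nil {
        return err
    }
    if cfg.UserPresetMap == nil {
        cfg.UserPresetMap = map[string]string{}
    }
    cfg.UserPresetMap[modelPattern] = presetFile

    out, err := json.MarshalIndent(cfg, "", "  ")
    if err != nil {
        return err
    }
    return os.WriteFile(configPath, out, 0o644)
}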

The default set of configuration values lives in default_lm_studio_windows.preset.json:

{
  "name": "Default LM Studio Windows",
  "load_params": {
    "n_ctx": 2048,
    "n_batch": 512,
    "rope_freq_base": 10000,
    "rope_freq_scale": 1,
    "n_gpu_layers": 0,
    "use_mlock": true,
    "main_gpu": 0,
    "tensor_split": [
      0
    ],
    "seed": -1,
    "f16_kv": true,
    "use_mmap": true
  },
  "inference_params": {
    "n_threads": 4,
    "n_predict": -1,
    "top_k": 40,
    "top_p": 0.95,
    "temp": 0.8,
    "repeat_penalty": 1.1,
    "input_prefix": "### Instruction:\n",
    "input_suffix": "\n### Response:\n",
    "antiprompt": [
      "### Instruction:"
    ],
    "pre_prompt": "Below is an instruction that describes a task. Write a response that appropriately completes the request.",
    "pre_prompt_suffix": "\n",
    "pre_prompt_prefix": "",
    "seed": -1,
    "tfs_z": 1,
    "typical_p": 1,
    "repeat_last_n": 64,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "n_keep": 0,
    "logit_bias": {},
    "mirostat": 0,
    "mirostat_tau": 5,
    "mirostat_eta": 0.1,
    "memory_f16": true,
    "multiline_input": false,
    "penalize_nl": true
  }
}
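For writing presets out, a small pair of structs mirroring the fields above is probably enough; with omitempty we would only emit whatever the Modelfile actually sets and let LM Studio's defaults cover the rest. A guess at the shape (field names taken from the JSON above):

package preset

// Preset mirrors the subset of an LM Studio *.preset.json we would generate.
type Preset struct {
    Name            string           `json:"name"`
    LoadParams      *LoadParams      `json:"load_params,omitempty"`
    InferenceParams *InferenceParams `json:"inference_params,omitempty"`
}

type LoadParams struct {
    NCtx          int     `json:"n_ctx,omitempty"`
    RopeFreqBase  float64 `json:"rope_freq_base,omitempty"`
    RopeFreqScale float64 `json:"rope_freq_scale,omitempty"`
    Seed          int     `json:"seed,omitempty"`
}

type InferenceParams struct {
    Temp          float64  `json:"temp,omitempty"`
    TopK          int      `json:"top_k,omitempty"`
    TopP          float64  `json:"top_p,omitempty"`
    RepeatPenalty float64  `json:"repeat_penalty,omitempty"`
    RepeatLastN   int      `json:"repeat_last_n,omitempty"`
    NPredict      int      `json:"n_predict,omitempty"`
    InputPrefix   string   `json:"input_prefix,omitempty"`
    InputSuffix   string   `json:"input_suffix,omitempty"`
    Antiprompt    []string `json:"antiprompt,omitempty"`
    PrePrompt     string   `json:"pre_prompt,omitempty"`
}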

For Gemma, google_gemma_instruct.preset.json contains:

{
  "name": "Google Gemma Instruct",
  "inference_params": {
    "input_prefix": "<start_of_turn>user\n",
    "input_suffix": "<end_of_turn>\n<start_of_turn>model\n",
    "antiprompt": [
      "<start_of_turn>user",
      "<start_of_turn>model",
      "<end_of_turn>"
    ],
    "pre_prompt": "",
    "pre_prompt_prefix": "",
    "pre_prompt_suffix": ""
  },
  "load_params": {
    "rope_freq_scale": 0,
    "rope_freq_base": 0
  }
}
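Comparing this with the Modelfile earlier: the stop values roughly correspond to antiprompt entries, and the TEMPLATE effectively splits around {{ .Prompt }} into input_prefix and input_suffix (the suffix here, "<end_of_turn>\n<start_of_turn>model\n", is exactly what sits between {{ .Prompt }} and {{ .Response }}). Deriving that split automatically is probably the fiddly bit; a naive sketch that works for simple templates like Gemma's:

package preset

import "strings"

// splitTemplate derives candidate input_prefix / input_suffix values from an
// Ollama TEMPLATE by splitting around the {{ .Prompt }} and {{ .Response }}
// placeholders. For Gemma the suffix comes out matching the preset above;
// the prefix still carries the {{ if .System }} block, which would need to be
// stripped (or turned into pre_prompt_prefix/suffix) separately.
func splitTemplate(tmpl string) (prefix, suffix string, ok bool) {
    parts := strings.SplitN(tmpl, "{{ .Prompt }}", 2)
    if len(parts) != 2 {
        return "", "", false
    }
    suffix = strings.SplitN(parts[1], "{{ .Response }}", 2)[0]
    return parts[0], suffix, true
}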
sammcj commented 7 months ago

Thanks for the research @taliesins!

I've been away this past week but I hope to look into hacking this up this week.