ibm-granite-community / pm

Granite Community Project Management

Make Granite Code available via Ollama #23

Closed · adampingel closed this issue 2 months ago

deanwampler commented 4 months ago

It's already available. Do you mean newer models?

adampingel commented 4 months ago

Yes. Bigger models, newer versions, managed by the same process, and feeding the same BI/analytics reports.

adampingel commented 4 months ago

For instance, it looks like Ollama may be missing the -instruct variants: https://ollama.com/library/granite-code
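
One crude way to check (the tag names below are guesses, assuming Ollama tags would mirror the Hugging Face -instruct suffixes): a pull against an unpublished tag fails fast with a manifest error.

```sh
# Guessed tag names; a pull against an unpublished tag fails immediately
ollama pull granite-code:3b-instruct || echo "no 3b-instruct tag published"
ollama pull granite-code:8b-instruct || echo "no 8b-instruct tag published"
```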

adampingel commented 3 months ago

Please note that the scope of this ticket is to refresh the Ollama Granite Code models once, manually. There is a follow-on ticket for automation and hardening (which may well grow beyond a single ticket): https://github.com/ibm-granite-cookbooks/pm/issues/28

gabe-l-hart commented 3 months ago

Based on the pace of communication with the ollama folks, we've decided to go with a staging approach. I'm using my personal account (gabegoodhart). The 3b-128k and 8b-128k models are up now: https://ollama.com/gabegoodhart/granite-code:8b-128k
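
To try the staged builds, pull them from that namespace like any other model, e.g.:

```sh
# Pull the staged 8b-128k build from the personal namespace and smoke-test it
ollama pull gabegoodhart/granite-code:8b-128k
ollama run gabegoodhart/granite-code:8b-128k "Write hello world in Go."
```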

gabe-l-hart commented 3 months ago

For posterity, here's the script I'm using to do the imports:

import-to-ollama:

```sh
#!/usr/bin/env bash

# Defaults
file_path=""
model_name=""
model_label="local"
license_file=""
quantization="Q4_K_M"
workdir=""

# Parse arguments; a bare positional is treated as the model file/repo path
while [[ $# -gt 0 ]]
do
  key="$1"
  case $key in
    -f|--file)
      file_path="$2"
      shift
      ;;
    -m|--model-name)
      model_name="$2"
      shift
      ;;
    -ml|--model-label)
      model_label="$2"
      shift
      ;;
    -l|--license-file)
      license_file="$2"
      shift
      ;;
    -q|--quantization)
      quantization="$2"
      shift
      ;;
    -w|--workdir)
      workdir="$2"
      shift
      ;;
    *)
      if [ "$file_path" != "" ]
      then
        echo "Unknown argument $1"
        exit 1
      fi
      # The shift after esac consumes the positional argument
      file_path="$1"
      ;;
  esac
  shift
done

if [ "$file_path" == "" ]
then
  echo "Missing required argument -f|--file"
  exit 1
fi

# If the model doesn't exist at all, try to download it from huggingface
if ! [ -e "$file_path" ]
then
  echo "Downloading model from huggingface..."
  save_path=$(basename "$file_path")
  huggingface-cli download "$file_path" --local-dir "$save_path"
  file_path="$save_path"
fi

# If the model is a directory, it's a raw transformers model and needs to be
# converted and quantized
if [ -d "$file_path" ]
then
  echo "Converting raw transformers model to GGUF format..."
  llama-convert-hf-to-gguf.py "$file_path"
  echo "Quantizing GGUF Model [$quantization]"
  llama-quantize "$file_path/ggml-model-f16.gguf" "$quantization"
  quant_gguf="$file_path/$(basename "$file_path").$quantization.gguf"
  mv "$file_path/ggml-model-$quantization.gguf" "$quant_gguf"
  file_path="$quant_gguf"
fi

# Use an absolute path
file_path="$(realpath "$file_path")"

# Check if model_name is empty and assign file name as model_name if true
if [ "$model_name" == "" ]
then
  model_name=$(basename "$file_path")
  model_name="${model_name%.*}"
fi

# Append the model label to the model name
model_name="$model_name:$model_label"
echo "model_name: $model_name"

# Create a temporary directory for working
if [ "$workdir" == "" ]
then
  workdir=$(mktemp -d)
fi
mkdir -p "$workdir" 2>/dev/null
echo "Working Dir: $workdir"

# Write the file path to Modelfile in the temporary directory
echo "FROM $file_path" > "$workdir/Modelfile"

# If a license file is given, add it to the Modelfile
if [ "$license_file" != "" ]
then
  license_text="$(cat "$license_file")"
  echo "Adding LICENSE"
  echo -e "$license_text" | head -n3
  echo "..."
  echo "LICENSE \"\"\"${license_text}\"\"\"" >> "$workdir/Modelfile"
fi

# Import the model using ollama create command
echo "importing model $model_name"
ollama create "$model_name" -f "$workdir/Modelfile"
```
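
For reference, a typical invocation looks like this (the HF repo id and license path here are illustrative):

```sh
# Download from HF, convert and quantize, then import as granite-code:8b-128k
./import-to-ollama \
  ibm-granite/granite-8b-code-instruct-128k \
  -m granite-code \
  -ml 8b-128k \
  -l ./LICENSE
```
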
gabe-l-hart commented 3 months ago

To do the push, you need to do the following:

  1. Set up your ollama account
  2. Add your "Ollama key" to your account
    • Settings -> Ollama keys
    • cat ~/.ollama/id_ed25519.pub | pbcopy
    • Add Ollama Public Key -> paste
      1. Copy the imported model to the correct name
        • ollama cp <import name> <username>/<model>:<tag>
      2. Push
        • ollama push <username>/<model>:<tag>
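
For example, for the 8b-128k build staged above (assuming the import landed as granite-code:8b-128k, per the script's <model-name>:<model-label> convention):

```sh
# Rename the local import into the account namespace, then publish it
ollama cp granite-code:8b-128k gabegoodhart/granite-code:8b-128k
ollama push gabegoodhart/granite-code:8b-128k
```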
fayvor commented 2 months ago

Ollama does show considerable metadata about the model, along with the license and template: https://ollama.com/gabegoodhart/granite-code:8b-128k/blobs/a8fe02e5a50c

This includes the context length as llama.context_length.
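
The same keys can be checked locally before pushing (a sketch; gguf-dump ships with llama.cpp's gguf-py package, and the file name here is illustrative):

```sh
# Dump the GGUF key/value metadata (skipping tensors) and find the context length
gguf-dump --no-tensors granite-code.Q4_K_M.gguf | grep context_length

# The Modelfile Ollama reconstructed for a pulled model
ollama show gabegoodhart/granite-code:8b-128k --modelfile
```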

gabe-l-hart commented 2 months ago

Wow, great find! This metadata all seems to be parsed from the source GGUF file, so we may be able to embed additional metadata during the HF -> GGUF conversion.
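
One possible route (an untested sketch; it assumes the llama.cpp we pin is new enough that the convert script accepts a --metadata JSON override, and the key values here are illustrative):

```sh
# Hypothetical override file; key names follow the GGUF general.* conventions
cat > metadata-override.json <<'EOF'
{
  "general.name": "granite-code-8b-128k",
  "general.description": "Granite Code 8B, 128k context, staged for Ollama"
}
EOF

# Same convert wrapper as in the import script above
llama-convert-hf-to-gguf.py ./granite-8b-code-instruct-128k \
  --metadata metadata-override.json
```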