rasbt / LLMs-from-scratch

Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
https://www.manning.com/books/build-a-large-language-model-from-scratch

ch07 - ollama reproducibility #249

Closed d-kleine closed 2 days ago

d-kleine commented 2 days ago

Bug description

@rasbt I think I have found the reason why the Ollama API does not generate deterministic output; this change in the code should solve it:

def query_model(prompt, model="llama3", url="http://localhost:11434/api/chat"):
...
        "options": {  # new
            "seed": 123,
            "temperature": 0
        }
    }

...

I have taken a look at the Ollama API docs, and it seems you need to pass those params in a separate "options" key in the JSON input.

It's important to set "num_ctx" (the number of tokens for the context window) too, because this makes sure that the output is 100% reproducible; otherwise it will be slightly random. I have also added "num_ctx": 2048 for a fixed context window size according to the model params docs - the output of this code should be fully reproducible:

import urllib.request
import json

def query_model(prompt, model="llama3", url="http://localhost:11434/api/chat"):
    # Create the data payload as a dictionary
    data = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "options": {  # new
            "seed": 123,
            "temperature": 0,
            "num_ctx": 2048 # must be set, otherwise slightly random output
        }
    }

    # Convert the dictionary to a JSON formatted string and encode it to bytes
    payload = json.dumps(data).encode("utf-8")

    # Create a request object, setting the method to POST and adding necessary headers
    request = urllib.request.Request(url, data=payload, method="POST")
    request.add_header("Content-Type", "application/json")

    # Send the request and capture the response
    response_data = ""
    with urllib.request.urlopen(request) as response:
        # Read and decode the response
        while True:
            line = response.readline().decode("utf-8")
            if not line:
                break
            response_json = json.loads(line)
            response_data += response_json["message"]["content"]

    return response_data

result = query_model("What do Llamas eat?")
print(result)
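
As a quick sanity check (a hypothetical snippet, not part of the notebook, reusing the query_model defined above), you could send the same prompt twice and compare the two responses:

first = query_model("What do Llamas eat?")
second = query_model("What do Llamas eat?")
# With "seed", "temperature", and "num_ctx" fixed in "options", both calls
# should return the exact same text on the same machine.
print(first == second)  # expected: True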

My output is deterministic, so it should be reproducible for you too:

Llamas are herbivores, which means they primarily feed on plant-based foods. Their diet typically consists of:

1. Grasses: Llamas love to graze on various types of grasses, including tall grasses, short grasses, and even weeds.
2. Hay: High-quality hay, such as alfalfa or timothy hay, is a staple in a llama's diet. They enjoy munching on hay cubes or loose hay.
3. Grains: Llamas may receive grains like oats, barley, or corn as part of their diet. However, these should be given in moderation to avoid digestive issues.
4. Fruits and vegetables: Fresh fruits and veggies can be a tasty treat for llamas. Some favorites include apples, carrots, sweet potatoes, and leafy greens like kale or spinach.
5. Minerals: Llamas need access to mineral supplements, such as salt licks or loose minerals, to ensure they're getting the necessary nutrients.

In the wild, llamas might also eat:

1. Leaves: They'll munch on leaves from trees and shrubs, like willow or cedar.
2. Bark: In some cases, llamas may eat the bark of certain trees, like aspen or birch.
3. Mosses: Llamas have been known to graze on mosses and other non-woody plant material.

It's essential to provide a balanced diet for your llama, taking into account their age, size, and individual needs. Consult with a veterinarian or experienced llama breeder to determine the best feeding plan for your llama.

And later in the notebook, the evaluation scores should then of course be reproducible too:

Scoring entries: 100%|██████████| 100/100 [03:54<00:00,  2.34s/it]

model 1 response
Number of scores: 100 of 100
Average score: 78.70

Scoring entries: 100%|██████████| 100/100 [03:48<00:00,  2.28s/it]

model 2 response
Number of scores: 99 of 100
Average score: 65.47

What operating system are you using?

Windows

Where do you run your code?

Local (laptop, desktop)

Environment

rasbt commented 2 days ago

Oh wow, thanks so much for figuring this out. I tried lots of things but somehow didn't think of this. It's kind of weird that Ollama doesn't error if the options are passed differently (but then silently ignores them). In any case, I can confirm that the responses are now deterministic. But it still seems they are not deterministic across operating systems (but that's ok).
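
For reference, a minimal sketch of what this means in practice (hypothetical dictionaries, mirroring the snippets above): in the first payload the sampling parameters sit at the top level of the JSON body, where the /api/chat endpoint silently ignores them; in the second they are nested under "options" and take effect.

# Silently ignored: "seed" and "temperature" at the top level of the payload
data_ignored = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "What do Llamas eat?"}],
    "seed": 123,
    "temperature": 0,
}

# Honored: the same parameters nested under "options", as in query_model above
data_used = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "What do Llamas eat?"}],
    "options": {"seed": 123, "temperature": 0, "num_ctx": 2048},
}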

d-kleine commented 2 days ago

Oh wow, thanks so much for figuring this out. I tried lots of things but somehow didn't think of this.

Tbh, I am really happy that the model is deterministic now, so the evaluation scores also differ less than before 🙂

It's kind of weird that Ollama doesn't error if the options are passed differently (but then silently ignores them).

Yeah, I was thinking the same...

In any case, I can confirm that the responses are now deterministic. But it still seems they are not deterministic across operating systems (but that's ok).

Yeah, I can confirm that. I have tested it with Windows 10 and with my Ubuntu image on Docker: the generated output on the same OS is deterministic and reproducible, but across different OSes it is inconsistent. This also seems to be the case when restarting the kernel. My assumption is that this is not an issue of the model itself, but rather one in Ollama (probably even llama.cpp in the backend).

I have opened a GitHub issue on this: https://github.com/ollama/ollama/issues/5321
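
A hypothetical way to compare outputs across machines without pasting the full text (not something tested in this thread) would be to hash the generated response and compare the digests:

import hashlib

result = query_model("What do Llamas eat?")
# Identical digests on two machines imply byte-identical responses;
# different digests confirm the cross-OS inconsistency described above.
print(hashlib.sha256(result.encode("utf-8")).hexdigest())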

d-kleine commented 1 day ago

Thanks for updating the code!