testcontainers / testcontainers-python

Testcontainers is a Python library that provides a friendly API to run Docker containers. It is designed to create runtime environments to use during your automated tests.
https://testcontainers-python.readthedocs.io/en/latest/
Apache License 2.0

New Container: OllamaContainer #617

Closed bricefotzo closed 3 weeks ago

bricefotzo commented 3 weeks ago

Add support for the OllamaContainer to simplify running and testing LLMs through Ollama.

What is the new container you'd like to have?

I would like to request support for a new container: OllamaContainer.

Why not just use a generic container for this?

The generic DockerContainer("ollama/ollama:latest") approach is not sufficient, for several reasons:

  1. Complicated setup/configuration: Ollama can run with GPU acceleration inside Docker containers on NVIDIA GPUs. It's important to be able to check whether GPUs are available and run the container with them when possible.

  2. Model management: There is also a need to pull a model and then commit the container changes into an image, so that the image containing the model can be reused later (see the sketch below).
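
To make that concrete, here's a rough sketch (my assumptions, not library code) of what the generic-container route would involve - the runtime check, the device_requests pass-through and the commit call are one possible wiring:

import docker
import requests
from testcontainers.core.container import DockerContainer
from testcontainers.core.waiting_utils import wait_for_logs

client = docker.from_env()

# 1. GPU handling: request all GPUs only when an NVIDIA runtime is installed.
run_kwargs = {}
if "nvidia" in client.info().get("Runtimes", {}):
    run_kwargs["device_requests"] = [
        docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
    ]

# Assumes DockerContainer forwards extra kwargs to docker run.
with DockerContainer("ollama/ollama:latest", **run_kwargs).with_exposed_ports(11434) as ollama:
    wait_for_logs(ollama, "Listening")
    url = f"http://{ollama.get_container_host_ip()}:{ollama.get_exposed_port(11434)}"

    # 2. Model management: pull a model, then commit the container so the
    #    downloaded weights are baked into an image that can be reused later.
    requests.post(f"{url}/api/pull", json={"name": "llama3:latest"}, timeout=600)
    ollama.get_wrapped_container().commit(repository="ollama-llama3", tag="latest")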

alexanderankin commented 3 weeks ago

does it make sense to mount a volume instead of/as well as committing a container?
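
for example (a sketch with the generic container; the /root/.ollama path is an assumption about where Ollama keeps its models):

from pathlib import Path
from testcontainers.core.container import DockerContainer

models_dir = Path.home() / ".ollama"  # survives across test runs on the host

ollama = (
    DockerContainer("ollama/ollama:latest")
    .with_exposed_ports(11434)
    .with_volume_mapping(str(models_dir), "/root/.ollama", mode="rw")
)
with ollama:
    url = f"http://{ollama.get_container_host_ip()}:{ollama.get_exposed_port(11434)}"
    # models pulled through {url}/api/pull now persist in models_dir on the host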

bricefotzo commented 3 weeks ago

@alexanderankin you raise a good point!

I asked myself the same question, but since it's implemented with commit in the Java and TypeScript versions, I assumed that's because commit is simpler and more robust to implement.

Using commit, you don't need to identify all the files generated by Ollama in order to set up the volume binding, whereas with a volume you have to specify all the needed paths. The Ollama home is /root/.ollama, but are there other locations affected by pulling a model? I don't know, so given that doubt, the commit approach looks safer.

Plus, the commit approach ensures that the test environment is completely self-contained within the Docker image, which can be beneficial for reproducibility and portability across different test environments, I guess?
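
For illustration, the commit step could look roughly like this via the underlying docker-py container (a sketch; the image name and tag are placeholders):

from testcontainers.core.container import DockerContainer

with DockerContainer("ollama/ollama:latest").with_exposed_ports(11434) as ollama:
    # ... pull "llama3:latest" through the HTTP API ...
    ollama.get_wrapped_container().commit(repository="tc-ollama", tag="llama3")
    # "tc-ollama:llama3" now contains every file the pull created, wherever
    # Ollama wrote it, so later test runs can start from that image directly.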

I think @ilopezluna or @eddumelendez would have a better answer to your question.

alexanderankin commented 3 weeks ago

I have also realized that this is a more general concern which would also apply to the Java implementation. I'm looking at the various available images for Ollama as well. One theory is that sizes vary based on the number of GPU drivers bundled, so the macOS build is quite small - but then using this image on a Mac would behave differently, so I'm thinking about how to test this as well... this may be one of those images that only works on a Linux machine, which may be okay.

alexanderankin commented 3 weeks ago

I added an ollama_dir option, which maybe should have been named ollama_home or something, but it does seem to work (there is a test as well).

bricefotzo commented 3 weeks ago

Nice idea, so both are possible! Thanks @alexanderankin! Can't wait to try it.

alexanderankin commented 3 weeks ago

why wait:

mkdir testcontainers-python-617 && cd $_
python -m venv .venv && source $_/bin/activate
pip install git+https://github.com/testcontainers/testcontainers-python@main

script.py:

from json import loads
from pathlib import Path

from requests import post
from testcontainers.ollama import OllamaContainer

def split_by_line(generator):
    # Reassemble the raw streamed chunks into complete lines, so each yielded
    # item is one whole NDJSON object from the /api/chat stream.
    data = b''
    for each_item in generator:
        for line in each_item.splitlines(True):
            data += line
            if data.endswith((b'\r\r', b'\n\n', b'\r\n\r\n', b'\n')):
                yield from data.splitlines()
                data = b''
    if data:
        yield from data.splitlines()

with OllamaContainer(ollama_home=Path.home() / ".ollama") as ollama:
    # Reuse the host's ~/.ollama so a previously pulled model is picked up.
    if "llama3:latest" not in [e["name"] for e in ollama.list_models()]:
        print("did not find 'llama3:latest', pulling")
        ollama.pull_model("llama3:latest")
    endpoint = ollama.get_endpoint()
    # Stream the chat completion and print each token as it arrives.
    for chunk in split_by_line(
            post(url=f"{endpoint}/api/chat", stream=True, json={
                "model": "llama3:latest",
                "messages": [{"role": "user", "content": "what color is the sky?"}]
            })
    ):
        print(loads(chunk)["message"]["content"], end="")