yannikkellerde / AI-Snip

Bring Clippy back to Windows

Add Ollama support #3

Closed. BearXP closed this 1 hour ago.

BearXP commented 3 weeks ago

Hello!

First, I just wanted to say thank you for this. I saw your post on Reddit and this looks like such a fun project!

Looking through the code, it seems modular enough that it shouldn't be too hard to add support for running a local model through Ollama.

pip install ollama

In util.py add the following:

import ollama
# json and the typing import below may already be present in util.py; repeating them is harmless
import json
from typing import Iterator
...
class OllamaModelWrapper(ModelWrapper):
    def __init__(self, model_name: str = "minicpm-v", log_file=None, api_version=""):
        # api_version is not used by Ollama; it is only accepted so the signature
        # stays in line with the existing wrapper.
        # The host is hard-coded for now (see the note at the end about making it configurable).
        self.client = ollama.Client(
            host="http://localhost:11434"
        )
        super().__init__(self.client, model_name, log_file)

    def _ollama_reformat_messages(self, messages: list[dict[str, str]]) -> list[dict[str, str]]:
        # Convert OpenAI-style messages (content as a list of typed parts) into
        # Ollama-style messages (plain text content plus an optional "images" list).
        ollama_messages = []
        for msg_raw in messages:
            msg = {m['type']: m[m['type']] for m in msg_raw['content']}
            ollama_msg = {
                'role': msg_raw['role'],
                'content': msg['text']
            }
            if 'image_url' in msg:
                # Strip the "data:image/png;base64," prefix (22 characters) so only
                # the base64 payload is handed to Ollama.
                ollama_msg['images'] = [msg['image_url']['url'][22:]]
            ollama_messages.append(ollama_msg)
        return ollama_messages

    def _openai_reformat_messages(self, ollama_msg: dict) -> dict:
        # Convert an Ollama response message back into the OpenAI-style shape
        # (content as a list of typed parts) so the log file stays consistent
        # with the incoming messages.
        openai_msg = {
            "role": ollama_msg["role"],
            "content": [{
                "type": "text",
                "text": ollama_msg["content"]
            }]
        }
        return openai_msg

    def complete(self, messages: list[dict[str, str]], **kwargs) -> str:
        ollama_messages = self._ollama_reformat_messages(messages)
        response = self.client.chat(
            model=self.model_name,
            messages=ollama_messages,
            **kwargs
        )
        resp_msg = response['message']
        resp_content = resp_msg['content']
        self.stats["requests"] += 1
        # Ollama reports prompt tokens as prompt_eval_count and completion tokens as eval_count.
        self.stats["input_tokens"] += response['prompt_eval_count']
        self.stats["completion_tokens"] += response['eval_count']
        if self.log_file is not None:
            msg_copy = messages.copy()
            msg_copy.append(self._openai_reformat_messages(resp_msg))
            with open(self.log_file, "a") as f:
                f.write(json.dumps(msg_copy) + ",\n")
        return resp_content

    def stream_complete(self, messages: list[dict[str, str]], **kwargs) -> Iterator[str]:
        ollama_messages = self._ollama_reformat_messages(messages)
        stream = self.client.chat(
            model=self.model_name,
            messages=ollama_messages,
            stream=True,
            **kwargs
        )
        # Yield the text of each chunk as it arrives.
        for chunk in stream:
            yield chunk['message']['content']

Then in aisnip.py add OllamaModelWrapper.
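
I haven't checked exactly how aisnip.py builds its wrapper, so the snippet below is only a rough sketch of the wiring (everything except OllamaModelWrapper itself is made up). It mainly shows the OpenAI-style message shape that _ollama_reformat_messages expects, and it assumes a local Ollama server on the default port with minicpm-v already pulled (ollama pull minicpm-v):

# Hypothetical wiring / quick sanity check -- not the actual aisnip.py code.
from util import OllamaModelWrapper

model = OllamaModelWrapper(model_name="minicpm-v")

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this screenshot."},
        # Image parts are expected as data URLs; the wrapper strips the
        # "data:image/png;base64," prefix before handing the payload to Ollama:
        # {"type": "image_url", "image_url": {"url": "data:image/png;base64,...."}},
    ],
}]

print(model.complete(messages))                    # one-shot completion
# for token in model.stream_complete(messages):    # or stream it instead
#     print(token, end="", flush=True)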

The hardest bit left would be getting the user's input for where their Ollama server is running (I've hard-coded it to localhost), and which model they want to use (again, I've just hard-coded it to minicpm-v).

If you want, I'm happy to set up an ollama_config.env or something and learn how to do a pull request?
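
For what I had in mind, the env file could literally just be two lines (the names are only a suggestion), read with something like this in util.py:

# Sketch of reading a hypothetical ollama_config.env containing lines like
#   OLLAMA_HOST=http://localhost:11434
#   OLLAMA_MODEL=minicpm-v
def load_ollama_config(path: str = "ollama_config.env") -> dict[str, str]:
    config = {"OLLAMA_HOST": "http://localhost:11434", "OLLAMA_MODEL": "minicpm-v"}
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, value = line.split("=", 1)
                    config[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # keep the defaults above if no config file is present
    return config

That would avoid pulling in an extra dependency just for config parsing.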

yannikkellerde commented 3 weeks ago

Yeah, a configuration file (YAML or JSON) that users can modify to specify whether they want to use GPT-4o or Ollama, and which model / server they want to use, sounds like the best option for now.
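
Roughly what I'm picturing, just as an illustration (the file name, keys, and defaults below are placeholders, nothing is decided yet):

# Illustration only -- a config.json might look like:
#   {"provider": "ollama", "model": "minicpm-v", "host": "http://localhost:11434"}
import json
from util import OllamaModelWrapper

def build_wrapper(config_path: str = "config.json"):
    with open(config_path) as f:
        config = json.load(f)
    if config.get("provider", "openai") == "ollama":
        # Passing the host through would need a small extension of OllamaModelWrapper.
        return OllamaModelWrapper(model_name=config.get("model", "minicpm-v"))
    # Construct the existing GPT-4o wrapper here instead (class name omitted,
    # since it depends on what util.py already calls it).
    raise NotImplementedError("wire up the existing GPT-4o wrapper here")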

If you can create a pull request for this and provide some simple instructions on how to test it, then I'd be grateful and very happy to merge it in.

yannikkellerde commented 1 hour ago

Ended up implementing it myself in 5ed1e9f6f00cb48a4288cf02b41db6670ba48213.