kevinthedang opened 3 months ago
Notes:
```shell
curl http://localhost:11434/api/pull -d '{
  "name": "llama3"
}'
```

Whether or not `stream` is used, it will eventually send a final

```json
{
  "status": "success"
}
```
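As a side note, here is a minimal sketch of how the bot could call this endpoint and wait for that final status. It assumes Node 18+ (global `fetch`); `pullModel` is a hypothetical helper name, not something in this repo.

```ts
// Minimal sketch (not from the repo): pull a model through the local Ollama API
// and wait for the final {"status": "success"}. "stream": false asks the endpoint
// for a single JSON object instead of a progress stream.
async function pullModel(name: string, baseUrl = "http://localhost:11434"): Promise<void> {
  const res = await fetch(`${baseUrl}/api/pull`, {
    method: "POST",
    body: JSON.stringify({ name, stream: false }),
  });
  if (!res.ok) throw new Error(`pull failed with HTTP ${res.status}`);

  const result = (await res.json()) as { status?: string };
  if (result.status !== "success") {
    throw new Error(`pull did not finish cleanly: ${JSON.stringify(result)}`);
  }
}
```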
```shell
curl http://localhost:11434/api/tags
```

One response is generated:

```json
{
  "models": [
    {
      "name": "codellama:13b",
      "modified_at": "2023-11-04T14:56:49.277302595-07:00",
      "size": 7365960935,
      "digest": "9f438cb9cd581fc025612d27f7c1a6669ff83a8bb0ed86c94fcf4c5440555697",
      "details": {
        "format": "gguf",
        "family": "llama",
        "families": null,
        "parameter_size": "13B",
        "quantization_level": "Q4_0"
      }
    },
    {
      "name": "llama3:latest",
      "modified_at": "2023-12-07T09:32:18.757212583-08:00",
      "size": 3825819519,
      "digest": "fe938a131f40e6f6d40083c9f0f430a515233eb2edaa6d72eb85c50d64f2300e",
      "details": {
        "format": "gguf",
        "family": "llama",
        "families": null,
        "parameter_size": "7B",
        "quantization_level": "Q4_0"
      }
    }
  ]
}
```
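If it helps, a small sketch of how the bot side could consume this for a "list models" style command (again assuming Node 18+ `fetch`; `listLocalModels` is a hypothetical name):

```ts
// Hypothetical sketch: fetch the local tag list and return just the model names,
// e.g. ["codellama:13b", "llama3:latest"].
interface TagsResponse {
  models: { name: string; size: number; digest: string }[];
}

async function listLocalModels(baseUrl = "http://localhost:11434"): Promise<string[]> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`tags request failed with HTTP ${res.status}`);
  const data = (await res.json()) as TagsResponse;
  return data.models.map((m) => m.name);
}
```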
```shell
curl http://localhost:11434/api/show -d '{
  "name": "llama3"
}'
```

One response is given:

```json
{
  "modelfile": "# Modelfile generated by \"ollama show\"\n# To build a new Modelfile based on this one, replace the FROM line with:\n# FROM llava:latest\n\nFROM /Users/matt/.ollama/models/blobs/sha256:200765e1283640ffbd013184bf496e261032fa75b99498a9613be4e94d63ad52\nTEMPLATE \"\"\"{{ .System }}\nUSER: {{ .Prompt }}\nASSISTANT: \"\"\"\nPARAMETER num_ctx 4096\nPARAMETER stop \"\u003c/s\u003e\"\nPARAMETER stop \"USER:\"\nPARAMETER stop \"ASSISTANT:\"",
  "parameters": "num_ctx 4096\nstop \u003c/s\u003e\nstop USER:\nstop ASSISTANT:",
  "template": "{{ .System }}\nUSER: {{ .Prompt }}\nASSISTANT: ",
  "details": {
    "format": "gguf",
    "family": "llama",
    "families": ["llama", "clip"],
    "parameter_size": "7B",
    "quantization_level": "Q4_0"
  }
}
```
Now we just need to look into the Discord.js commands for these as well (if any) @JT2M0L3Y
This will impact the potential use of personally contextualized models like what has been idealized in #22 and/or #45.
How does this impact it that much, though? We can create a command to create the context for an LLM that a user wants to produce. These commands are meant to reduce the user's overhead of entering an Ollama container and manually listing and pulling open-source models.
This also lets users view the open-source models they can create their own LLMs from. Get the idea?
@JT2M0L3Y
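Not committing to an interface here, but a rough sketch of what such a command could look like with discord.js v14 slash commands. The command name, option name, and the `pullModel` helper are all hypothetical, not existing code in this repo.

```ts
import { SlashCommandBuilder, ChatInputCommandInteraction } from "discord.js";

// pullModel is the hypothetical /api/pull helper sketched earlier in this thread.
declare function pullModel(name: string): Promise<void>;

export const data = new SlashCommandBuilder()
  .setName("pull-model")
  .setDescription("Pull an open-source model into the Ollama container")
  .addStringOption((opt) =>
    opt.setName("model").setDescription("Model tag, e.g. llama3").setRequired(true),
  );

export async function execute(interaction: ChatInputCommandInteraction) {
  const model = interaction.options.getString("model", true);
  await interaction.deferReply(); // pulls can take minutes, so defer the reply
  try {
    await pullModel(model);
    await interaction.editReply(`Pulled \`${model}\` successfully.`);
  } catch (err) {
    await interaction.editReply(`Failed to pull \`${model}\`: ${String(err)}`);
  }
}
```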
So, should the list be of models already pulled, or any model available through Ollama (that is, if we're auto-pulling models not already accessible in the container)?
My worry about the context: the current environment would require user-contextualized models to be pushed up to Ollama's model bank, right?
To my knowledge, they do not have to be pushed to the Model Library to be used; I believe you can just create them (a sketch of this follows the list below). A few ideas:
- `MODEL` environment variable as mentioned in #45.
- `Modelfiles` for Ollama to utilize when choosing a case-specific LLM for a prompt. (Essentially a folder of `Modelfile` files to run other LLMs, kinda weird and might be too much.)
- Crazy Idea: Generating a `Modelfile` on the fly for a prompt (will likely introduce too much overhead).
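For the "just create them" route, a hedged sketch of what that could look like against the `/api/create` endpoint. The field names follow the Ollama API docs at the time of writing and may have changed since; `createModel` is a hypothetical name.

```ts
// Hypothetical sketch: create a user-contextualized model locally from a Modelfile
// string, without pushing anything to the public Model Library.
// Field names ("name", "modelfile", "stream") are from the Ollama API docs at the
// time of writing and may differ in newer versions.
async function createModel(
  name: string,
  modelfile: string,
  baseUrl = "http://localhost:11434",
): Promise<void> {
  const res = await fetch(`${baseUrl}/api/create`, {
    method: "POST",
    body: JSON.stringify({ name, modelfile, stream: false }),
  });
  if (!res.ok) throw new Error(`create failed with HTTP ${res.status}`);
}

// e.g. a per-user system prompt baked into a derived model:
// await createModel("llama3-custom", 'FROM llama3\nSYSTEM """You are this user\'s assistant."""');
```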
For reference, listing local models is already possible but "listing all models available with Ollama" has a number of open issues in the Ollama repository itself:
Hmm alright. We'll do what we can for now then.
This feature will likely just stay open until some kind of relevant API feature is implemented to deliver a `.json` response, or some simpler way to read off the available models.
Is there any issue with pulling existing Models from the Library? If not, we can implement that first and leave this open as long as necessary.
@JT2M0L3Y
The most promising solution I can find at the moment is a Kaggle dataset updated in the past month that has 87 different models. But an API endpoint would be preferable.
I think for now, it would be possible to query a local set of models to check that what was requested exists within the environment.
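Along those lines, a tiny guard could reuse the `/api/tags` listing sketched above to verify a requested model before trying to run it (hypothetical helper name):

```ts
// listLocalModels is the hypothetical /api/tags helper sketched earlier.
declare function listLocalModels(): Promise<string[]>;

// Hypothetical guard: check whether a requested model is already present locally.
async function modelExistsLocally(requested: string): Promise<boolean> {
  const names = await listLocalModels();
  // Tags come back as "name:tag"; accept an exact match or a bare base-name match.
  return names.some((n) => n === requested || n.split(":")[0] === requested);
}
```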
As of now, it looks like the Kaggle dataset created to resolve issue #1473 in the ollama repo is expected to be updated daily as new models are added.
As far as progress on an API endpoint for this, it looks like there is plenty of community desire for this feature but not too much progress on the implementation of such a feature.
We may have to wait awhile for this to be solved.
**Issue**

- With the `MODEL` environment variable, we can allow for storage of what models are present in the container.
- The `--rm` case, or we could just not! Either way, defaulting to removing the models when the container dies should be good, as ollama will always pull the latest of a model. So it should be fine as long as it can pull from the "Ollama Model Library".

**Solution**

- Do away with the `MODEL` environment variable and instead just have some kind of way to store what models exist, and remove them upon collapse of the container.
- Currently the `discord` container comes up, then the `ollama` container. Change that to the opposite so the `ollama` container can be ready prior to the bot.

**Other Images**

- The `discord` and `ollama` containers with no model set up.
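For the "remove them upon collapse of the container" idea above, one possible shape is to track what the bot pulled and clean up on shutdown via `/api/delete` (the documented Ollama endpoint for removing a local model). This is only a sketch under those assumptions; names are hypothetical.

```ts
// Hypothetical cleanup sketch: remember which models the bot pulled and delete
// them from the Ollama container when the bot shuts down.
const pulledModels = new Set<string>();

async function removePulledModels(baseUrl = "http://localhost:11434"): Promise<void> {
  for (const name of pulledModels) {
    // /api/delete removes a locally stored model; it can always be re-pulled later.
    await fetch(`${baseUrl}/api/delete`, {
      method: "DELETE",
      body: JSON.stringify({ name }),
    });
  }
}

// e.g. wire it to the bot's shutdown path:
// process.on("SIGTERM", () => void removePulledModels());
```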