wandb / openui

OpenUI lets you describe UI using your imagination, then see it rendered live.
https://openui.fly.dev
Apache License 2.0

Use local API as LLM #39

Open vale46n1 opened 7 months ago

vale46n1 commented 7 months ago

Can we add a way to use a local API as the LLM? The Python code should be:

from openai import OpenAI

client = OpenAI(
    api_key="",
    # Change the API base URL to the local inference API
    base_url="http://localhost:1337/v1",
)

It would be similar to what is already provided with Ollama.
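
For illustration, here is a minimal sketch (not OpenUI's actual code) of a chat completion against such a local OpenAI-compatible server; the model name and prompt are placeholders:

from openai import OpenAI

# Point the standard OpenAI client at the local inference API from the snippet above.
client = OpenAI(
    api_key="not-needed-locally",          # most local servers ignore the key
    base_url="http://localhost:1337/v1",
)

# Placeholder model name; what the server accepts depends on which model it has loaded.
response = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Generate HTML for a simple login form."}],
)
print(response.choices[0].message.content)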

vanpelt commented 7 months ago

I'd like to hear more about your use case. If you want to mess around locally, you'd just change this line. That's still going to pass gpt-3.5-turbo etc. as the model name; to make this work generically we would need a uniform way to get a list of what models are available. This is essentially what I'm doing with the Ollama integration.

I've been thinking about adding support for tools like Replicate or Together.ai, which would make using open-source models much simpler and faster. Are you just running a llama.cpp model independently of Ollama?
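
As a rough sketch of what a uniform model listing could look like for any OpenAI-compatible backend (the base_url below is just an example local endpoint, not something OpenUI ships with):

from openai import OpenAI

# /v1/models is part of the OpenAI-compatible surface that LM Studio exposes
# (shown later in this thread) and that recent Ollama versions also provide,
# so a single code path could cover both.
client = OpenAI(api_key="unused", base_url="http://localhost:1234/v1")

for model in client.models.list():
    print(model.id)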

vale46n1 commented 7 months ago

I'm using LM Studio for the same test. openrouter.ai is another good and cheap alternative to use (and, I would say, worth integrating).

MMoneer commented 7 months ago

> to make this work generically we would need a uniform way to get a list of what models are available.

The LM Studio server only works when a model is already loaded, so it's not like the Ollama server, which can run without a model. We just need a connection, and the user can change the model from LM Studio, Ooba, etc. I changed the base_url value to match LM Studio, but it still connects to Ollama.
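
A quick way to check which local server is actually answering on a given endpoint is sketched below; the ports are the usual defaults I'd assume for LM Studio and Ollama's OpenAI-compatible API, so adjust them for your setup:

import json
import urllib.request

# Assumed default endpoints; change these if your servers run elsewhere.
candidates = {
    "LM Studio": "http://localhost:1234/v1",
    "Ollama": "http://localhost:11434/v1",
}

for name, base_url in candidates.items():
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=2) as resp:
            ids = [m["id"] for m in json.load(resp)["data"]]
            print(f"{name}: {ids}")
    except OSError:
        print(f"{name}: not reachable")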

grigio commented 7 months ago

Which is the best local LLM to run OpenUI?

MMoneer commented 7 months ago

> Which is the best local LLM to run OpenUI?

@vanpelt mentioned LLaVA, so try one of the v1.6 7B, 13B, or 34B variants.

Highlyhotgames commented 6 months ago

> I'd like to hear more about your use case. If you want to mess around locally, you'd just change this line. That's still going to pass gpt-3.5-turbo etc. as the model name; to make this work generically we would need a uniform way to get a list of what models are available. This is essentially what I'm doing with the Ollama integration.
>
> I've been thinking about adding support for tools like Replicate or Together.ai, which would make using open-source models much simpler and faster. Are you just running a llama.cpp model independently of Ollama?

According to the LM Studio official docs (https://lmstudio.ai/docs/local-server), you can check which models are currently loaded:

curl http://localhost:1234/v1/models

Response (following OpenAI's format):

{
  "data": [
    {
      "id": "TheBloke/phi-2-GGUF/phi-2.Q4_K_S.gguf",
      "object": "model",
      "owned_by": "organization-owner",
      "permission": [
        {}
      ]
    },
    {
      "id": "lmstudio-ai/gemma-2b-it-GGUF/gemma-2b-it-q4_k_m.gguf",
      "object": "model",
      "owned_by": "organization-owner",
      "permission": [
        {}
      ]
    }
  ],
  "object": "list"
}

In this case, both TheBloke/phi-2-GGUF and lmstudio-ai/gemma-2b-it-GGUF are loaded.
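
Building on that, here is a small sketch of how the loaded model id could be fed back in as the model parameter instead of a hard-coded gpt-3.5-turbo (it assumes LM Studio's default port from the curl example above; the API key is a placeholder since LM Studio doesn't check it):

from openai import OpenAI

client = OpenAI(api_key="lm-studio", base_url="http://localhost:1234/v1")

# Ask the server which models are loaded and use the first one.
models = client.models.list().data
if not models:
    raise RuntimeError("No model is loaded in LM Studio")

response = client.chat.completions.create(
    model=models[0].id,  # e.g. "TheBloke/phi-2-GGUF/phi-2.Q4_K_S.gguf"
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)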