mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

API endpoint for querying information about a model #1585

Open · russell opened this issue 7 months ago

russell commented 7 months ago

Is your feature request related to a problem? Please describe.
I have just started trying to find and configure some models to run on my 12 GB 3060.

So I go and choose a model; in this case I chose localmodelsorca-mini-v2-13b-ggmlorca_mini_v2_13b.ggmlv3.q4_1.bin.yaml from the model gallery.

The next thing I need to do is set up GPU offloading, because the model gallery has none configured by default. The documentation says I should create a configuration that specifies the number of layers to put on the GPU. So the question is: how many layers does my newly downloaded model have? The Hugging Face page doesn't say, so I have no idea. The LocalAI documentation does offer an option (https://localai.io/features/gpu-acceleration/): it suggests turning on debug mode and then running the model.
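For context, the gpu-acceleration page documents a per-model YAML file with a gpu_layers option. A minimal sketch of such a config, assuming the documented fields (the name and numbers below are illustrative placeholders, not taken from the gallery entry above):

```yaml
# Minimal per-model config sketch using the documented gpu_layers option.
# Name and values are illustrative placeholders.
name: orca-mini-gpu
parameters:
  model: orca_mini_v2_13b.ggmlv3.q4_1.bin
context_size: 2048
f16: true
gpu_layers: 35   # number of layers to offload to the GPU
```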

Describe the solution you'd like
Provide an API to query information about the models, so that I can call that API after I download a model and use that information to configure it.
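No such endpoint exists today; purely to illustrate the request, a hypothetical client call could look like this (the /models/{name}/metadata path and the layer_count field are invented for this sketch and are not part of the LocalAI API):

```python
# Hypothetical sketch of the requested feature: query model metadata after
# download. The endpoint path and response fields are invented for
# illustration and are NOT part of the current LocalAI API.
import requests

BASE_URL = "http://localhost:8080"  # assumed local LocalAI instance

def get_model_metadata(model_name: str) -> dict:
    # Imagined endpoint: GET /models/{name}/metadata
    resp = requests.get(f"{BASE_URL}/models/{model_name}/metadata", timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    meta = get_model_metadata("orca_mini_v2_13b.ggmlv3.q4_1.bin")
    # Such a response could expose e.g. the layer count needed for gpu_layers.
    print(meta.get("layer_count"))
```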

dionysius commented 7 months ago

My workaround is just setting gpu_layers to a big number, even if the model actually has fewer layers than that. So far this has worked fine, but I have only played with smaller models that fit in my VRAM.
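This presumably works because the llama.cpp backend offloads at most the model's actual number of layers, so an oversized value is effectively capped. As a fragment of the per-model YAML, the workaround is just:

```yaml
# Over-provisioned value; the backend offloads at most the model's real
# layer count, so anything larger is effectively capped. 999 is arbitrary.
gpu_layers: 999
```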

Idea: Ollama has a form of autodetection that assigns GPU and CPU layers and tries to max out the GPU. A keyword for that in the model configuration (or making it the default behaviour) would be nice, with explicit values reserved for fine-tuning.
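As a rough illustration of that autodetection idea (not how Ollama or LocalAI actually implement it; the layer sizes and VRAM figures below are hypothetical):

```python
# Rough sketch of the autodetection idea: offload as many layers as fit in
# free VRAM and leave the rest on the CPU. Not the actual Ollama or LocalAI
# logic; all sizes are hypothetical.

def autodetect_gpu_layers(total_layers: int,
                          bytes_per_layer: int,
                          free_vram_bytes: int,
                          reserve_bytes: int = 512 * 1024 * 1024) -> int:
    """Return how many layers to offload, keeping a small VRAM reserve."""
    usable = max(free_vram_bytes - reserve_bytes, 0)
    fits = usable // bytes_per_layer
    return min(total_layers, fits)

if __name__ == "__main__":
    # Illustrative numbers only: a 13B q4 model with 40 layers of ~200 MB
    # each, on a card with 12 GB of free VRAM.
    layers = autodetect_gpu_layers(
        total_layers=40,
        bytes_per_layer=200 * 1024 * 1024,
        free_vram_bytes=12 * 1024**3,
    )
    print(f"gpu_layers: {layers}")  # capped at the model's 40 layers here
```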