remon-nashid opened 4 days ago (Open)
Hi @remon-nashid, thanks for the feature request. For reference, these values aren't actually returned by the Ollama API; they are computed by the CLI/client. You can see how the Ollama CLI does it in Go here: https://github.com/ollama/ollama/blob/723f285813f504375f0e6be6c76edfbaaabd961f/cmd/cmd.go#L670
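For illustration, here is a minimal Go sketch of the kind of computation the linked CLI code performs: the CPU/GPU split is derived from the model's total size and the portion resident in VRAM (the `size` and `size_vram` fields of the list response), not from the machine's total GPU memory. This is a paraphrase, not the exact upstream code; treat the details as an approximation.

```go
package main

import (
	"fmt"
	"math"
)

// cpuGpuSplit derives a processor string from a loaded model's total size
// and the portion of it held in VRAM, in the spirit of the linked cmd.go
// logic. Sketch only; names and formatting here are assumptions.
func cpuGpuSplit(size, sizeVRAM int64) string {
	switch {
	case sizeVRAM == 0:
		return "100% CPU"
	case sizeVRAM >= size:
		return "100% GPU"
	default:
		// Percentage on CPU is the share of the model NOT in VRAM.
		cpuPercent := math.Round(float64(size-sizeVRAM) / float64(size) * 100)
		return fmt.Sprintf("%d%%/%d%% CPU/GPU", int(cpuPercent), int(100-cpuPercent))
	}
}

func main() {
	fmt.Println(cpuGpuSplit(8_000_000_000, 8_000_000_000)) // fully offloaded
	fmt.Println(cpuGpuSplit(8_000_000_000, 4_000_000_000)) // partial offload
	fmt.Println(cpuGpuSplit(8_000_000_000, 0))             // CPU only
}
```

The key point is that the denominator is the model size, not the GPU's VRAM capacity.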
Besides returning the list response, can it also expose the GPU/CPU percentages? Figuring out how much of the model is loaded into the GPU is not as clear-cut as dividing `size_vram` by the total VRAM size.