Hello,
Would you consider adding more inference options for the LLM? E.g. selecting which GPU runs the LLM inference when multiple are available, or even CPU inference with GGUF quants?

Edit: it already does CPU inference, but it seems to be dynamic based on the available VRAM, or when the high VRAM option is set to false.
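For context, here is a minimal sketch of the kind of GPU selection I mean, assuming a PyTorch-based backend (`CUDA_VISIBLE_DEVICES` is the standard CUDA mechanism and already works as a manual workaround; everything else here is just illustrative):

```python
import os

# Must be set before CUDA is initialized (i.e. before importing torch),
# otherwise it has no effect. "1" restricts the process to the second GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch

# Falls back to CPU when no GPU is visible, e.g. CUDA_VISIBLE_DEVICES=""
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"LLM inference would run on: {device}")
```

An explicit option in the config/UI for this (plus a forced-CPU mode for GGUF quants) would avoid relying on environment variables.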