thistleknot opened this issue 12 months ago
batch inference
I've been chasing down GGUF batch inference, and it apparently isn't supported in ctransformers, llama.cpp, or llama-cpp-python.
Why?
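For context on what I'm after: upstream llama.cpp has a low-level `llama_batch` API (used by its `batched` example), but as far as I can tell the high-level Python wrappers don't expose it, so "batch" inference collapses to a sequential loop over prompts. A minimal sketch of that workaround, with `generate_fn` as a stand-in for whatever backend call you use (e.g. wrapping a llama-cpp-python `Llama(...)` instance):

```python
def batch_generate(prompts, generate_fn, batch_size=4):
    """Chunk prompts and run completions chunk by chunk.

    Without true batched decoding in the backend, each "batch" still
    executes one prompt at a time, so this only organizes the work;
    it does not give the throughput win real batching would.
    """
    results = []
    for i in range(0, len(prompts), batch_size):
        chunk = prompts[i:i + batch_size]
        # Sequential fallback: one backend call per prompt.
        results.extend(generate_fn(p) for p in chunk)
    return results

# Stub generate_fn for illustration; swap in a real model call.
outputs = batch_generate(["hello", "world"], lambda p: p.upper())
```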