incredibleole opened 1 week ago
What tok/s do you get on the 48 GB M4?
It is likely a prompt-template issue with GGUF; we will investigate.
I get 9-10 tokens/sec. I also tried thebloke-kafkaLM-70b; it doesn't output anything at all and hangs at 1 token/sec.
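For comparing figures like these, one rough way to measure tokens/sec is to time a streaming generation and divide the number of tokens produced by the elapsed wall-clock time. The Python sketch below assumes the model can be driven through some zero-argument callable that yields tokens one at a time. `tokens_per_second` and `fake_stream` are hypothetical names used only for illustration and are not part of any runtime's API.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(stream: Callable[[], Iterable[str]]) -> float:
    """Time a token stream and return its throughput in tokens/sec.

    `stream` is a hypothetical stand-in for a model's streaming
    generate call: any zero-argument callable that yields tokens.
    """
    start = time.perf_counter()
    count = sum(1 for _ in stream())  # consume the stream, counting tokens
    elapsed = time.perf_counter() - start
    if count == 0:
        return 0.0  # nothing generated (e.g. the "no output" case)
    return count / elapsed if elapsed > 0 else float("inf")


def fake_stream() -> Iterable[str]:
    # Hypothetical stand-in: emits 20 tokens roughly 0.1 s apart.
    for i in range(20):
        time.sleep(0.1)
        yield f"tok{i}"


if __name__ == "__main__":
    print(f"{tokens_per_second(fake_stream):.1f} tok/s")
```

This wall-clock figure includes the time to the first token, so it blends prompt processing with generation. Runtimes that report prompt evaluation and token generation rates separately may show different numbers.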
When I use the 70B Llama model, it just generates garbage (random characters) as output, and this seems to be the case with other 70B models as well. I'm using a MacBook Pro (M4 Pro) with 48 GB of RAM.
Your Environment