jalajthanaki opened this issue 10 months ago
Very cool, thanks for sharing.
System prompts have a huge effect on response quality, so that could be it. The Chat Arena may be applying a default system prompt that your local setup isn't.
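If you want to rule that out, you can send an explicit system message through FastChat's OpenAI-compatible API and compare. A minimal sketch (the port, model name, and system prompt text here are assumptions; match them to your deployment):

```bash
# Hypothetical request against a local fastchat.serve.openai_api_server on port 8000.
# The system message stands in for whatever the Chat Arena may be adding implicitly.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Mixtral-8x7B-Instruct-v0.1",
    "messages": [
      {"role": "system", "content": "You are a helpful, detailed assistant."},
      {"role": "user", "content": "Explain mixture-of-experts routing in two sentences."}
    ],
    "temperature": 0.7
  }'
```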
About the throughput: https://anakin.ai/blog/how-to-run-mixtral-8x7b-locally/#specs-you-need-to-run-mixtral-8x7b-locally. That said, I'd guess 6-7 tokens/sec should be fine when streaming the response: the average reading speed is around 200 wpm, which at roughly 1.3 tokens per word is only about 4-5 tokens/sec, so generation would outpace reading.
Here is how I made Mixtral-8x7B-Instruct-v0.1 work using FastChat's vllm_worker.
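A sketch of the usual three-process FastChat setup; the ports and the `--num-gpus` value are assumptions to adapt to your hardware (Mixtral-8x7B needs roughly 90+ GB in fp16, so you may need more GPUs or quantization):

```bash
# Terminal 1: start the FastChat controller
python3 -m fastchat.serve.controller --host 0.0.0.0 --port 21001

# Terminal 2: start the vLLM worker serving Mixtral
# --num-gpus sets vLLM tensor parallelism; 2 is a placeholder
python3 -m fastchat.serve.vllm_worker \
    --model-path mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --controller-address http://localhost:21001 \
    --num-gpus 2

# Terminal 3: expose an OpenAI-compatible API on port 8000
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```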
Still, I find the answers from the Chat Arena's Mixtral-8x7B-Instruct-v0.1 much better than my local ones.
Answer from local inference:
Answer from the Chat Arena:
I still have a few questions:
Is there anything that I'm still missing?