ollama / ollama-python

Ollama Python library
https://ollama.com
MIT License

Why Ollama is so terribly slow when I set format="json" #92

Closed eliranwong closed 3 months ago

eliranwong commented 3 months ago

When I use format="json", generation is extremely slow. I tried llamafile with JSON output using the same model and the same prompt: what takes Ollama two minutes to answer takes llamafile a few seconds. Please advise; if this issue cannot be sorted out, Ollama is obviously not a suitable choice for developing applications that need JSON output. I really like Ollama because it is easy to set up.

    import ollama
    from ollama import Options

    # messages is built earlier in my code
    completion = ollama.chat(
        model="mistral",
        messages=messages,
        format="json",
        options=Options(
            temperature=0.0,
            num_ctx=100000,
            num_predict=-1,  # no cap on generated tokens
        ),
    )
eliranwong commented 3 months ago

I got a response at https://github.com/ollama/ollama/issues/3154