ollama / ollama-python

Ollama Python library
https://ollama.com

Can't get Async to stop #59

Closed grahama1970 closed 4 months ago

grahama1970 commented 4 months ago

The code below results in an infinite stream of new lines after the text returns. How do I give the async client the stop command?

import ollama
from ollama import AsyncClient
import asyncio
import json

async def chat():
    # print('from parent async')
    delta_list = []
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    response_count = 0
    async for part in await AsyncClient().chat(
        model='mistral',
        messages=[message],
        stream=True,
        format='json'
    ):
        # print('from loop async')
        delta_string = str(part['message']['content']) or ''
        delta_list.append(delta_string)
        print(delta_string, end='', flush=True)  # for debugging

        ###
        # After the whole string is done, it's infinite loop of empty new lines time!!

    # print('\n\n*****did I get here*****\n\n')
    json_str = ''.join(delta_list)
    json_obj = json.loads(json_str)
    return json_obj

async def main():
    print('from main function')
    result = await chat()

if __name__ == '__main__':
    print('from file call')
    asyncio.run(main())
mxyng commented 4 months ago

The LLM can generate forever in some scenarios. Here are a few techniques to terminate generation:

  1. Use options['stop']. Models should come with stop parameters preconfigured, but these might not match your specific output.

    chat(model=..., messages=..., options={'stop': ['This', 'will', 'stop', 'generation']})
  2. Use options['num_predict']. This tells the LLM to stop after a set number of tokens.

    chat(model=..., messages=..., options={'num_predict': 100})
  3. Stop the Python (async) generator. The LLM will stop generation when the client connection exits. This means you can implement 1 or 2 from above client side, or implement your own termination criteria, as shown in the sketch below.
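
For example, here is a minimal sketch of option 3 applied to the script above, using an arbitrary cap on the number of streamed chunks as the termination criterion (the 200-chunk limit is illustrative, not part of the library API):

    from ollama import AsyncClient

    async def chat():
        chunks = []
        message = {'role': 'user', 'content': 'Why is the sky blue?'}
        stream = await AsyncClient().chat(model='mistral', messages=[message], stream=True)
        async for part in stream:
            chunks.append(part['message']['content'])
            if len(chunks) >= 200:  # illustrative client-side cap, analogous to num_predict
                break  # leaving the generator closes the connection and stops generation
        return ''.join(chunks)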

[!NOTE] Using format=json without telling the LLM to output JSON can create an infinite loop. You'll find more success with a prompt like 'Why is the sky blue? Output in JSON format.'
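
Applied to the original script, that is a one-line change to the message; the rest of the chat call stays the same:

    message = {'role': 'user', 'content': 'Why is the sky blue? Output in JSON format.'}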