ollama / ollama-python

Ollama Python library
https://ollama.com

Can't get Async to stop #59

Closed grahama1970 closed 4 months ago

grahama1970 commented 4 months ago

The code below results in an infinite stream of new lines after the text returns. How do I give the async client the stop command?

import ollama
from ollama import AsyncClient
import asyncio
import json

async def chat():
    # print('from parent async')
    delta_list = []
    message = {'role': 'user', 'content': 'Why is the sky blue?'}
    response_count = 0
    async for part in await AsyncClient().chat(
        model='mistral',
        messages=[message],
        stream=True,
        format='json'
    ):
        # print('from loop async')
        delta_string = str(part['message']['content']) or ''
        delta_list.append(delta_string)
        print(delta_string, end='', flush=True)  # for debugging

        ###
        # After the whole string is done, it's infinite loop of empty new lines time!!

    # print('\n\n*****did I get here*****\n\n')
    json_str = ''.join(delta_list)
    json_obj = json.loads(json_str)
    return json_obj

async def main():
    print('from main function')
    result = await chat()

if __name__ == '__main__':
    print('from file call')
    asyncio.run(main())
mxyng commented 4 months ago

The LLM can generate forever in some scenarios. Here are a few techniques to terminate generation:

  1. Use options['stop']. Models should come with stop parameters preconfigured, but these might not match your specific output.

    chat(model=..., messages=..., options={'stop': ['This', 'will', 'stop', 'generation']})
  2. Use options['num_predict']. This tells the LLM to stop after a set number of tokens.

    chat(model=..., messages=..., options={'num_predict': 100})
  3. Stop the Python (async) generator. The LLM will stop generation when the client connection exits. This means you can implement 1 or 2 from above client side, or implement your own termination criteria, as shown in the sketch below.
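
For example, here is a minimal sketch of option 3 applied to the script above, using an arbitrary cap on the number of streamed chunks as the termination criterion (the 200-chunk limit is illustrative, not part of the library API):

    from ollama import AsyncClient

    async def chat():
        chunks = []
        message = {'role': 'user', 'content': 'Why is the sky blue?'}
        stream = await AsyncClient().chat(model='mistral', messages=[message], stream=True)
        async for part in stream:
            chunks.append(part['message']['content'])
            if len(chunks) >= 200:  # illustrative client-side cap, analogous to num_predict
                break  # leaving the generator closes the connection and stops generation
        return ''.join(chunks)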

[!NOTE] Using format=json without telling the LLM to output JSON can create an infinite loop. You'll find more success with a prompt like 'Why is the sky blue? Output in JSON format.'
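
Applied to the original script, that is a one-line change to the message; the rest of the chat call stays the same:

    message = {'role': 'user', 'content': 'Why is the sky blue? Output in JSON format.'}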