microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International

Inconsistent dictionary structure in autogen.ChatCompletion.logged_history #412

Open victordibia opened 11 months ago

victordibia commented 11 months ago

#179 added much-needed support for tracking token count and cost (thanks @kevin666aa).

However, there is some unexpected/inconsistent structure in the dictionary returned.

Currently autogen.ChatCompletion.start_logging(compact=True) is used to start a logging session and ends with autogen.ChatCompletion.stop_logging(). Next the logs can be accessed via autogen.ChatCompletion.logged_history.

Unexpected Structure in autogen.ChatCompletion.logged_history when compact=True

When compact is set to True, logged_history is a dictionary. However, the key is the entire chat history, serialized as a string:


{
    """
    [
        {
            'role': 'system',
            'content': system_message,
        }, ...
    ]""": {
        "created_at": [0, 1],
        "cost": [0.1, 0.2],
    }
}

This makes it very challenging to reuse this data structure in apps. It might be valuable to have output with some structured keys.

{
   "messages": [..],
   "created_at": [..],
   "cost": ...
}
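In the meantime, the compact-format dictionary can be converted into such structured records by parsing the key back into a message list. A minimal sketch with mocked data, assuming the key is a JSON string (the helper name restructure is hypothetical, not an autogen API):

```python
import json

# Mocked logged_history in the compact format described above: the
# message list, serialized to a string, is the dictionary key. The
# values here are illustrative, not real autogen output.
logged_history = {
    json.dumps([
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ]): {
        "created_at": [0, 1],
        "cost": [0.1, 0.2],
    }
}

def restructure(logged_history):
    """Convert the key-is-the-history layout into records with the
    explicit keys proposed above (helper name is hypothetical)."""
    records = []
    for messages_key, stats in logged_history.items():
        records.append({
            "messages": json.loads(messages_key),
            "created_at": stats["created_at"],
            "cost": stats["cost"],
        })
    return records

records = restructure(logged_history)
```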

Furthermore, the structure of logged_history is significantly different when compact=False:

{
  0: {
    'request': {
      'messages': [{'content': 'Y..  ', 'role': 'system'}],
      'model': 'gpt-4',
      'temperature': 0,
      'api_key': '..'
    },
    'response': {
      'id': 'chatcmpl-...Y',
      'object': 'chat.completion',
      'created': 1698203546,
      'model': 'gpt-4-0613',
      'choices': [{
        'index': 0,
        'message': {'role': 'assistant', 'content': 'Yes,..'},
        'finish_reason': 'stop'
      }],
      'usage': {'prompt_tokens': 689, 'completion_tokens': 65, 'total_tokens': 754},
      'cost': 0.024569999999999998
    }
  }
}
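One upside of the verbose layout is that usage data nests predictably, so aggregate statistics can be pulled out directly. A rough sketch against a mocked entry mirroring the structure above (not an official autogen helper):

```python
# Mocked entry mirroring the compact=False layout shown above; the
# numbers come from the example, trimmed for readability.
logged_history = {
    0: {
        "request": {"model": "gpt-4", "temperature": 0},
        "response": {
            "model": "gpt-4-0613",
            "usage": {
                "prompt_tokens": 689,
                "completion_tokens": 65,
                "total_tokens": 754,
            },
            "cost": 0.02457,
        },
    },
}

# The top level is an integer-keyed dict, so iterate over its values
# to aggregate cost and token usage across all requests.
total_cost = sum(e["response"]["cost"] for e in logged_history.values())
total_tokens = sum(
    e["response"]["usage"]["total_tokens"] for e in logged_history.values()
)
```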

Potential action items

Happy to get more thoughts here @gagb @afourney @pcdeadeasy

Related .

Documentation here may need an update.

afourney commented 11 months ago

These are great observations. I've mainly been using compact=False in the Testbed, since the intention is to log as much as possible.

Even in verbose logging, it's weird to have a dictionary instead of a list, unless we're expecting it to be sparse at some point? Basically it means we need to be a little careful when iterating through the items... there's no guarantee they will be sequential in this structure.
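Since the integer keys of that dictionary carry no ordering guarantee in general (and could become sparse), one defensive pattern is to iterate over sorted keys. A sketch with placeholder values:

```python
# Placeholder values standing in for request/response records. Because
# logged_history is a dict keyed by integers (possibly sparse), sort
# the keys before iterating to recover request order.
logged_history = {2: "third request", 0: "first request", 1: "second request"}

ordered = [logged_history[k] for k in sorted(logged_history)]
```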

pcdeadeasy commented 11 months ago

@victordibia, all your observations are true. It would be good to define or generate a JSON Schema for log messages and use it consistently. Having the whole message list as a key is indeed an unusual choice, so I agree with your proposal to make the structure consistent irrespective of the flag.

yiranwu0 commented 11 months ago

Yes, the dictionaries returned with compact=True and compact=False are quite different, and I did spend some time making them consistent for the function print_usage_summary. It would be great to have a consistent, easy-to-use structure.

victordibia commented 11 months ago

Thanks @kevin666aa . It looks like the new changes driven by updates to the openai lib will be relevant here #203, #7 . I will revisit this when #203 is complete.

mclassen commented 10 months ago

I managed to find a workaround that seems to work for me, e.g.:

    import json

    import autogen

    conversations = dict()
    autogen.ChatCompletion.start_logging(conversations)

    # ... do AutoGen stuff ...

    # With compact=True, the serialized message list itself is the
    # dictionary key, so take the first key and parse it back to JSON:
    get_content_list = lambda conversations: list(conversations)[0]
    content_list = get_content_list(conversations)

    content_list_json = json.loads(content_list)

    # Get the content of all messages (skipping any without content):
    conversations_content = "\n".join(
        message["content"] for message in content_list_json
        if message.get("content") is not None
    )

Quite crazy having to do this kind of stuff... I hope it will be fixed soon!