jekalmin / extended_openai_conversation

Home Assistant custom component of conversation agent. It uses OpenAI to control your devices.

Logs contain illegal JSON syntax #155

Open mateuszdrab opened 4 months ago

mateuszdrab commented 4 months ago

The logs produced by the component contain invalid JSON syntax: field names must be double-quoted, not single-quoted.

As a result, the extracted JSON won't parse in Grafana/Loki.

Example:

2024-02-22 11:23:54.872 INFO (MainThread) [custom_components.extended_openai_conversation] Response for test: {'id': 'chatcmpl-8v1T8OatUuJzoOx7oJVTgUZ2oCMKV', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': 'The ventilation is not recommended at the moment.', 'role': 'assistant'}, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'created': 1708601034, 'model': 'gpt-35-turbo', 'object': 'chat.completion', 'system_fingerprint': 'fp_68a7d165bf', 'usage': {'completion_tokens': 9, 'prompt_tokens': 12510, 'total_tokens': 12519}, 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}]}
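For what it's worth, the single quotes are just Python's dict repr: when a dict is passed as a logging argument, the logger interpolates str(dict), not JSON. A minimal sketch (with a made-up payload shaped like the response above) showing the difference:

```python
import json

# Hypothetical payload, shaped like the log excerpt above
response = {"id": "chatcmpl-8v1T8", "choices": [{"finish_reason": "stop", "index": 0}]}

as_repr = str(response)         # what the logger emits for %s: single quotes
as_json = json.dumps(response)  # valid JSON: double quotes

print(as_repr)
print(as_json)
```

Only the json.dumps form survives a round trip through a JSON parser.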

I take it the JSON comes directly from the API; if so, is this perhaps an issue with the OpenAI package?

P.S. What does "Response for test" mean? Where does "test" come from?

mateuszdrab commented 4 months ago

I figured it out.

I changed this line: https://github.com/jekalmin/extended_openai_conversation/blob/1b20b56e81e5e5067b72a2ba2c8f51dd0a73eef1/custom_components/extended_openai_conversation/__init__.py#L359 to _LOGGER.info("Prompt for %s: %s", model, json.dumps(messages))

and this line: https://github.com/jekalmin/extended_openai_conversation/blob/1b20b56e81e5e5067b72a2ba2c8f51dd0a73eef1/custom_components/extended_openai_conversation/__init__.py#L371 to _LOGGER.info("Response %s", response.model_dump(exclude_none=True))

Would you mind merging it if I create a PR?

jekalmin commented 4 months ago

Thanks for reporting this issue! The log is currently only used for monitoring, so its format can be changed to JSON.

However, I'd like to understand: if it's changed, how does Grafana/Loki parse the JSON when the log line looks like Prompt for gpt-3.5-turbo-0125: {JSON}?

Can you use an expression in Grafana/Loki to extract the JSON from a string that contains it? (I haven't used Grafana/Loki before.)

Also, did you mean this for the second case? _LOGGER.info("Response %s", json.dumps(response.model_dump(exclude_none=True)))

mateuszdrab commented 4 months ago

> Thanks for reporting this issue! The log is currently only used for monitoring, so its format can be changed to JSON.
>
> However, I'd like to understand: if it's changed, how does Grafana/Loki parse the JSON when the log line looks like Prompt for gpt-3.5-turbo-0125: {JSON}?
>
> Can you use an expression in Grafana/Loki to extract the JSON from a string that contains it? (I haven't used Grafana/Loki before.)
>
> Also, did you mean this for the second case? _LOGGER.info("Response %s", json.dumps(response.model_dump(exclude_none=True)))

Hey @jekalmin

Thanks for getting back to me

Yeah, so when I parse the log in Loki, I use a regex to match the Docker timestamp and the "Response/Prompt for ..." prefix up to the colon, and extract the rest as a label, which I then set as the main message content and parse as JSON.

It works; the only issue I'm currently investigating is that my prompt is very long (43k characters) and gets cut off after 16k characters when set as a label. That might be a Loki limitation, which can probably be worked around.
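Roughly, that query-time extraction can be sketched in Python (the regex and sample line here are illustrative, not my actual Loki config):

```python
import json
import re

# Hypothetical log line in the fixed (json.dumps) format
line = ('2024-02-22 11:23:54.872 INFO (MainThread) '
        '[custom_components.extended_openai_conversation] '
        'Response for test: {"id": "chatcmpl-8v1T8", "usage": {"total_tokens": 12519}}')

# Match the timestamp, level, thread, and logger name, then the
# "Prompt/Response for <name>: " prefix; capture the remainder as the payload
pattern = re.compile(
    r'^\S+ \S+ \w+ \(\w+\) \[[^\]]+\] '
    r'(?:Prompt|Response) for \S+: (?P<payload>\{.*\})$'
)

match = pattern.match(line)
payload = json.loads(match.group("payload"))
print(payload["usage"]["total_tokens"])
```

Once the payload is valid JSON, pulling out fields like token usage for a dashboard is trivial.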

And yes, for the second example I pasted the wrong line, but what you suggested is exactly what I meant.

The idea would be to create a dashboard to show prompt+response as well as some stats on token usage.

jekalmin commented 4 months ago

Thanks for the quick response.

I hadn't thought the log could be used this way (it's a really good way to track history).

It would be a great feature to support! (although it does constrain the log format)

I would be happy to change the format, since both versions are human-readable as well (at least to me). If you don't mind, could you create a PR?

mateuszdrab commented 4 months ago

Loki is amazing. I prefer to ingest raw logs without any modifications on the source side so that they can be parsed however I need at query time.

The PR is ready ;)

To be honest, structured logging is quite common, though that would require replacing all log statements with JSON equivalents.
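For example, a minimal JSON formatter using just the standard library (the field names here are made up, not a proposal for this component):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("extended_openai_conversation.demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Response for %s: %s", "gpt-3.5-turbo",
            json.dumps({"total_tokens": 12519}))
```

With a formatter like this, Loki's json parser stage can pick the line apart without any regex at all.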