music-assistant / hass-music-assistant

Turn your Home Assistant instance into a jukebox: hassle-free streaming of your favorite media to Home Assistant media players.
Apache License 2.0

AI is confused by the `query` ambiguity in the intent #2636

Closed HarvsG closed 3 months ago

HarvsG commented 4 months ago

What version of Music Assistant has the issue?

2.1.0

What version of the Home Assistant Integration have you got installed?

2024.6.2

Have you tried everything in the Troubleshooting FAQ and reviewed the Open and Closed Issues and Discussions to resolve this yourself?

The problem

Requests to the LLM to control media players are unreliable and slow. Debugging reveals that it often makes multiple calls to the custom intent: the first call errors because it uses the `query` key, and a later retry with the `track` or `artist` key succeeds. If the same conversation agent is also configured in MA, the API calls initiated by MA confuse the conversation.

How to reproduce

Configure ChatGPT as the conversation agent, use the GPT-3.5 model, turn on debug logging, and ask it to play a track.

Music Providers

Spotify, RadioBrowser

Player Providers

Cast devices via mass

Full log output

A portion of the log, showing GPT-3.5 Turbo using the `query` key first:

{'role': 'user', 'content': 'Play bad guy on the bedroom speakers.'}]
2024-07-16 09:20:19.634 DEBUG (MainThread) [homeassistant.components.openai_conversation] Response ChatCompletion(id='chatcmpl-9lXhyClZnkkIcEbGzQ2bSVNl0USaf', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_i7KBRuE3VpIU1ipzGaD3rWOO', function=Function(arguments='{"name":"Bedroom speaker","query":"bad guy"}', name='MassPlayMediaOnMediaPlayer'), type='function')]))], created=1721118018, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=25, prompt_tokens=3821, total_tokens=3846))
2024-07-16 09:20:19.635 DEBUG (MainThread) [homeassistant.components.openai_conversation] Tool call: MassPlayMediaOnMediaPlayer({'name': 'Bedroom speaker', 'query': 'bad guy'})
2024-07-16 09:20:19.640 DEBUG (MainThread) [homeassistant.components.openai_conversation] Tool response: {'error': 'IntentHandleError', 'error_text': 'query requires using a conversation agent'}
2024-07-16 09:20:20.941 DEBUG (MainThread) [homeassistant.components.openai_conversation] Response ChatCompletion(id='chatcmpl-9lXi07m08IM403IRdKxMfZi', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_nVr3Fniv0C6zWQiVJ8bO', function=Function(arguments='{"name":"Bedroom speaker","track":"bad guy"}', name='MassPlayMediaOnMediaPlayer'), type='function')]))], created=1721118020, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=25, prompt_tokens=3877, total_tokens=3902))
2024-07-16 09:20:20.942 DEBUG (MainThread) [homeassistant.components.openai_conversation] Tool call: MassPlayMediaOnMediaPlayer({'name': 'Bedroom speaker', 'track': 'bad guy'})
2024-07-16 09:20:32.149 DEBUG (MainThread) [homeassistant.components.openai_conversation] Tool response: {'speech': {'plain': {'speech': 'Okay', 'extra_data': None}}, 'response_type': 'action_done', 'data': {'targets': [], 'success': [], 'failed': []}}
2024-07-16 09:20:33.201 DEBUG (MainThread) [homeassistant.components.openai_conversation] Response ChatCompletion(id='chatcmpl-9lXiCkpyDP812g2DvSLWbXBISLYua', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='"Okay", "bad guy" will be played on the bedroom speakers.', role='assistant', function_call=None, tool_calls=None))], created=1721118032, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=16, prompt_tokens=3955, total_tokens=3971))

Additional information

https://github.com/music-assistant/hass-music-assistant/blob/12e06d27d8316feb5d78f67a46cac5771ede2d4c/custom_components/mass/intent.py#L64-L65
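
For reference, the behavior implied by the linked lines and the error text in the log can be summarized with a minimal Python sketch. This is not the actual intent.py code; `agent_configured` is a hypothetical stand-in for however the handler resolves the configured conversation agent.

```python
# Minimal sketch (not the real custom_components/mass/intent.py) of the guard
# that produces "query requires using a conversation agent". The free-form
# `query` slot only works when an MA conversation agent is configured, while
# the `track`/`artist` slots skip that check entirely.
from homeassistant.helpers import intent


async def _handle_play_media(intent_obj: intent.Intent, agent_configured: bool) -> None:
    slots = {name: slot["value"] for name, slot in intent_obj.slots.items()}

    if "query" in slots and not agent_configured:
        # GPT-3.5 keeps landing in this branch when it fills `query`
        # instead of `track`/`artist`, forcing the retry seen in the log.
        raise intent.IntentHandleError("query requires using a conversation agent")

    # ... otherwise resolve `track`, `artist`, etc. and start playback
```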

What version of Home Assistant Core are you running?

2024.7.2

What type of installation are you running?

Home Assistant OS

On what type of hardware are you running?

Raspberry Pi

OzGav commented 4 months ago

@tronikos @jozefKruszynski Is this a MA problem or HA?

HarvsG commented 4 months ago

To further clarify: if the conversation agent were configured as shown in the attached screenshot, it looks like it wouldn't error, but it would call the API again?

HarvsG commented 4 months ago

Indeed, with that setting enabled we see an odd conversation: ChatGPT uses the `query:` key, MA then forwards that query back to the agent without any of the media_player context, and the result is a confused response claiming not to know the device name (screenshot attached).

... #Initiated by me
{'role': 'user', 'content': 'Play coming up easy by Paolo nutini on the sideaboard speakers'}]
2024-07-16 14:17:37.460 DEBUG (MainThread) [homeassistant.components.openai_conversation] Response ChatCompletion(id='chatcmpl-9lcLgZmz1bhSN8438OXQUeWtxTO2i', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_1y6TQo16Mkafam9suAZopTDK', function=Function(arguments='{"name":"Sideboard Speakers","query":"Coming Up Easy","artist":"Paolo Nutini"}', name='MassPlayMediaOnMediaPlayer'), type='function')]))], created=1721135856, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=33, prompt_tokens=3371, total_tokens=3404))
2024-07-16 14:17:37.460 DEBUG (MainThread) [homeassistant.components.openai_conversation] Tool call: MassPlayMediaOnMediaPlayer({'name': 'Sideboard Speakers', 'query': 'Coming Up Easy', 'artist': 'Paolo Nutini'})
... #Initiated by MA due to query: field
{'role': 'user', 'content': 'Coming Up Easy'}]
2024-07-16 14:17:38.801 DEBUG (MainThread) [homeassistant.components.openai_conversation] Response ChatCompletion(id='chatcmpl-9lcLivP0J53MZgY7WdVqSH29tEJlN', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_Glm2dPDg9c9jfwCy8TFVQP4n', function=Function(arguments='{"name":"Whole House","query":"Coming Up Easy"}', name='MassPlayMediaOnMediaPlayer'), type='function')]))], created=1721135858, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=25, prompt_tokens=3360, total_tokens=3385))
2024-07-16 14:17:38.802 DEBUG (MainThread) [homeassistant.components.openai_conversation] Tool call: MassPlayMediaOnMediaPlayer({'name': 'Whole House', 'query': 'Coming Up Easy'})
... #Initiated by MA due to query: field
{'role': 'user', 'content': 'Coming Up Easy'}]
2024-07-16 14:17:39.853 DEBUG (MainThread) [homeassistant.components.openai_conversation] Response ChatCompletion(id='chatcmpl-9lcLj2ShszLiIk2fJswihbrJgnQv9', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_XD68uTSJxSCniAbshiiHGQz9', function=Function(arguments='{"query":"Coming Up Easy"}', name='MassPlayMediaOnMediaPlayer'), type='function')]))], created=1721135859, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=20, prompt_tokens=3360, total_tokens=3380))
2024-07-16 14:17:39.854 DEBUG (MainThread) [homeassistant.components.openai_conversation] Tool call: MassPlayMediaOnMediaPlayer({'query': 'Coming Up Easy'})
2024-07-16 14:17:39.857 DEBUG (MainThread) [homeassistant.components.openai_conversation] Tool response: {'error': 'IntentHandleError', 'error_text': 'Multiple entities matched for: name=None, area_name=None'}
2024-07-16 14:17:40.813 DEBUG (MainThread) [homeassistant.components.openai_conversation] Response ChatCompletion(id='chatcmpl-9lcLk8jLOIMMvuaG5wKDqistEoJTE', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I need more specific information like the area or device name to play the track "Coming Up Easy."', role='assistant', function_call=None, tool_calls=None))], created=1721135860, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=21, prompt_tokens=3416, total_tokens=3437))
2024-07-16 14:17:40.814 DEBUG (MainThread) [homeassistant.components.openai_conversation] Tool response: {'speech': {'plain': {'speech': 'I need more specific information like the area or device name to play the track "Coming Up Easy."', 'extra_data': None}}, 'response_type': 'partial_action_done', 'data': {'targets': [], 'success': [], 'failed': []}}
2024-07-16 14:17:41.898 DEBUG (MainThread) [homeassistant.components.openai_conversation] Response ChatCompletion(id='chatcmpl-9lcLlfKdWkaGQdAdfWHJHvy1IU3hF', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I need more specific information like the area or device name to play the track "Coming Up Easy."', role='assistant', function_call=None, tool_calls=None))], created=1721135861, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=21, prompt_tokens=3458, total_tokens=3479))
2024-07-16 14:17:41.899 DEBUG (MainThread) [homeassistant.components.openai_conversation] Tool response: {'speech': {'plain': {'speech': 'I need more specific information like the area or device name to play the track "Coming Up Easy."', 'extra_data': None}}, 'response_type': 'partial_action_done', 'data': {'targets': [], 'success': [], 'failed': []}}
2024-07-16 14:17:43.013 DEBUG (MainThread) [homeassistant.components.openai_conversation] Response ChatCompletion(id='chatcmpl-9lcLmtZEflecOCw1zCO8W1Afrb2Fl', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I need more specific information like the area or device name to play the track "Coming Up Easy."', role='assistant', function_call=None, tool_calls=None))], created=1721135862, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=21, prompt_tokens=3477, total_tokens=3498))
... #Initiated by me
{'role': 'user', 'content': 'Use the sideboard speakers'}]
2024-07-16 14:18:09.507 DEBUG (MainThread) [homeassistant.components.openai_conversation] Response ChatCompletion(id='chatcmpl-9lcMCcMhngzVvc29JVJgSm7kjpwjZ', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='call_QQJF8y9SJJPiwmcD5rOeTP0g', function=Function(arguments='{"name":"Sideboard Speakers","track":"Coming Up Easy","artist":"Paolo Nutini"}', name='MassPlayMediaOnMediaPlayer'), type='function')]))], created=1721135888, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=33, prompt_tokens=3510, total_tokens=3543))
2024-07-16 14:18:09.507 DEBUG (MainThread) [homeassistant.components.openai_conversation] Tool call: MassPlayMediaOnMediaPlayer({'name': 'Sideboard Speakers', 'track': 'Coming Up Easy', 'artist': 'Paolo Nutini'})
2024-07-16 14:18:10.115 DEBUG (MainThread) [homeassistant.components.openai_conversation] Tool response: {'speech': {'plain': {'speech': 'Okay', 'extra_data': None}}, 'response_type': 'action_done', 'data': {'targets': [], 'success': [], 'failed': []}}
2024-07-16 14:18:11.309 DEBUG (MainThread) [homeassistant.components.openai_conversation] Response ChatCompletion(id='chatcmpl-9lcMEu0bEpb6py9BHFVjjsaS2eVey', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Okay. "Coming Up Easy" by Paolo Nutini is now playing on the Sideboard speakers.', role='assistant', function_call=None, tool_calls=None))], created=1721135890, model='gpt-3.5-turbo-0125', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=21, prompt_tokens=3596, total_tokens=3617))
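
Reading the log, the re-entrant calls appear to come from MA handing the raw `query` text back to the same conversation agent, so only the search text survives and the original player name is lost. Below is a hedged sketch of that flow; the call shape is an assumption, not copied from the integration.

```python
# Hedged sketch of the loop in the log above: only the search text is forwarded
# to the agent, not the media_player context, so the agent either asks for the
# device again or guesses one ("Whole House"). The async_converse call shape is
# an assumption, not taken from the MA integration's actual code.
from homeassistant.components import conversation
from homeassistant.core import Context, HomeAssistant


async def _search_via_agent(hass: HomeAssistant, query_text: str, agent_id: str) -> str:
    result = await conversation.async_converse(
        hass,
        text=query_text,  # e.g. "Coming Up Easy" - nothing about the speakers
        conversation_id=None,
        context=Context(),
        agent_id=agent_id,
    )
    return result.response.speech["plain"]["speech"]
```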

I'm currently working around this issue by adding this to my prompt.

When using the `MassPlayMediaOnMediaPlayer` tool do not use the `query` field, instead preferring `track`, `artist` etc. Always use your best guess as the entity name, avoiding `_`.

tronikos commented 4 months ago

From the documentation at https://music-assistant.io/integration/voice/?h=openai#ma-specific-conversation-agent, to use the query: "You need to create/add another OpenAI integration that is purely for Music Assistant. Add the prompt found here when completing the configuration." We should link to that in the error, or remove the `query` argument if there is no conversation agent configured. On the configure page we should also make it clear the agent is purely for search, and link to the documentation.
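
For the first half of that suggestion, a hypothetical sketch of what a more actionable error might look like (the helper name and wording are mine, not a committed change):

```python
# Hypothetical only: point the query-slot error at the voice docs so users know
# how to set up the MA-specific conversation agent, as suggested above.
from homeassistant.helpers import intent

DOCS_URL = "https://music-assistant.io/integration/voice/"


def _require_agent_for_query(slots: dict, agent_configured: bool) -> None:
    """Reject the free-form `query` slot with an error that links to the docs."""
    if "query" in slots and not agent_configured:
        raise intent.IntentHandleError(
            "The `query` slot requires a Music Assistant conversation agent; "
            f"see {DOCS_URL} for setup instructions"
        )
```

The second half of the suggestion (dropping `query` from the exposed slots when no agent is configured) would avoid the failed tool call entirely.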

OzGav commented 3 months ago

@HarvsG we are going to close this soon, as the issue relates to the LLM trying to use a part of the intent that was not set up. We have added more info to the docs to explain this, and in the next update the error message will include a link to the docs so users can fix it themselves. We are also looking at whether we can head off this problem before the user ever sees it.

@tronikos Are the multiple queries reported here something that should be looked at as a separate issue?

HarvsG commented 3 months ago

Should it not be possible to use an LLM assistant without having the MASS-specific LLM set up?

jozefKruszynski commented 3 months ago

Should it not be possible to use an LLM assistant without having the MASS-specific LLM set up?

Indeed. However, the initial integration with OpenAI was written before the HA LLM API was available. The recent changes that allow the use of the HA LLM need some attention. Sadly, none of us have any time to look at it currently as it is the summer holiday season. We'll get it fixed eventually; it just isn't really a priority.

Higher on the list is getting the MA integration into HA core.