Assist returning JSON, but not doing anything.

Someguitarist commented 2 weeks ago

What version of Music Assistant has the issue?

2.2.1

What version of the Home Assistant Integration have you got installed?

2024.8.2

Have you tried everything in the Troubleshooting FAQ and reviewed the Open and Closed Issues and Discussions to resolve this yourself?

[X] Yes

The problem

I was following the directions for voice assistant stuff here https://music-assistant.io/integration/voice/ and everything seems to work well, except Assist is returning JSON without acting on it. Like if I ask it to play 90's it returns {"media_id": ["Nirvana - Smells Like Teen Spirit", "Radiohead - Creep", "Oasis - Wonderwall", "Foo Fighters - Everlong", "Weezer - Buddy Holly"], "media_type":"track"}

but no music is played. I'm currently using home-llm listed here https://github.com/acon96/home-llm where everything seems to work except for this. Any ideas?
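For reference, here is a minimal Python sketch of how a reply like the one above can be told apart from ordinary spoken text. It only assumes the two keys visible in the pasted output (`media_id` and `media_type`). The helper name `parse_music_reply` is made up for illustration and is not part of Music Assistant or home-llm.

```python
import json

# Keys seen in the reply pasted above; anything else is treated as plain speech.
REQUIRED_KEYS = {"media_id", "media_type"}


def parse_music_reply(reply_text):
    """Return the decoded dict if the reply looks like a music payload, else None."""
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    return data


reply = (
    '{"media_id": ["Nirvana - Smells Like Teen Spirit", "Radiohead - Creep", '
    '"Oasis - Wonderwall", "Foo Fighters - Everlong", "Weezer - Buddy Holly"], '
    '"media_type":"track"}'
)
payload = parse_music_reply(reply)
print(payload["media_type"], len(payload["media_id"]))  # prints: track 5
```

The JSON itself is well formed, which fits the symptom described: the text is valid, it is just being spoken back instead of being handed to something that acts on it.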

How to reproduce

Use Ollama, Llama 3.1, and home-llm to control house locally. Add the prompts and the yaml in custom_sentences\en. Ask to play music, get raw JSON out.

Music Providers

Spotify

Player Providers

Home Assistant and Snapcast

Full log output

No relevant logs, as it doesn't seem to ping Music Assistant at all. Just returns raw JSON.

Additional information

No response

What version of Home Assistant Core are you running

2024.8.2

What type of installation are you running?

Home Assistant Container

On what type of hardware are you running?

Linux

OzGav commented 2 weeks ago

This really should be a discussion Q&A as this is not a problem with MA. Having said that, it sounds like you haven't set this up properly. Show a screenshot of the HA integration configuration screen and a screenshot of the voice assistant you have set up.

Someguitarist commented 2 weeks ago

Sorry, and this can totally be closed if it's beyond the scope of this project. I was just trying to incorporate the home-llm with MA for Voice Assist. It looks like it's so close, just for some reason it's not parsing the JSON, rather passing it through the voice pipeline.

OzGav commented 2 weeks ago

In your Jarvis voice assistant, what conversation agent is configured there? Returning the JSON usually means that is incorrect.

Someguitarist commented 2 weeks ago

In my Jarvis voice assistant the JarvisLLM linked above is configured. My only thought process is that since it's using Ollama/Llama3.1 it's probably using tools? Either that or it wants me to format the JSON differently?

It's weird, because it can do everything else like control the lights and stuff correctly. Just having issues with controlling the music.

Also, thanks again for your help. I realize this is like a really fringe issue.

OzGav commented 2 weeks ago

I am no expert on this but I think the issue is you are combining the music discovery and the control in one agent.

Since you have opted to specify a MA specific conversation agent then it should be SPECIFIC. This agent creates the JSON and passes it back for use by MA. Then you need to ensure you have done all of the steps here https://music-assistant.io/integration/voice/#ma-specific-conversation-agent

As I understand it the LLM should be aware of the MassPlayMediaOnMediaPlayer intent and use that when the JSON is returned.

I don't use a local LLM so you will need to ask in the HA forums for more help with that.

So in summary what normally is configured:

1. You create a service in the LLM integration which takes a query and converts it into a JSON that MA understands. Let's call this LLM MUSIC.
2. In the MA Integration config you set LLM MUSIC as the MA specific conversation agent.
3. You set up a voice assistant to use your general purpose conversation agent. I use "Home Assistant" but you will use "Jarvis". The issue for you is making sure Jarvis knows what to do.
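The split described above can be sketched in Python roughly as follows. This is an illustration only, not the integration's actual code: `music_agent`, `handle_play_intent`, and the `play` callback are hypothetical stand-ins, and the canned JSON just mirrors the shape shown earlier in this thread.

```python
import json


def music_agent(query):
    """Stand-in for the MA specific agent ("LLM MUSIC"): query in, JSON string out."""
    return json.dumps({"media_id": [query], "media_type": "track"})


def handle_play_intent(query, area, play):
    """Stand-in for the step that consumes the JSON instead of speaking it."""
    payload = json.loads(music_agent(query))
    play(area, payload)  # hypothetical hook that would start playback
    return f"Playing {query} in the {area}"


calls = []
spoken = handle_play_intent(
    "Radiohead - Creep", "kitchen", lambda area, p: calls.append((area, p))
)
print(spoken)  # prints: Playing Radiohead - Creep in the kitchen
```

The point of the sketch is that the JSON never reaches the user: the general purpose agent only triggers the play intent, and the JSON from the music agent is consumed in between.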

Someguitarist commented 2 weeks ago

It's okay, I'm no expert either. Yeah, I've followed all of the steps except 'make it specific' because I'd basically be giving up control of the house for control of the music, and I'd really need both to get rid of the Alexas.

The fact that it's spitting out what looks like the correct JSON to me when asked to play music and everything else still works makes me think that I'm close though.

Quick question for you: did you have to make any edits to your configuration.yaml? Looking at the Home Assistant section for custom intents (https://www.home-assistant.io/voice_control/custom_sentences/) I see that they mention creating a .yaml in the custom_sentences/en directory like we do here for MA, but right below that they say you can use the custom intent script integration to implement an action and provide a response by editing the configuration.yaml, and I don't see that in any of the setup documentation for this.

Am I going down the wrong rabbit hole here you think? Did you have to include the intent_script in your configuration.yaml?

Again, thanks for your time.

OzGav commented 2 weeks ago

Why do you say

"I'd basically be giving up control of the house for control of the music"?

No, you don't need the intent in configuration.yaml; that is done by the integration.

Someguitarist commented 2 weeks ago

Ah okay. Good to eliminate that possibility.

I say that because I can only have one voice assistant as default. So the Wyoming Satellites I have will only use the starred (default) Voice Assistant, which is fed a prompt about the devices and their current states and whatnot. If I set up a Music Assistant specific Voice Assistant, then set that as default, it won't have the information about the devices.

That's why I was trying to append the prompt for Music Assistant to my current prompt, so that it had the device information but also knew how to format prompts for Music Assistant.

All that being said it's probably something that I did, but disabling everything I've got going and just following the Music Assistant instructions above using local Ollama integration and Home Assistant Assist it still seems to just return JSON for me?

I'll be able to tinker with it a bit more tomorrow and see if it's something else I did, but going with all default stuff and just following the instructions line by line I'm still having the same problem.

OzGav commented 2 weeks ago

No, you don't need to set up a dedicated voice assistant for MA (you can for testing but that is a different conversation). I said set up a specific LLM agent, not a voice assistant.

But I don't think you are following the instructions, as you have ONE LLM integration when you need two. I think everything is working fine for what you have done. The bottom section of your LLM prompt tells Ollama what to do when it gets a music request, and that is to return a JSON string, which is exactly what it is doing. What you haven't done is tell it what to do with that string.

This is where I get sketchy. I understand that normally when people set these things up that the LLM is aware of all of the available service calls and intents. So it works out what to do. So if you said "Give me some rocking 80s in the Study" then it does its magic to interpret that as a request to play 80s rock music then knows that to play music it needs to use X intent or service call and so does that. Your issue as far as I can tell is that last step.

Someguitarist commented 2 weeks ago

Haha, shoot, yeah. I think we both have about the same understanding and we're stuck at the same step? It's interesting to me that it's giving me the right JSON just for some reason it's interpreting it as text output rather than an action, which is why I was thinking I'd done something wrong with setting up the intent stuff.

My other backup would be to create a bunch of sentence-based automations for now. But it would be nice to figure this out for more elaborate requests. Tomorrow during the day I'll try and reset/disable a bunch of stuff and see if I can get it to work with just what's listed in the instructions. I kinda tried that today, but didn't have enough time to really troubleshoot and see if I missed anything else.

OzGav commented 2 weeks ago

I think you should try something like adding to your prompt that once it has that JSON then use the mass.play_media service call. Like I said I understand that normally happens automagically but maybe you can force it?
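As a rough illustration of what "use the mass.play_media service call" would amount to, here is a Python sketch that turns the JSON into a Home Assistant REST service request. Several things here are assumptions: that the service data fields mirror the JSON keys (`media_id`, `media_type`), plus the example entity `media_player.kitchen`, the host name, and the helper `build_play_media_request`. The request is only built, not sent.

```python
import json
from urllib.request import Request


def build_play_media_request(base_url, token, entity_id, llm_json):
    """Build (but do not send) a POST to the mass.play_media service."""
    payload = json.loads(llm_json)
    body = {
        "entity_id": entity_id,               # assumed target player
        "media_id": payload["media_id"],      # assumed to map straight across
        "media_type": payload["media_type"],  # assumed to map straight across
    }
    return Request(
        f"{base_url}/api/services/mass/play_media",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_play_media_request(
    "http://homeassistant.local:8123",   # placeholder host
    "YOUR_LONG_LIVED_TOKEN",             # placeholder token
    "media_player.kitchen",              # hypothetical entity
    '{"media_id": ["Radiohead - Creep"], "media_type": "track"}',
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it. Whether an LLM prompt can be made to trigger the equivalent call on its own is exactly the open question in this thread.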

Someguitarist commented 2 weeks ago

I can confirm now that it's something to do with the home-llm integration. Removing that and using the Ollama integration (and restarting Home Assistant, I think I missed that yesterday) it does perform the action, but you get a bunch of noise in the response.

Request: "Play Third Eye Blind in the kitchen"
Response: "Based on the tool call response, it seems like the user asked about the current temperature in the living room. The output from the tool call response indicates that the current temperature in the living room is 73.8°F."

It does play the music, though sometimes it generates incredibly long responses about all the lights and stuff. I'll have to figure out why the home-llm integration doesn't work, as that is leaps and bounds above the local Ollama integration currently offered by HA. Since this doesn't seem to be an issue with MA but rather with home-llm, I'll mark it as closed.

Thanks for your help!
